YACC

This section builds a simple yacc file using a slightly modified version of the old faithful grammar. The only changes are to allow multiple statements between the START and END keywords and to allow the assignment of variables. The original lex file will also have to be updated to work properly with the grammar. This is the grammar we will use:

program   -> START statlist END
           | START END

statlist  -> statlist statement
           | statement

statement -> var EQUALS num
           | E

E         -> E + T_
           | T_

T_        -> T

T         -> T * P
           | P

P         -> ( E_ )
           | num
           | var

E_        -> E
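To make the grammar concrete, here is a hypothetical input it accepts (assuming the Part 1 lexer returns identifiers as var tokens and digit strings as num tokens; the EQUALS token is matched by "==" in the lex file shown later):

START
count == 10
( count + 5 ) * 2
END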

The first step is to identify the tokens, since we will want the lexical analyzer to return a unique identifier (an integer) for each one. For the grammar above the list is var, num, START, END, EQUALS, PLUS, MULT, L_PARENTH, and R_PARENTH. Notice that the single ASCII characters have been given keyword names rather than being used as the actual characters (PLUS instead of "+").

Now that the grammar is known and the tokens identified, it is time to build the yacc file. The top of the file starts with the list of tokens (declared with %token). Next comes any C header code that is needed, enclosed in %{ ... %}. In this case only the standard I/O library is needed.

%token var num START END EQUALS PLUS MULT L_PARENTH R_PARENTH

%{

#include <stdio.h>

%}

The next section, between the %% markers, contains the grammar in a fairly common format. While this was not mentioned explicitly in any of the provided documentation, the grammar will not work if any of the symbols contain a dash ('-'). After each reduction, any C code contained in the curly brackets is executed. In this case the reduction number is printed, but the action could do just about anything, such as building or updating a symbol table, checking for errors, or calling a subroutine.

%%

program:  START statlist END        { printf("Reduction 1\n"); }
        | START END                 { printf("Reduction 2\n"); }
        ;

statlist: statlist statement        { printf("Reduction 3\n"); }
        | statement                 { printf("Reduction 4\n"); }
        ;

statement: var EQUALS num           { printf("Reduction 5\n"); }
        | E                         { printf("Reduction 6\n"); }
        ;

E:        E PLUS T_                 { printf("Reduction 7\n"); }
        | T_                        { printf("Reduction 8\n"); }
        ;

T_:       T                         { printf("Reduction 9\n"); }
        ;

T:        T MULT P                  { printf("Reduction 10\n"); }
        | P                         { printf("Reduction 11\n"); }
        ;

P:        L_PARENTH E_ R_PARENTH    { printf("Reduction 12\n"); }
        | num                       { printf("Reduction 13\n"); }
        | var                       { printf("Reduction 14\n"); }
        ;

E_:       E                         { printf("Reduction 15\n"); }
        ;

%%

This last part contains two necessary functions. The first, yyerror(), provides an action when a syntax error is discovered. The main() function simply calls yyparse(), the function that performs the syntax analysis and calls the lexical scanner whenever it needs a token.

int yyerror(char *s)
{
    fprintf(stderr, "***%s error!!\n", s);
    return 0;
}

int main(void)
{
    yyparse();
    return 0;
}
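As noted above, the actions can do more than print. Here is a minimal sketch of how Reduction 5 could record an assignment in a symbol table instead, assuming the lexer stores a value in yylval before returning each var or num token (the default YYSTYPE is int) and assuming hypothetical lookup() and set_symbol() helpers:

/* hypothetical lex actions that supply values:            */
/*     yylval = atoi(yytext);   return num;                */
/*     yylval = lookup(yytext); return var;                */

statement: var EQUALS num   { set_symbol($1, $3); }   /* $1 and $3 hold the yylval
                                                         values the lexer supplied
                                                         for var and num */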

Now that the grammar file is complete, it needs to be run through yacc to create the C code (y.tab.c) and a header file (y.tab.h, produced by the -d flag). Locate the gram_2.y file in the Part2 directory, and type the following:

yacc -d gram_2.y

The header file is important, as it will be used by the lex file to recognize and assign unique numbers to tokens. Below is a list of the tokens and their assigned id numbers. Notice that all are above 255; this leaves room for actual ASCII character codes to be returned directly instead of creating a named token for each. In this example we will not take advantage of that, and will instead define a token for each desired single ASCII character (i.e., PLUS for '+'). This is not strictly necessary, and is actually a bit more work, but it makes it easier to follow what is going on.

#define var 257

#define num 258

#define START 259

#define END 260

#define EQUALS 261

#define PLUS 262

#define MULT 263

#define L_PARENTH 264

#define R_PARENTH 265
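For comparison, here is a minimal sketch of the character-code approach that the numbering above makes possible (not used in this example): the grammar names the character literally, and the lexer returns the character itself, whose value is below 256 and therefore cannot collide with a named token.

/* in the grammar:  */   E : E '+' T_    { printf("Reduction 7\n"); }
/* in the lexer:    */   "+"             { return '+'; }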

There are some changes necessary in the original lex file. Basically, the main() function has to be eliminated (yyparse() now drives the scanner), each token has to return its unique identifier, and the generated header file has to be included.

%{
#include "y.tab.h"
#include <string.h>

char buf[500];          /* buffer for line */
int cur_tok;            /* start of next token */
int token_id;
char token_text[64];
int n = 1;
int end_reached = 0;
%}

Notice in the following code snippet how each character has been broken out individually. This again is not necessary, but it makes it very easy to ensure the return of a specific token number each time a token is identified, and it makes the code easier to read as well. A better, more efficient way would be to recognize any of a set of ASCII characters (like in the Part 1 example) and then use a switch statement to return the correct token id; a sketch of that approach appears after the snippet.

.
.
.

"(" {   // found an ASCII character
        printf("TOKEN - ASCII character ('%s')\n", yytext);
        addto_buf();
        return L_PARENTH;
    }

")" {   // found an ASCII character
        printf("TOKEN - ASCII character ('%s')\n", yytext);
        addto_buf();
        return R_PARENTH;
    }

"==" {  // found a multi-ASCII character
        printf("TOKEN - multi-ASCII character ('%s')\n", yytext);
        addto_buf();
        return EQUALS;
     }

END {   // found a keyword
        printf("TOKEN - Reserved (%s)\n", yytext);
        addto_buf();
        return END;
    }

START { // found a keyword
        printf("TOKEN - Reserved (%s)\n", yytext);
        addto_buf();
        return START;
      }

.
.
.
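As mentioned above, a more compact alternative (a sketch, not taken from the actual lex_2.l file) would match any of the single-character tokens with one character class and select the token id in a switch:

[+*()] {    // any single-character token
        printf("TOKEN - ASCII character ('%s')\n", yytext);
        addto_buf();
        switch (yytext[0]) {
            case '+': return PLUS;
            case '*': return MULT;
            case '(': return L_PARENTH;
            case ')': return R_PARENTH;
        }
       }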

Now that the lex file is complete, generate the scanner source code, build the executable, and test it by typing the following:

lex lex_2.l

cc lex.yy.c y.tab.c -o parse

./parse < test2.txt
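To see how the reduction trace relates to the grammar, consider a hypothetical test file containing just a single number between the keywords (the actual contents of test2.txt may differ):

START
5
END

Interleaved with the scanner's TOKEN messages, the parser prints the reductions bottom-up: the num becomes a P (Reduction 13), the P a T (11), the T a T_ (9), the T_ an E (8), the E a statement (6), the statement a statlist (4), and finally START statlist END reduces to program (1).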