CS3012 (Formal Languages and Compilers) Tuesday, 7th August, 2001

1.(a)(i)Define a context-free grammar (and not just a CFG rule).

(ii)Write context-free grammars for the following three languages:

Ipalindromes over the alphabet {x, y, z}, where palindromes read the same forward and backward

II{aibjck: k = i + j, i,j,k  0}

III{aibj: i > j, j  0}

(12)

(b)Using the grammar and parse table below, show using the LR(1) algorithm that (a + b)*a is in the language of the grammar. id represents both a and b.

1)E -> E + T

2)E -> T

3)T -> T * F

4)T -> F

5)F -> ( E )

6)F -> id

E / T / F / id / + / * / ( / ) / #
0 / 1 / 2 / 3 / S5 / S4
1 / S6 / A
2 / R2 / S7 / R2 / R2
3 / R4 / R4 / R4 / R4
4 / 8 / 2 / 3 / S5 / S4
5 / R6 / R6 / R6 / R6
6 / 9 / 3 / S5 / S4
7 / 10 / S5 / S4
8 / S6 / S11
9 / R1 / S7 / R1 / R1
10 / R3 / R3 / R3 / R3
11 / R5 / R5 / R5 / R5

(8)

(c)What are the main criteria for judging error recovery in compiling? How would you introduce error recovery into the parse table of part (b)?

(5)

(25)

PLEASE TURN OVER

2. (a)Let T be an alphabet, and be the empty string.

(i)What are the three basic regular expressions, and what languages (over T) do they represent?

(ii)What are the rules for constructing complex regular expressions out of simpler ones, and what languages do they create from the simpler languages?

(6)

(b)Which of the following strings are in the language defined by the regular expression (b+a*b)(a*b + ba)* ?

(i)aabaabba

(ii)aba

(iii)babab

(iv)bbababb

(2)

(c)Write regular expressions for the languages of all strings over {0,1}:

(i)that do not contain two 1s in a row;

(ii)that begin or end with 00 or 11;

(iii)that contain an odd number of 1s and any number of 0s.

(7)

(d)(i)Explain the relationship between Finite State Automata and Regular Expressions.

(ii)Explain how the Unix utility Lex works in terms of FSAs and Regular Expressions.

(iii)What role does Lex play in compiling?

(10)

(25)

3.Consider the following simple language.

There are two data types: intand real. A variable consists of a sequence of upper or lower case letters. Variables are declared by stating the data type, then a non-empty comma-separated sequence of variable names, and then a semi-colon. All declarations come before any other statements. All variables must be declared. Statements assign expressions to variables. Expressions are the sum or product of other expressions, or they are other expressions inside parentheses, or they are variables, or they are numerical constants. An integer constant consists of a non-empty string of digits. A real constant consists of two non-empty strings of digits, separated by a single ".". Each assignment statement ends in a semi-colon. Adding or multiplying an integer and a real produces a real. Assigning an integer-valued expression to a real variable is legal; assigning a real-valued expression to an integer variable is illegal, and causes an error.

An example program is given below.

int x, y;

real a, b;

int count;

count := 2;

x := count * y;

a := x + (b*y);

(a)Write the regular expression/action section of a Lex script to create a lexical analyser which will recognise the basic units of the language, and return appropriate tokens. Do not worry about error checking.

(5)

(b)Write a grammar for the language. You may write an ambiguous grammar. Write each grammar rule on a separate line. Do not attempt to do type checking in the grammar.

(10)

(c)What is the difference between an inherited and a synthesised attribute in an attribute grammar? What is the significance of the difference for compiling?

(3)

(d)Augment your grammar to construct a type checker by associating attributes with some of the symbols and associating attribute rules with some of the grammar rules. Your type checker should implement the last sentence of the language description above.

(7)

(25)

PLEASE TURN OVER

4.(a)(i)What is a symbol table?

(ii)Outline how a symbol table might be constructed during the compilation process. Explain the consequences for symbol table construction of compiling a language with a single global scope and one with nested scopes.

(6)

(b)What is an abstract syntax tree? Why might abstract syntax trees be used in compiling?

(4)

(c)Show a suitable abstract syntax tree for the following program fragment. Use as few nodes as possible, without losing any relevant information contained in the program. Use sibling links for elements on the same level. No node should have more than four children.

real half(int x) {

real k;

k = (x+1)* 2;

return k;

}

(5)

(d)A grammar is shown below for the language of part (c). Show how you would augment the grammar rules with functions to create syntax trees in the style of your answer to (c). Explain any functions you use. You may use either the $ notation, or attributes. If a grammar rule has only one symbol on the right-hand side, and you are simply passing all the attributes from that symbol to the left hand side, you do not need to write an action.

1) Func -> Type name ( Params ) { Stats }

2) Type -> int

3) Type -> real

4) Params -> Decl , Params

5) Params -> Decl

6) Decl -> Type name

7) Stats -> Stat Stats

8) Stats -> Stat

9) Stat -> Decl ;

10) Stat -> Assig ;

11) Stat -> Return ;

12) Assig -> name = Expr

13) Expr -> Expr + Term

14) Expr -> Term

15) Term -> Term * Factor

16) Term -> Factor

17) Factor -> name

18) Factor -> int

19) Factor -> real

20) Factor -> ( Expr )

21) Return -> return Expr

(10)

(25)

END OF EXAMINATION PAPER

1