CS3012 (Formal Languages and Compilers) Monday, 15th January, 2001

1. (a)State the five elements of the formal definition of a finite state automaton (FSA), and explain what it means for a string to be accepted by a FSA.

What is the difference between a deterministic FSA and a non-deterministic FSA? Why is it important?

(5)

(b)Draw FSAs for the following languages. The FSAs may be non-deterministic. Explain any abbreviations you use.

(i)strings of characters which contain the substring “exam”

(ii)strings over {0,1} which have any number of pairs of 0s and then any number of pairs of 1s

(5)

(c)A car manufacturing company wants to analyse its production process in the paint shop. A car must be placed on the assembly line before it can be painted. The car must be removed from the line at the end of the whole process. After a car has been taken off the line, it can be placed back on it again in the same orientation, or it can be rotated and placed again. For any given placement on the line, the car can be sprayed arbitrarily many times. Specify the operation of this assembly line by sketching a FSA, where the edge labels are taken from the list of operations [place, remove, grip, paint].

(4)

(d)Convert the following non-deterministic FSA to a deterministic one, using the full conversion algorithm, and show all the steps.

(8)

(e)What role do FSAs play in compiling? Why can’t they be used to recognise complete programs in standard programming languages – e.g. C, Pascal or Java?

(3)

(25)

PLEASE TURN OVER

  1. (a) What is a context-free grammar? What does it mean for a string to be in the language of a grammar?

(5)

(b)What is a derivation tree? Explain in terms of derivation trees what an ambiguous grammar is. Why are ambiguous grammars a problem in compiling?

(5)

(c)What is “parsing”? What role does it play in compilers? Explain the difference between top-down and bottom-up parsing. Classify recursive-descent, LL(1) and LR(1) as either top-down or bottom-up.

(5)

(d)Using the grammar and parse table below, show using the LR(1) algorithm that a*(b+a) is in the language of the grammar. id represents both a and b.

1)E -> E + T

2)E -> T

3)T -> T * F

4)T -> F

5)F -> ( E )

6)F -> id

E / T / F / id / + / * / ( / ) / #
0 / 1 / 2 / 3 / S5 / S4
1 / S6 / A
2 / R2 / S7 / R2 / R2
3 / R4 / R4 / R4 / R4
4 / 8 / 2 / 3 / S5 / S4
5 / R6 / R6 / R6 / R6
6 / 9 / 3 / S5 / S4
7 / 10 / S5 / S4
8 / S6 / S11
9 / R1 / S7 / R1 / R1
10 / R3 / R3 / R3 / R3
11 / R5 / R5 / R5 / R5

(8)

(e)Briefly describe a method of introducing error handling into the LR(1) algorithm

(2)

(25)

PLEASE TURN OVER

3.Consider the following description of a language.

A program is a non-empty sequence of statements. A statement is either an assignment of an expression to a variable, a while statement or a print statement. An assignment uses the “:=” operator and is terminated by a semi-colon. An expression is a sum or product of multiple variables or integers. Expressions are evaluated in a left-to-right order. A while statement consists of the while keyword, a test between brackets, a non-empty sequence of statements, and the “end” keyword. A test is a relational operator between two expressions. "<", ">" and “=” are the only relational operators. A print statement consists of the “print” keyword, an expression between brackets, and a semi-colon. All variables are integer valued, and are not declared. Variable names are non-empty sequence of lower-case letters.

A sample program is given below:

x := 1;

while (x < 10)

y := 1;

sum := 1;

while (y < x)

sum := sum + y;

end

print(y);

x := x+1;

end

(a)Write the regular expression/action part of a Lex script to recognise the appropriate tokens, and return them to a calling program. Issue a call to an error function if something other than a token is found.

(4)

(b)Write the grammar rules section of a Yacc script to recognise programs in this language.

(6)

(c)How could you associate multiple attributes with the non-terminals of your grammar? You don’t need to write any code, but describe how it could be done.

(2)

(d)What is three-address code? What role does it play in compiling?

(3)

(e)Augment your grammar rule(s) for the while statement with functions to generate the corresponding three address code. Use a function “gen” which evaluates its arguments and generates code, and || to concatenate code. Write the pseudo code or draw a schematic diagram to show how the generated three-address code would work.

(5)

(f)Describe a scheme for representing arrays. What is the main challenge for translating array references into three address code?

(5)

(25)

PLEASE TURN OVER

4. (a)What are abstract syntax trees? Why might they be used in compiling?

(4)

(b)Show a suitable abstract syntax tree for the following program fragment. Use as few nodes as possible, without losing any relevant information contained in the program. Use sibling links for elements on the same level. No node should have more than four children.

int simplefunc(int x, real y) {

int k;

x = x+1;

return x;

}

(5)

(c)In the language of part (b), a function is defined by stating a data type, then the function name, then a non-empty comma-separated list of declarations between round brackets, and then a non-empty sequence of statements between curly brackets. Data types are either int or real. A declaration is a data type followed by a single variable name. A statement can be a declaration, or an assignment, or a return statement. All statements must be followed by a semi-colon. An assignment consists of a variable name, “=”, and an expression. An expression is a combination of sums (+), and products (*) of variables, reals, and integers. The product operator has higher precedence than sum. A return statement consists of the return keyword followed by an expression. Note: the language is not C.

Write a grammar for the above language – assume names, ints and reals are terminals

(7)

(d)Show how you would augment your grammar rules with functions to create syntax trees in the style of your answer to (b). Explain any functions you use. You may use either the $ notation, or attributes. If a grammar rule has only one symbol on the right-hand side, and you are simply passing all the attributes from that symbol to the left hand side, you don’t need to write an action.

(9)

(25)

1