CS3012 (Formal Languages and Compilers) 10 August 2005

1.(a)Let T be the alphabet {a,b,c}, and be the empty string.

(i)Write down the first five elements of T* in lexical order and dictionary order.

(ii)Write down the first five elements of T+ in lexical order and dictionary order.

(iii)Draw a finite state diagram A such that L(A)=T+.

(5)

(b)Write regular expressions for

(i)the language of strings over {a,b} such that each string is a sequence of ab pairs

e.g. ababab, abababababab, etc.

(ii)the language of strings over {a,b,c} such that no two cs ever appear side by side.

(4)

(c)Consider the following regular expression: b*ab* + a*ba*ba*

Which of the following strings are in this language?

(i)babbbbbbba

(ii)baba

(iii)aaab

(iv)a

(2)

(d)A certain programming language allows real constants to be written in exponent form - e.g. 3.25E6 (for 3.25x106) or 2.16E-5 (for 2.16x10-5). There must be exactly one digit before the decimal point (preceded by an optional sign), at least one digit after the decimal point, and the exponent can be a positive or negative integer.

Write a regular expression specifying the language of such numbers. Define any abbreviations you use.

(4)

PLEASE TURN OVER

1.(continued)

(e)Convert the following non-deterministic FSA to a deterministic FSA, using the subset construction algorithm. Show all steps. The subset construction algorithm is given below.

Algorithm to remove edge choices (subset construction)

Given a NDFSA, A = (Q,I,F,T,E), create DFSA A' s.t. L(A) = L(A')

begin

I' := {I}/* Note: I is a set; I' is a set with one member, I */

F' := {}

E' := {}

S := {I}

Q' := {I}

while S is not empty dobegin

select X  S

for each t T dobegin

S' := {q Q : (p,t,q)  E, for some p  X}

if S'  {} thenbegin

if S'  F  {} then F' := F'  {S'}

E' := E'  {(X,t,S')}

S := S  {S'}\Q'/* if we haven't seen S' before, add to S */

Q' := Q'  {S'}/* if S' is already in Q', Q' doesn't change */

end

end

S := S\{X}

end

return (Q',I',F',T,E')

end

(10)

2.(a)Write context free grammars for the following languages, and show derivation trees for the given examples.

(i){aibjak: i = j + k}

Example string: aaaaabbaaa

(ii)the language of lists, where each list is a matching pair of square brackets {[,]} containing a possibly empty comma-separated sequence of elements. An element can be either an atom or a list.

Example: [ atom , [ atom , [ ] ] , atom ]

(9)

(b)Show that for any regular expression r, and any string s:
sr*if and only if sr**

(4)

2.(continued)

(c)(i)What is an ambiguous grammar?

(ii)Two grammars defining a simple language of expressions are shown below. Show that one of them is ambiguous.

(a)(b)

1)E -> E + E1)E -> E + T

2)E -> E * E2)E -> T

3)E -> ( E )3)T -> T * F

4)E -> a4)T -> F

5)F -> ( E )

6)F -> a

(iii)Why might ambiguity be a problem in a grammar?

(4)

(d)Parse a * (a + a) using grammar (c)(ii)(b) above and the LR(1) parse table and algorithm below.
Show all steps.

E / T / F / a / + / * / ( / ) / #
0 / 1 / 2 / 3 / S5 / S4
1 / S6 / A
2 / R2 / S7 / R2 / R2
3 / R4 / R4 / R4 / R4
4 / 8 / 2 / 3 / S5 / S4
5 / R6 / R6 / R6 / R6
6 / 9 / 3 / S5 / S4
7 / 10 / S5 / S4
8 / S6 / S11
9 / R1 / S7 / R1 / R1
10 / R3 / R3 / R3 / R3
11 / R5 / R5 / R5 / R5

begin

z := 0

w := input string concatenated with #

loop

q := last symbol in z%top state in stack

t := first symbol in w

if M[q, t] = Snthen begin% row q, column t in table

remove t from front of w

put n on end of z

end

elseif M[q, t] = Rp then begin% p = B -> 

remove || symbols from end of z

q := last symbol in z% top state in stack

put M[q,B] on end of z% new state

end

else if M[q, t] = A thenreturn true% input  L(G)

elsereturn false% input  L(G)

end

end

(8)

PLEASE TURN OVER

3.(a)Convert the FSA below to a regular expression, using the FSA -> RegExp algorithm. Show all steps. The algorithm is provided, but you are not obliged to use it if you do not wish to.

begin

convert A to a RFSA %trivial

while Q\{i,f} is not empty dobegin

for each state p  Q with more than one edge (p,ri,p) (i ≤ n) do

replace all those edges by (p,r1+r2+...+rn,p)

for each pair p,q  Q with more than one edge (p,ri,q) (i ≤ n) do

replace all those edges by (p, r1+r2+...+rn,q)

select s  Q

for each pair p,q  Q (p,q  s) s.t. there are edges (p,r1,s) and (s,r2,q) do

if there is an edge (s,r3,s) then add the edge (p,r1r3*r2,q)

else add the edge (p,r1r2,q)

remove all edges to or from s

remove all states and edges with no path from i

end

return r, where E = {(i,r,f)}

end

(6)

(b)Suppose we have a file containing a C program, and we want to count the number of times the programmer has used strcmp in the program. Assuming we process the file one character at a time, write a Moore Machine (or a Mealy Machine) which will output a "1" every time strcmp appears. (For the purposes of this exercise, simply count every time the string appears, including inside comments or quoted strings, or as a substring of a larger string.) Define any abbreviations you use.

(7)

(c)Write context-free grammars to define the languages below.

(i)All strings over {a,b} that contain two successive bs.

(ii)All strings over {a,b} that start with an a, have a positive number of bs, and finish with an a.

(4)

(d)Prove that the language {anb2cn: n ≥ 0} is not regular.

(8)

PLEASE TURN OVER

4.Consider the following simple language.

  • There are two data types: intand real.
  • A variable consists of a sequence of upper or lower case letters.
  • Variables are declared by stating the data type, then a non-empty comma-separated sequence of variable names, and then a semi-colon.
  • All declarations come before any other statements.
  • All variables must be declared.
  • Statements assign expressions to variables.
  • Expressions are the sum or product of other expressions, or they are other expressions inside parentheses, or they are variables, or they are numerical constants.
  • An integer constant consists of a non-empty string of digits.
  • A real constant consists of two non-empty strings of digits, separated by a single ".".
  • Each assignment statement ends in a semi-colon.
  • Adding or multiplying an integer and a real produces a real.
  • Assigning an integer-valued expression to a real variable is legal; assigning a real-valued expression to an integer variable is illegal, and causes an error.

An example program is given below.

int x, y;

real a, b;

int count;

count := 2;

x := count * y;

a := x + (b*y);

(a)Write the regular expression/action section of a Lex script to create a lexical analyser which will recognise the basic units of the language, and return appropriate tokens. Do not worry about error checking.

(5)

(b)Write a grammar for the language. You may write an ambiguous grammar. Write each grammar rule on a separate line. Do not attempt to do type checking in the grammar yet.

(10)

(c)Augment your grammar to construct a type checker by associating attributes with some of the symbols and associating attribute rules with some of the grammar rules. Your type checker should implement the last sentence of the language description above.

(10)

END OF PAPER

1