AUTOMATA & COMPILER DESIGN [IT-713]
Sem. VII
Examination, Dec.2014
Unit I
1.a)Explain the equivalence of NFA and DFA with suitable example.
Ans. For every non-deterministic finite automaton (NFA), there exists an equivalent deterministic finite automaton (DFA). The equivalence between the two is defined in terms of language acceptance. Since an NFA is nothing more than a finite automaton in which zero, one, or more transitions on an input symbol are permitted, we can always construct a finite automaton that simulates all the moves of the NFA on a particular input symbol in parallel. We then get a finite automaton in which there is exactly one transition on each input symbol; hence, it will be a DFA equivalent to the NFA.
Since the DFA equivalent of the NFA parallels the moves of the NFA, every state of the DFA will be a combination of one or more states of the NFA. Hence, every state of the DFA will be represented by some subset of the set of states of the NFA; and therefore, the transformation from NFA to DFA is normally called the "subset construction". If a given NFA has n states, the equivalent DFA will have at most 2^n states, with the initial state corresponding to the subset {q0}. Therefore, the transformation from NFA to DFA involves finding all possible subsets of the set of states of the NFA, considering each subset to be a state of the DFA, and then finding the transition from it on every input symbol. But not all the states of a DFA obtained in this way may be reachable from the initial state; and if a state is not reachable from the initial state on any possible input sequence, then such a state plays no role in deciding what language is accepted by the DFA. Hence, the amount of work involved in transforming an NFA to a DFA can be reduced if we generate only the reachable states of the DFA. This can be done by proceeding as follows:
Let M = (Q, Σ, δ, q0, F) be an NFA to be transformed into a DFA.
Let Q1 be the set of states of the equivalent DFA.
begin:
Q1old = Φ
Q1new = {q0}
while (Q1old ≠ Q1new)
{
Temp = Q1new − Q1old
Q1old = Q1new
for every subset P in Temp do
for every a in Σ do
if the transition from P on a goes to a new subset S of Q
then
Q1new = Q1new ∪ {S}
(the transition from P on a is obtained by finding the transitions from every member of P on a in the given NFA and then taking the union of all such transitions)
}
Q1 = Q1new
end
A subset P in Q1 will be a final state of the DFA if P contains at least one member of F of the NFA. For example, consider the following finite automaton:
M = ({q0, q1, q2, q3}, {0, 1}, δ, q0, {q3})
where:
δ(q0,0) = {q1} δ(q0,1) = Φ
δ(q1,0) = {q1} δ(q1,1) = {q1, q2}
δ(q2,0) = Φ δ(q2,1) = {q3}
δ(q3,0) = {q3} δ(q3,1) = {q3}
The DFA equivalent of this NFA can be obtained as follows:
State / 0 / 1
{q0} / {q1} / Φ
{q1} / {q1} / {q1, q2}
{q1, q2} / {q1} / {q1, q2, q3}
*{q1, q2, q3} / {q1, q3} / {q1, q2, q3}
*{q1, q3} / {q1, q3} / {q1, q2, q3}
The transition diagram associated with this DFA is shown in Fig.
Figure. Transition diagram for M = ({q0, q1, q2, q3}, {0, 1}, δ, q0, {q3}).
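The subset construction described above can be sketched directly in code. The following is a minimal illustrative implementation (function and variable names are my own, not from the original answer) run on the example NFA; note that it also produces the dead state Φ (the empty subset, reached from {q0} on 1), which the hand-built table omits.

```python
# delta maps (state, symbol) -> set of NFA next states; missing entries mean Φ.
delta = {
    ('q0', '0'): {'q1'},
    ('q1', '0'): {'q1'}, ('q1', '1'): {'q1', 'q2'},
    ('q2', '1'): {'q3'},
    ('q3', '0'): {'q3'}, ('q3', '1'): {'q3'},
}

def subset_construction(start, symbols, delta):
    start_set = frozenset([start])
    states, worklist, trans = {start_set}, [start_set], {}
    while worklist:                      # generate only the reachable subsets
        P = worklist.pop()
        for a in symbols:
            # union of the NFA moves of every member of P on a
            S = frozenset(q2 for q in P for q2 in delta.get((q, a), ()))
            trans[(P, a)] = S
            if S not in states:
                states.add(S)
                worklist.append(S)
    return states, trans

states, trans = subset_construction('q0', '01', delta)
```

Running this yields the five subsets of the table plus the dead state, six reachable DFA states in all.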
(b)Give the minimized DFA for the following expression (a/a)*abb.
2. a) Explain Arden’s theorem.
Ans.2 a) Arden’s theorem: for every regular expression there exists a deterministic finite automaton. So we can say that regular languages, regular expressions, and finite automata are all different representations of the same thing. To convert a regular expression into a finite automaton, we first need Arden’s theorem.
Statement: Let P and Q be two regular expressions over Σ. If P does not contain Λ, then the equation R = Q + RP has a unique (one and only one) solution, R = QP*.
Proof: We first note the statements of Arden's theorem in general form:
(i) P and Q are two Regular Expressions.
(ii) P does not contain Λ symbol.
(iii) R = Q + RP has a solution, i.e. R = QP*
(iv) This solution is the one and only one solution of the equation.
If R = QP* is a solution of the equation R = Q + RP, then putting this value of R into the equation should give zero:
R = Q + RP
R − Q − RP = 0
LHS = R − Q − RP
(putting R = QP* in the LHS, we get)
= QP* − Q − QP*P
= QP* − Q(Λ + P*P)
= QP* − QP* = 0        (since Λ + P*P = P*)
So from here it is proved that R = QP* is a solution of the equation R = Q + RP.
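The identity can also be spot-checked computationally. The sketch below (a finite check, not a proof) takes P = {"a"} and Q = {"b"}, builds R = QP* restricted to strings of length at most N, and verifies that R = Q + RP holds on that restriction:

```python
N = 6
P, Q = {"a"}, {"b"}

def concat(X, Y):
    # language concatenation, truncated to strings of length <= N
    return {x + y for x in X for y in Y if len(x + y) <= N}

# Build QP* truncated to length N: Q, then QP, QPP, ...
R = set(Q)
frontier = set(Q)
while frontier:
    frontier = concat(frontier, P) - R
    R |= frontier

# R = Q + RP on all strings of length <= N
assert R == Q | concat(R, P)
```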
2.b) What is a regular expression? State the rules which define regular expressions.
Ans.2 b) A regular expression is a notation to specify a regular set. Hence, for every regular expression, there exists a finite automaton that accepts the language specified by the regular expression. Similarly, for every finite automaton M, there exists a regular expression specifying L(M). Regular expressions and the regular sets they specify are shown in the following table.
Hence, we have only three regular-expression operators: | (or +) to denote union, . for concatenation, and * for closure. The precedence of the operators, in decreasing order, is: *, followed by ., followed by |.
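This precedence ordering can be illustrated with Python's re module, which uses the same *, concatenation, | ordering (the example pattern is mine, chosen for illustration):

```python
import re

# a|bc* groups as a | (b(c*)), not as (a|b)c* or (a|bc)*.
pat = re.compile(r"a|bc*")
assert pat.fullmatch("a")          # left alternative
assert pat.fullmatch("bccc")       # '*' binds only to 'c'
assert not pat.fullmatch("ac")     # '|' binds last, so "ac" matches neither side
```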
Unit-II
3. What is compiler? State various phases of a compiler and explain them in detail.
Ans. A compiler translates or compiles a program written in a high-level programming language that is suitable for human programmers into the low-level machine language that is required by computers. So, a compiler is basically a translator whose source language is a high-level programming language (a problem-oriented language) and whose target language is a machine language or assembly language (a machine-oriented language).
Compilation refers to the compiler's process of translating a high-level language program into a low-level language program. This process is very complex; hence, from the logical as well as an implementation point of view, it is customary to partition the compilation process into several phases.
The two parts of compilation are:
- Analysis
- Synthesis
The analysis part breaks up the source program into constituent pieces and creates an intermediate representation of the source program.
The synthesis part constructs the desired target program from the intermediate representation.
A typical compilation, broken down into phases, is shown in Fig.
Fig. Compiler Phases
The first three phases form the bulk of the analysis portion of a compiler. Symbol table management and error handling are shown interacting with the six phases.
The following are the various phases of a compiler:
- Lexical Analyzer: This is the initial part of reading and analyzing the program text: The text is read and divided into tokens, each of which corresponds to a symbol in the programming language, e.g., a variable name, keyword or number.
- Syntax Analyzer: This phase takes the list of tokens produced by the lexical analysis and arranges these in a tree-structure (called the syntax tree) that reflects the structure of the program. This phase is often called parsing.
- Semantic Analyzer: This phase analyses the syntax tree to determine if the program violates certain consistency requirements, e.g., if a variable is used but not declared or if it is used in a context that doesn’t make sense given the type of the variable, such as trying to use a boolean value as a function pointer.
- Intermediate code generation: The program is translated to a simple machine independent intermediate language. After syntax and semantic analysis, some compilers generate an explicit intermediate representation of the source program. This intermediate representation can have a variety of forms.
In three-address code, the source program might look like this:
temp1 := inttoreal(10)
temp2 := id3 * temp1
temp3 := id2 + temp2
id1 := temp3
- Code optimization: An optimizer reviews the code, looking for ways to reduce the number of operations and the memory requirements. The code optimization phase attempts to improve the intermediate code, so that faster-running machine code will result. Some optimizations are trivial. There is great variation in the amount of code optimization different compilers perform. In those that do the most, called ‘optimizing compilers’, a significant fraction of the compiler’s time is spent on this phase.
- Code Generation: The final phase of the compiler is the generation of target code, consisting normally of relocatable machine code or assembly code. Memory locations are selected for each of the variables used by the program. Then, intermediate instructions are each translated into a sequence of machine instructions that perform the same task. A crucial aspect is the assignment of variables to registers. It produces either
- Machine code for a specific machine or
- Assembly code for a specific machine and assembler.
Symbol table management
An essential function of a compiler is to record the identifiers used in the source program and collect information about various attributes of each identifier. A symbol table is a data structure containing a record for each identifier, with fields for the attributes of the identifier. The data structure allows us to find the record for each identifier quickly and to store or retrieve data from that record quickly. When an identifier in the source program is detected by the lexical analyzer, the identifier is entered into the symbol table.
Error Detection and Reporting
Each phase can encounter errors. A compiler that stops when it finds the first error is not as helpful as it could be. The syntax and semantic analysis phases usually handle a large fraction of the errors detectable by the compiler. The lexical phase can detect errors where the characters remaining in the input do not form any token of the language. Errors when the token stream violates the syntax of the language are determined by the syntax analysis phase. During semantic analysis the compiler tries to detect constructs that have the right syntactic structure but no meaning to the operation involved.
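The analysis phases described above can be sketched end-to-end on a single statement. The following is a toy illustration only, not a real compiler; the token pattern, the hand-written tree, and names such as temp1 are invented here for the example:

```python
import re

source = "id1 := id2 + id3 * 10"

# 1. Lexical analysis: divide the text into tokens.
tokens = re.findall(r"[A-Za-z_]\w*|\d+|:=|[+*]", source)

# 2. Syntax analysis: for brevity the syntax tree is written out by hand;
#    a parser would build it from the token list, respecting precedence.
tree = (':=', 'id1', ('+', 'id2', ('*', 'id3', '10')))

# 3. Intermediate code generation: walk the tree, emitting three-address code.
code, tmp = [], 0
def gen(node):
    global tmp
    if isinstance(node, str):          # leaf: identifier or number
        return node
    op, left, right = node
    l, r = gen(left), gen(right)
    if op == ':=':
        code.append(f"{l} := {r}")
        return l
    tmp += 1
    code.append(f"temp{tmp} := {l} {op} {r}")
    return f"temp{tmp}"

gen(tree)
```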
4.a) Construct the predictive parser for the following grammar:
S→ (L) | a L→ L,S | S
Ans. To build a predictive parser, follow the steps below:
- Remove left recursion, if present.
- Remove left factoring, if present.
- Find FIRST and FOLLOW for each nonterminal.
- Construct the predictive parsing table.
Step 1. Remove left recursion which is present in L.
L→ L,S | S
L→SL’
L’→ ,SL’ | ε
Step 2. No left factoring is present.
Step 3. Find FIRST and FOLLOW for each nonterminal. The grammar after the transformations is:
S→ (L) | a
L→SL’
L’→ ,SL’ | ε
First(S) = {( , a} Follow(S) = {$ , )}
First(L) = {( , a } Follow(L) = { ) }
First(L’) = {, , ε } Follow(L’) = { )}
Step 4. Construct the predictive parsing table.
Nonterminal / ( / a / , / ) / $
S / S → (L) / S → a / / /
L / L → SL’ / L → SL’ / / /
L’ / / / L’ → ,SL’ / L’ → ε /
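The table above can drive a standard table-driven LL(1) parser. The sketch below (driver code written here for illustration, not part of the original answer) encodes the table as a dictionary and checks the input (a,a):

```python
# M[nonterminal, lookahead] -> production right side ([] means epsilon)
table = {
    ('S', '('): ['(', 'L', ')'],  ('S', 'a'): ['a'],
    ('L', '('): ['S', "L'"],      ('L', 'a'): ['S', "L'"],
    ("L'", ','): [',', 'S', "L'"], ("L'", ')'): [],
}

def ll1_parse(tokens):
    stack, i = ['$', 'S'], 0
    tokens = tokens + ['$']
    while stack:
        top = stack.pop()
        if top == tokens[i]:                 # terminal (or $) matches input
            i += 1
        elif (top, tokens[i]) in table:      # expand a nonterminal
            stack.extend(reversed(table[(top, tokens[i])]))
        else:                                # empty table entry: syntax error
            return False
    return i == len(tokens)

assert ll1_parse(list("(a,a)"))
assert not ll1_parse(list("(a,)"))
```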
4.b) What is FIRST AND FOLLOW? Explain in detail with an example.
Ans. Computing first and follow
These are the algorithms used to compute the first and follow sets:
FIRST
1. If X is a terminal, then FIRST(X) is {X}.
2. If X → ε is a production, then add ε to FIRST(X).
3. If X is a nonterminal and X → Y1 Y2 … Yk is a production, then place a in FIRST(X) if, for some i, a is in FIRST(Yi) and ε is in all of FIRST(Y1), …, FIRST(Yi−1); if ε is in FIRST(Yj) for all j = 1, …, k, then add ε to FIRST(X).
FOLLOW
1. Place $ in FOLLOW(S), where S is the start symbol and $ is the input right end marker.
2. If there is a production A → αBβ, then everything in FIRST (β) except for ε is placed in FOLLOW (B).
3. If there is a production A → αB, or a production A→ αBβ where FIRST (β) contains ε, then everything in FOLLOW (A) is in FOLLOW (B).
4. ε is never included in the FOLLOW set of any variable.
Example:
E → TA
A → +TA | ε
T → FB
B → *FB | ε
F→ (E) | id
First(E) = {(, id} Follow(E) = {$, )}
First(A) = {+, ε} Follow(A) = {$, )}
First(T) = {(, id} Follow(T) = {+, ), $}
First(B) = {*, ε} Follow(B) = {+, ), $}
First(F) = {(, id} Follow(F) = {+,),*,$}
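The sets listed above can be checked mechanically with a fixed-point computation of the FIRST and FOLLOW rules. The following sketch (written for this note, using the rules as stated) runs on the example grammar:

```python
EPS = 'ε'
grammar = {
    'E': [['T', 'A']],
    'A': [['+', 'T', 'A'], [EPS]],
    'T': [['F', 'B']],
    'B': [['*', 'F', 'B'], [EPS]],
    'F': [['(', 'E', ')'], ['id']],
}
nonterms = set(grammar)
first = {X: set() for X in nonterms}
follow = {X: set() for X in nonterms}
follow['E'].add('$')                      # rule 1: $ goes into FOLLOW(start)

def first_of_seq(seq):
    """FIRST of a sentential form Y1 Y2 ... Yk (FIRST rule 3)."""
    out = set()
    for sym in seq:
        f = first[sym] if sym in nonterms else {sym}
        out |= f - {EPS}
        if EPS not in f:
            return out
    out.add(EPS)                          # every Yi derives ε (or seq is empty)
    return out

changed = True
while changed:
    changed = False
    for X, prods in grammar.items():
        for prod in prods:
            body = [s for s in prod if s != EPS]
            f = first_of_seq(body)
            if not f <= first[X]:         # FIRST rules 2 and 3
                first[X] |= f
                changed = True
            for i, B in enumerate(body):  # FOLLOW rules 2 and 3
                if B in nonterms:
                    rest = first_of_seq(body[i + 1:])
                    add = (rest - {EPS}) | (follow[X] if EPS in rest else set())
                    if not add <= follow[B]:
                        follow[B] |= add
                        changed = True
```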
4.b) Explain the role of the Lexical Analyzer and the issues in designing a Lexical Analyzer.
Ans. In the lexical analysis phase, the compiler scans the characters of the source program, one character at a time. Whenever it gets a sufficient number of characters to constitute a token of the specified language, it outputs that token. In order to perform this task, the lexical analyzer must know the keywords, identifiers, operators, delimiters, and punctuation symbols of the language to be implemented. So, when it scans the source program, it will be able to return a suitable token whenever it encounters a lexeme. A lexeme is the sequence of characters in the source program that is matched by the language's patterns specifying identifiers, operators, keywords, delimiters, punctuation symbols, and so forth. Therefore, the lexical analyzer design must:
1. Specify the token of the language, and
2. Suitably recognize the tokens.
Lexical Analyzer Design
Since the function of the lexical analyzer is to scan the source program and produce a stream of tokens as output, the issues involved in the design of lexical analyzer are:
1. Identifying the tokens of the language for which the lexical analyzer is to be built, and to specify these tokens by using suitable notation, and
2. Constructing a suitable recognizer for these tokens.
The lexical analyzer has an input buffer, a symbol table, and a DFA.
Input Buffer: Initially it is blank. Sometimes the lexical analyzer needs to look ahead some symbols to decide what token to return. For example, in the C language we need to look ahead after -, =, or < to decide what token to return. We need to introduce a two-buffer scheme to handle large lookaheads safely.
Therefore, the first thing that is required is to identify what the keywords are, what the operators are, and what the delimiters are. These are the tokens of the language. After identifying the tokens of the language, we must use suitable notation to specify these tokens. This notation should be compact, precise, and easy to understand. Regular expressions can be used to specify a set of strings, and a set of strings that can be specified by using regular-expression notation is called a "regular set." The token of a programming language constitutes a regular set. Hence, this regular set can be specified by using regular-expression notation. Therefore, we write regular expressions for things like operators, keywords, and identifiers. For example, the regular expressions specifying the subset of tokens of typical programming language are as follows:
operators = +| -| * |/ | mod|div
keywords = if|while|do|then
letter = a|b|c|d|....|z|A|B|C|....|Z
digit = 0|1|2|3|4|5|6|7|8|9
identifier = letter (letter|digit)*
The advantage of using regular-expression notation for specifying tokens is that when regular expressions are used, the recognizer for the tokens ends up being a DFA. Therefore, the next step is the construction of a DFA from the regular expression that specifies the tokens of the language. But the DFA is a flow-chart (graphical) representation of the lexical analyzer. Therefore, after constructing the DFA, the next step is to write a program in suitable programming language that will simulate the DFA. This program acts as a token recognizer or lexical analyzer. Therefore, we find that by using regular expressions for specifying the tokens, designing a lexical analyzer becomes a simple mechanical process that involves transforming regular expressions into finite automata and generating the program for simulating the finite automata.
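The token specifications above can be assembled into a working lexer sketch. The code below is illustrative (the combined-pattern technique and the sample input are mine); note that the word operators mod and div are listed before the identifier pattern so the identifier rule does not swallow them:

```python
import re

TOKEN_SPEC = [
    ('KEYWORD',    r'\b(?:if|while|do|then)\b'),
    ('OPERATOR',   r'\+|-|\*|/|\bmod\b|\bdiv\b'),
    ('IDENTIFIER', r'[A-Za-z][A-Za-z0-9]*'),
    ('NUMBER',     r'[0-9]+'),
    ('SKIP',       r'\s+'),                      # whitespace, discarded
]
# One master pattern; each token class becomes a named group.
master = re.compile('|'.join(f'(?P<{name}>{pat})' for name, pat in TOKEN_SPEC))

def tokenize(text):
    return [(m.lastgroup, m.group()) for m in master.finditer(text)
            if m.lastgroup != 'SKIP']
```

For example, tokenize("if x1 then y + 42") classifies if and then as keywords, x1 and y as identifiers, + as an operator, and 42 as a number.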
Unit—III
5. a) Check whether the following grammar is SLR(1) or not. Explain your answer with reasons.
S → L=R S → R L → *R L→ id R → L
Ans. Step 1.
First(S) = {*, id} Follow(S) = {$ }
First(L) = {*, id} Follow(L) = { =, $}
First(R) = {*, id} Follow(R) = { =, $}
Step 2. Number the productions and augment the grammar:
S' –> S
S –> L = R (1)
S –> R (2)
L –> *R (3)
L –> id (4)
R –> L (5)
Step 3. The canonical collection of LR(0) item sets is:
I0: S' → .S, S → .L=R, S → .R, L → .*R, L → .id, R → .L
I1: S' → S.
I2: S → L.=R, R → L.
I3: S → R.
I4: L → *.R, R → .L, L → .*R, L → .id
I5: L → id.
I6: S → L=.R, R → .L, L → .*R, L → .id
I7: L → *R.
I8: R → L.
I9: S → L=R.
Step 4.
State / = / * / id / $ / S / L / R
0 / / S4 / S5 / / 1 / 2 / 3
1 / / / / acc / / /
2 / S6, r5 / / / r5 / / /
3 / / / / r2 / / /
4 / / S4 / S5 / / / 8 / 7
5 / r4 / / / r4 / / /
6 / / S4 / S5 / / / 8 / 9
7 / r3 / / / r3 / / /
8 / r5 / / / r5 / / /
9 / / / / r1 / / /
In the above table, a shift-reduce conflict (multiple entries) appears in state 2 on the symbol =; hence the grammar is not SLR(1).
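The source of the conflict can be confirmed by computing the FOLLOW sets directly: '=' lands in FOLLOW(R), so in state 2 the table demands both "shift 6" (from the item S → L.=R) and "reduce R → L" on '='. A small fixed-point sketch (written for this note; the grammar has no ε-productions, so the rules simplify):

```python
grammar = [('S', ['L', '=', 'R']), ('S', ['R']),
           ('L', ['*', 'R']), ('L', ['id']), ('R', ['L'])]
nonterms = {'S', 'L', 'R'}
first = {'S': {'*', 'id'}, 'L': {'*', 'id'}, 'R': {'*', 'id'}}   # from Step 1
follow = {'S': {'$'}, 'L': set(), 'R': set()}

changed = True
while changed:
    changed = False
    for X, prod in grammar:
        for i, B in enumerate(prod):
            if B not in nonterms:
                continue
            rest = prod[i + 1:]
            if not rest:                      # B at the end: FOLLOW(X) flows in
                add = follow[X]
            elif rest[0] in nonterms:         # next symbol's FIRST flows in
                add = first[rest[0]]
            else:                             # next symbol is a terminal
                add = {rest[0]}
            if not add <= follow[B]:
                follow[B] |= add
                changed = True
```

Because R → L puts FOLLOW(R) into FOLLOW(L) and S → L=R puts '=' into FOLLOW(L), the fixed point gives '=' in both sets, producing the state-2 conflict.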
5.b) For the operators given below, calculate the operator precedence relations and operator precedence function: Id,+,*,$
Ans. Bottom-up parsers for a large class of context-free grammars can be easily developed using operator grammars. Operator grammars have the property that no production right side is empty or has two adjacent nonterminals. This property enables the implementation of efficient operator-precedence parsers. These parsers rely on the following three precedence relations:
Relation / Meaning
a <· b / a yields precedence to b
a =· b / a has the same precedence as b
a ·> b / a takes precedence over b
These operator precedence relations allow us to delimit the handles in the right-sentential forms: <· marks the left end, =· appears in the interior of the handle, and ·> marks the right end. Let us assume that between the symbols ai and ai+1 there is exactly one precedence relation. Suppose that $ marks each end of the string. Then for all terminals b we can write $ <· b and b ·> $. If we remove all nonterminals and place the correct precedence relation (<·, =·, or ·>) between the remaining terminals, we obtain strings that can be analyzed by an easily developed parser. For example, the following operator precedence relations can be introduced for simple expressions:
 / id / + / * / $
id / / ·> / ·> / ·>
+ / <· / ·> / <· / ·>
* / <· / ·> / ·> / ·>
$ / <· / <· / <· /
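A minimal operator-precedence driver can be sketched from this relation table. The code below is an illustration only (it checks acceptance but does not build a parse tree; the simplification of popping one terminal per reduction works here because every handle in this table contains exactly one terminal):

```python
# prec[(top_terminal, lookahead)] -> '<' (yields), '=' (equal), '>' (takes)
prec = {
    ('id','+'):'>', ('id','*'):'>', ('id','$'):'>',
    ('+','id'):'<', ('+','+'):'>', ('+','*'):'<', ('+','$'):'>',
    ('*','id'):'<', ('*','+'):'>', ('*','*'):'>', ('*','$'):'>',
    ('$','id'):'<', ('$','+'):'<', ('$','*'):'<',
}

def op_precedence_parse(tokens):
    stack = ['$']                      # terminals only; nonterminals are elided
    tokens = tokens + ['$']
    i = 0
    while not (stack == ['$'] and tokens[i] == '$'):
        rel = prec.get((stack[-1], tokens[i]))
        if rel in ('<', '='):          # shift the lookahead
            stack.append(tokens[i])
            i += 1
        elif rel == '>':               # reduce: pop the handle's terminal
            stack.pop()
        else:                          # no relation defined: syntax error
            return False
    return True

assert op_precedence_parse(['id', '+', 'id', '*', 'id'])
```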
6. a) Construct a canonical parsing table for the grammar given below.
S → CC C → cC|d
Ans. Step 1.
First(S) = {c, d} Follow(S) = {$ }
First(C) = {c, d} Follow(C) = {c,d,$}
Step 2. Number the productions and augment the grammar:
S' –>S
S –> CC (1)
C –> cC (2)
C –> d (3)
Step 3. The canonical collection of LR(1) item sets (item, lookahead):
I0: S' → .S, $
S → .CC, $
C → .cC, c/d
C → .d, c/d
I1: S' → S., $
I2: S → C.C, $
C → .cC, $
C → .d, $
I3: C → c.C, c/d
C → .cC, c/d
C → .d, c/d
I4: C → d., c/d
I5: S → CC., $
I6: C → c.C, $
C → .cC, $
C → .d, $
I7: C → d., $
I8: C → cC., c/d
I9: C → cC., $
Step 4.
State / c / d / $ / S / C
0 / S3 / S4 / / 1 / 2
1 / / / acc / /
2 / S6 / S7 / / / 5
3 / S3 / S4 / / / 8
4 / r3 / r3 / / /
5 / / / r1 / /
6 / S6 / S7 / / / 9
7 / / / r3 / /
8 / r2 / r2 / / /
9 / / / r2 / /
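The canonical LR(1) table for this grammar, as computed from the item sets, can be exercised with a small LR driver. The driver below is a sketch written for this note (the table-as-dictionaries encoding is mine); it accepts c d d, which derives S ⇒ CC ⇒ cC C ⇒ ... ⇒ cdd:

```python
# ACTION[(state, terminal)] -> ('s', state) shift, ('r', prod) reduce, ('acc',)
ACTION = {
    (0, 'c'): ('s', 3), (0, 'd'): ('s', 4),
    (1, '$'): ('acc',),
    (2, 'c'): ('s', 6), (2, 'd'): ('s', 7),
    (3, 'c'): ('s', 3), (3, 'd'): ('s', 4),
    (4, 'c'): ('r', 3), (4, 'd'): ('r', 3),
    (5, '$'): ('r', 1),
    (6, 'c'): ('s', 6), (6, 'd'): ('s', 7),
    (7, '$'): ('r', 3),
    (8, 'c'): ('r', 2), (8, 'd'): ('r', 2),
    (9, '$'): ('r', 2),
}
GOTO = {(0, 'S'): 1, (0, 'C'): 2, (2, 'C'): 5, (3, 'C'): 8, (6, 'C'): 9}
PRODS = {1: ('S', 2), 2: ('C', 2), 3: ('C', 1)}   # production -> (lhs, rhs length)

def lr_parse(tokens):
    stack, i = [0], 0                  # stack of states
    tokens = tokens + ['$']
    while True:
        act = ACTION.get((stack[-1], tokens[i]))
        if act is None:
            return False               # blank table entry: reject
        if act[0] == 'acc':
            return True
        if act[0] == 's':              # shift: push the new state
            stack.append(act[1])
            i += 1
        else:                          # reduce: pop |rhs| states, then goto
            lhs, n = PRODS[act[1]]
            del stack[len(stack) - n:]
            stack.append(GOTO[(stack[-1], lhs)])
```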
6.b) What is three-address code? Mention its types. How would you implement three-address statements? Explain with suitable examples.
Ans. Three-address code is a sequence of statements of the form x = y op z. Since a statement involves no more than three references, it is called a "three-address statement," and a sequence of such statements is referred to as three-address code. For example, the three-address code for the expression a + b * c + d is:
T1 = b * c
T2 = a + T1
T3 = T2 + d
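Such a sequence falls out of a post-order walk of the expression tree. A minimal sketch (the emit helper and Tn naming are illustrative) for a + b * c + d, whose tree is ((a + (b * c)) + d):

```python
code, counter = [], 0

def emit(node):
    """Return the name holding node's value, appending statements to code."""
    global counter
    if isinstance(node, str):          # leaf: a variable name
        return node
    op, left, right = node
    l, r = emit(left), emit(right)     # evaluate operands first (post-order)
    counter += 1
    code.append(f"T{counter} = {l} {op} {r}")
    return f"T{counter}"

emit(('+', ('+', 'a', ('*', 'b', 'c')), 'd'))
```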
Representing Three-Address Statements
Records with fields for the operators and operands can be used to represent three-address statements. It is possible to use a record structure with four fields: the first holds the operator, the next two hold the operand1 and operand2, respectively, and the last one holds the result. This representation of a three-address statement is called a "quadruple representation".
Quadruple Representation
Using quadruple representation, the three-address statement x = y op z is represented by placing op in the operator field, y in the operand1 field, z in the operand2 field, and x in the result field. The statement x = op y, where op is a unary operator, is represented by placing op in the operator field, y in the operand1 field, and x in the result field; the operand2 field is not used. A statement like param t1 is represented by placing param in the operator field and t1 in the operand1 field; neither the operand2 nor the result field is used. Unconditional and conditional jump statements are represented by placing the target labels in the result field. For example, a quadruple representation of the three-address code for the statement x = (a + b) * (-c)/d is shown in Table 1.
Table 1: Quadruple Representation of x = (a + b) *(-c)/d
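Since the table itself did not survive in this copy, a sketch of its likely contents (temporary names t1–t4 are illustrative) can be written as a list of four-field records:

```python
# Field order: (operator, operand1, operand2, result); None marks an unused field.
quads = [
    ('+',      'a',  'b',  't1'),
    ('uminus', 'c',  None, 't2'),   # unary minus: operand2 is not used
    ('*',      't1', 't2', 't3'),
    ('/',      't3', 'd',  't4'),
    ('=',      't4', None, 'x'),    # assignment: operand2 is not used
]
```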
Triple Representation
The contents of the operand1, operand2, and result fields in a quadruple are normally pointers to the symbol-table records for the names represented by those fields. Hence, it becomes necessary to enter temporary names into the symbol table as they are created. This can be avoided by using the position of a statement to refer to a temporary value. If this is done, then a record structure with three fields is enough to represent a three-address statement: the first holds the operator, and the next two hold the values for operand1 and operand2, respectively. Such a representation is called a "triple representation". The contents of the operand1 and operand2 fields are either pointers to symbol-table records, or they are pointers to records within the triple representation itself. For example, a triple representation of the three-address code for the statement x = (a + b) * (-c)/d is shown in Table 2.
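As with Table 1, the triple table is missing from this copy; a sketch of its likely contents follows, with a temporary named by the position of the triple that computes it (integer operands are those position references):

```python
# Field order: (operator, operand1, operand2); the result field disappears
# because a triple's value is referred to by its position.
triples = [
    ('+',      'a', 'b'),    # (0)
    ('uminus', 'c', None),   # (1)
    ('*',      0,   1),      # (2)  operands point at triples (0) and (1)
    ('/',      2,   'd'),    # (3)
    ('=',      'x', 3),      # (4)  x receives the value of triple (3)
]
```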