Phrase-Structure Grammars and Languages
Formal Grammars and Languages
N.Chomsky, 1950’s
Theory of Generative Grammars (Phrase-structure Grammars)
The origin was for the mathematical (qualitative) study of linguistics.
It turned out in 1960’s that the theory was useful for the study of programming languages, too. Now it is one of basic theories for theoretical computer science.
Why Phrase-structure Grammar?
There are dependencies among the phrases constituting sentences.
S
NP Aux VP
Det N V NP
the boy love PN
Mary
Generation of sentences according to phrase-structure rules
S NP Aux VP (SNP Aux VP)
Det N Aux VP (NPDet N)
the NP Aux VP (Detthe)
the boy Aux VP (NPboy)
the boy VP (Aux)
the boy V NP (VPV NP)
the boy love NP (Vlove)
the boy love PN (NPPN)
the boy love Mary (PNMary)
generation by rewriting phrase-structure rules
’Grammar’ can be considered to be a set of phrase-structure rules:
Relations (dependencies) among grammatical categories:
SNP Aux VP, NPDet N | Det AP N | PN,
VPV NP | V | V PP APAdj AP | Adj,
PPPrep NP, ……
Vocabulary:
Nboy | girl | dog | …, Vlove | read | study | …,
PN Mary | Jack | …, Auxbe | will | may | …,
Detthe | a | an | …, Adjpretty | small | red | …,
Prepon | in | for | …, ……
Phrase-Structure Grammar
A rewriting system, a modification of the Thue system (1906)
G=(N,T,P,S), where
1) N is a finite alphabet (the set of nonterminals),
2) T is a finite alphabet (the set of terminals),
3) P is a finite subset of V*NV*V*, where V=NT. Each element, say (u,v), of P is called a production or rewriting rule, and written uv, and
4) S is a specified symbol of N (the sentence symbol or start symbol).
Rewriting in G (a binary relation, , on V*)
G ’ or simply ’ ( directly derives ’) if =u and ’=v for some uv in P.
is said to derive if *, where * is the reflexive and transitive closure of .
A derivation of from is *, i.e., either = or =01n= for some 1,…,n.
Recall that + and n are the transitive closure and the n-th power of , respectively.
The language G generates
L(G)={xT* | S *x}
G and G’ are said to be equivalent if L(G)=L(G’).
X1|…|n is an abbreviation of X1, …, Xn.
Four Types of Phrase-Structure Grammars
(according to the forms of productions)
G=(N,T,P,S), V=NT
type 0 (unrestricted): No restriction
type 1 (context-sensitive, CS): Each production in P is of the form X, where ,V*, XN, and V+
type 2 (context-free, CF): Each production in P is of the form X, where XN and V*
type 3 (right-linear or regular): Each production in P is of the form XaY or X, where X,YN and aT
Let ℒi denote the class of type i languages.
Examples
- Let G1=({A,B},{0,1},P1,A), where P1={A0|1B, B0B|1B|0}.
G1 generates {0}{1}{0,1}*{0} or 0+1(0+1)*0, the set of binary representations of even nonnegative integers. G1 is of type 3 or right-linear. Thus L(G1) is in ℒ3. An example of derivation in G1:
A1B10B101B1010, and therefore we have A*1010.
- G2=({S},{a,b},{SaSb|},S) generates a nonregular language AnBn. Since G2 is a type 2 (or context-free) grammar (CFG), AnBn is a type 2 (or context-free) language (CFL). As you have learned already, this is a nonregular language. In fact, AnBn is in ℒ2-ℒ3.
SaSbaaSbb…anSbnanbn for any n1. For n=0, we have
S=a0b0. Thus S*anbn for any n0.
- Let N={S,A,B,C}, T={a,b,c}, and P3={S, SA, AaABC, AaBC, CBBC, aBab, bBbb, bCbc, cCcc}. G3=(N,T,P3,S) is a type 0 grammar generating a non-context-free language (which will be proved later) AnBnCn={anbncn | n0}.
An example derivation of a2b2c2 from the start symbol:
SAaABCaaBCBCaabCBCaabBCCaabbCCaabbcC
aabbcc.
It is known that AnBnCn is in ℒ1-ℒ2.
Type 3 Languages = Regular Sets
THEOREM The class of regular sets coincides with the class of type 3 languages, that is, ℒ3=ℛ.
Proof. ℒ3ℛ: For a given type 3 grammar G=(N,T,P,S), let M=(N{f},T,,S,{f}) be the -NFA defined by
(X,a) contains Y iff XaY is in P, and
(X,) contains f iff X is in P.
Then L(M)=L(G).
ℒ3ℛ: For a DFA M=(Q,,,q0,F), define the type 3 grammar G=(Q,,P,q0) as follows:
qap if (q,a)=p, and
q for each q in F.
(A similar construction is possible for NFAs and -NFAs.)
Then L(G)=L(M).
Examples
1. Converting DFAs, NFAs, or -NFAs to type 3 grammars:
2. Converting type 3 grammars to -NFAs:
Exercises
- What is L(G) for the following G’s ?:
- Give type i grammars generating the following languages:
3. Show type 3 grammars and finite automata for the following regular sets:
1
#006