CPSC 503

Assignment #4

Assigned: Mar14

Due Date: Mar22 (11pm)

The exercises included in this assignment total to 100 points.

Please:

- send me an email with your answers in one file in txt format

(by Mar 22 (11pm))

- leave a copy on your answers in my mailbox (by noonMar 23)

- work on this alone

- no plagiarism!

- do not delay working on the assignment until the last minute.

- if you have any clarification question, post it on the course mailing

list

Note: All files mentioned are available at the course web page

(1)CFG (10pts):

Consider the following CFG G with start symbol S that can generate the question: what does

Calvin like and also the statement: Calvin likes ice-cream (among the other sentences it can generate)

S -> WHNP does S

S -> NP VP

VP -> V

VP -> V NP

WHNP -> what

NP -> Calvin | ice-cream

V -> like | likes

Replace the rule VP -> V in the CFG G with the two new rules shown below.

VP/NP ->V NP/NP

NP/NP ->ε

These rules introduce two new non-terminals VP/NP and NP/NP. Modify the rest of the CFG G so that it can continue to generate the two sentences:

what does Calvin like and Calvin likes ice-cream.

Provide the new CFG.

(2)Mystery code: I have found twoPerl files, one containing some complex code and one containing some sort of simple toy grammar: mystery-code.pl and grammar.pl. I have also discovered that when I give the grammar as an argument to mystery I get some sort of output….

a.(50pts)

For this question your task is to understand the code in themystery-code.pl file (what it does and why). Once you have identified what this code actually implements, please, add to the files as many comments as you feel are necessary so that someone else with little Perl experience can look at it and quickly understand it. As you know, comments should focus on important/cryptic aspects of the code (e.g., what the key data structuresare, what they are used for, what a key loop/recursion does).

Furthermore, add to the code print statements so that someone running the code can get a pretty good idea of what is going on (e.g., whenever a key data structure is changed print info about what part of the structure is changed and what value is assigned)

Submit filemystery-code.plwith your comments and print statements.

a.(40pts)

Now let’s focus on the grammar.plfile. As you can see, all the probabilities were assigned assuming a uniform distribution on the rules for each non-terminal. Well, it is your lucky day and you find on the Web a silly corpus that will allow you to get some “better” estimates for your probabilistic CF grammar (see file silly-corpus).

-Compute estimates for the grammar probabilities based on silly-corpus. The corpus is so small that you can do it by hand, but if you are looking for extra-points you can implement a Perl script that would do it for you on a less-silly-corpus.

-Now given the new grammar compute the probability of the most likely parse for each of the following two sentences: John plays soccer at school and John plays

-Finally compute the same two probabilities with the old grammar. Are the results from the old and the new grammar different? Please, explain why this is or is not the case for each of the two sentences.