CS626 NLP, Speech and the Web End Sem

Date: 21/11/12 (90 marks) Time: 9.30AM-12.30PM

  1. Answer as asked for (justifications NOT needed):
     (a) “India has the strongest cricket team in the world. No other team comes even close to India”. The summary of this text is produced by a computer as “India has absolute supremacy in cricket”. This is an example of what kind of summary? Extractive or abstractive?

Abstractive (the summary is not composed of sentences from the document)

     (b) A man with a brain injury utters “I…rice…want to eat…it fall…hungry”. Which language-processing area of his brain is malfunctioning?

Broca’s area (do not confuse with Wernicke’s area, which handles meaning or semantics; Broca’s area handles syntax or form).

     (c) In smoothing, the technique of taking away an amount of probability mass from some n-grams and giving that amount to n-grams with zero count is called ______. Fill in the blank.

discounting
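As an illustration, discounting can be sketched on toy bigram counts (a minimal sketch assuming a single fixed discount d and joint rather than conditional probabilities; real smoothers such as Kneser-Ney condition on the history and back off non-uniformly):

```python
# Absolute discounting, toy version: subtract a fixed discount d from every
# seen bigram's count and share the freed probability mass equally among
# the zero-count bigrams over the vocabulary.
def discounted_bigram_probs(counts, vocab, d=0.5):
    total = sum(counts.values())
    seen = set(counts)
    unseen = [(h, w) for h in vocab for w in vocab if (h, w) not in seen]
    probs = {bg: (c - d) / total for bg, c in counts.items()}
    freed = d * len(seen) / total            # mass taken from seen bigrams
    for bg in unseen:                        # given to zero-count bigrams
        probs[bg] = freed / len(unseen)
    return probs

probs = discounted_bigram_probs({("the", "cat"): 3, ("the", "dog"): 1},
                                vocab={"the", "cat", "dog"})
print(round(sum(probs.values()), 6))         # still a distribution → 1.0
```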

     (d) “It was not only I who did not like the movie, but even the very tolerant ones were hard put to find good words for the film”. Identify the words/word-groups that are valence shifters in this sentiment-bearing sentence.

‘not’ (in did not like), ‘hard put to find’. Valence shifters change the polarity of the total text by reversing the effect of sentiment bearing words. Thus ‘not’ nullifies the effect of like, and ‘hard put to find’ nullifies the effect of good.
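A minimal lexicon-based sketch of this effect (the toy lexicon and shifter list are made up; real systems handle the scope of negation much more carefully):

```python
# A valence shifter flips the polarity of the next sentiment-bearing word.
LEXICON = {"like": 1, "good": 1, "bad": -1}
SHIFTERS = {"not", "never", "hardly"}

def sentence_polarity(tokens):
    score, flip = 0, False
    for tok in tokens:
        if tok in SHIFTERS:
            flip = True                       # arm the shifter
        elif tok in LEXICON:
            score += -LEXICON[tok] if flip else LEXICON[tok]
            flip = False                      # shifter consumed
    return score

print(sentence_polarity("i did like the movie".split()))      # → 1
print(sentence_polarity("i did not like the movie".split()))  # → -1
```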

     (e) PhishNet starts the protection operation as soon as the phishing mail enters the inbox. True or False?

False. It starts before the mail enters the inbox.

     (f) Give the RDF graph representation of the statement “Bill Gates is rich” (do not worry about the exact URI and syntax, but give the essential nodes and arcs).

Nouns correspond to URIs, i.e., nodes of the graph. They have to be linked by an appropriate predicate, i.e., a labelled arc: a node for Bill Gates connected to a node for ‘rich’ by an arc labelled something like ‘hasWealthStatus’.
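For instance, in Turtle-like RDF syntax (the URIs and the predicate name below are hypothetical; only the node-arc-node shape matters):

```turtle
@prefix ex: <http://example.org/> .

# subject node --(predicate arc)--> object node
ex:BillGates ex:hasWealthStatus ex:Rich .
```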

     (g) Suppose a text T is S-V1-O and the hypothesis H is S-V2-O, where the subject S and object O are the same in both T and H. What is the relationship between V1 and V2 (think of semantic relations), so that T necessarily entails H?

V1 must entail V2; troponymy is one such relation.

‘Ram devours the cake’ entails ‘Ram eats the cake’ because ‘devour’ entails ‘eat’.

     (h) Give an example of the use of WordNet in a question answering system (e.g., in IBM Watson).

Given the question “What weapon was used to kill Kennedy?”, if the knowledge base contains “Kennedy was killed by a gun shot.”, then using the WordNet hypernymy relation ‘gun is-a weapon’, the question can be answered.
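The lookup can be sketched as below; the tiny hypernym table is a hypothetical stand-in for WordNet's is-a hierarchy (a real system would walk hypernym pointers between synsets):

```python
# Toy is-a (hypernymy) chain: each word maps to its immediate hypernym.
HYPERNYMS = {"rifle": "gun", "gun": "weapon", "weapon": "instrument"}

def is_a(word, target):
    # Climb the hypernym chain until target is reached or the chain ends.
    while word in HYPERNYMS:
        word = HYPERNYMS[word]
        if word == target:
            return True
    return False

# The question asks for a WEAPON; the knowledge base offers "gun".
print(is_a("gun", "weapon"))   # → True, so "gun" can answer the question
```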

     (i) “Question: What is the difference between leaves and cars? Answer: One you brush and rake, the other you rush and brake.” Write another question-answer pair like this (use ANY language you are comfortable with).

Question: What is the difference between God and a pushing leopard?

Answer: One is a loving shepherd, the other is a shoving leopard.

The idea is that a computer program should be able to produce such pairs by interchanging the onsets (the initial consonant sounds) of a pair of words. After interchanging, the words so formed must be checked for validity in the dictionary.
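A rough sketch of the generation step, operating on spelling as a stand-in for sound (the word list is a hypothetical mini-dictionary):

```python
import re

# Hypothetical mini-dictionary for validity checking.
DICTIONARY = {"brush", "rake", "rush", "brake", "loving", "shepherd",
              "shoving", "leopard"}

def onset_split(word):
    # Onset = the leading consonant cluster; the rest is the rhyme.
    m = re.match(r"([^aeiou]*)(.*)", word)
    return m.group(1), m.group(2)

def spoonerize(w1, w2):
    o1, r1 = onset_split(w1)
    o2, r2 = onset_split(w2)
    cand1, cand2 = o2 + r1, o1 + r2          # swap the onsets
    if cand1 in DICTIONARY and cand2 in DICTIONARY:
        return cand1, cand2
    return None

print(spoonerize("brush", "rake"))           # → ('rush', 'brake')
print(spoonerize("loving", "shepherd"))      # → None ("lepherd" is no word)
```

Note that the second call fails orthographically even though “shoving leopard” works aloud: the swap really operates on phonemes, with “leopard” a near-homophone of the swapped form, so a serious implementation would validate against a pronunciation dictionary rather than spellings.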

     (j) Besides ‘pitch’, give two other features that facilitate automatic gender classification by speech, using machine learning techniques.

Short-time Auto correlation (STAC), Short-Time Average Magnitude (STAM)

10 X 2= 20

  2. The MLE expression was derived in the class, where X is the observed data, Z is the unobserved data, N is the number of observations, M is the number of ‘sources’ (e.g., the number of coins), and K is the number of outcomes from each source (e.g., 2 per coin). Now, in crystal clear steps:

     (a) derive expressions for the πjs and pjks, using the standard maximization procedure through Lagrange multipliers. Also give expressions for the E(zij)s. (4+3+3=10)

M-step calculations

Applying constraint (1)

Expression for pjk

Applying constraint (2)

E-step calculations
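The intended equations can be sketched as follows (a standard mixture-model EM derivation; x_ik denotes the count of outcome k in observation i, constraint (1) is Σj πj = 1 and constraint (2) is Σk pjk = 1):

```latex
% E-step: responsibility of source j for observation i
E(z_{ij}) = \frac{\pi_j \prod_{k=1}^{K} p_{jk}^{\,x_{ik}}}
                 {\sum_{j'=1}^{M} \pi_{j'} \prod_{k=1}^{K} p_{j'k}^{\,x_{ik}}}

% M-step: maximise the expected complete-data log-likelihood subject to the
% two constraints via Lagrange multipliers, giving
\pi_j = \frac{1}{N}\sum_{i=1}^{N} E(z_{ij}),
\qquad
p_{jk} = \frac{\sum_{i=1}^{N} E(z_{ij})\, x_{ik}}
              {\sum_{i=1}^{N} E(z_{ij}) \sum_{k'=1}^{K} x_{ik'}}
```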

     (b) Use the expressions obtained in (a) to find the alignment probabilities for a given parallel corpus of entities. That is, given

E1, E2, E3, …, ES in correspondence with F1, F2, F3, …, FT

(Eis and Fjs can, for example, be words in two languages)

the probability of mapping Ei with Fj has to be found. (10)

Consider the S words E1, E2, E3, …, ES as S sources.

Let U be the set of unique words from F1, F2, F3, …, FT.

The problem is now that of S sources, each of which takes one of |U| possible values. Thus the probability values are the pjks, obtained by the same procedure as in (a).
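A compact sketch of this procedure on a hypothetical two-pair corpus (IBM Model 1 style EM; t(f | e) plays the role of the pjks and the soft counts play the role of the E(zij)s):

```python
from collections import defaultdict

corpus = [(["the", "house"], ["la", "maison"]),
          (["the", "book"], ["le", "livre"])]

f_vocab = {f for _, fs in corpus for f in fs}
# Uniform initialisation of t(f | e) over the target vocabulary.
t = {(f, e): 1.0 / len(f_vocab) for es, fs in corpus for e in es for f in fs}

for _ in range(20):                        # EM iterations
    count = defaultdict(float)             # expected alignment counts (E-step)
    total = defaultdict(float)
    for es, fs in corpus:
        for f in fs:
            z = sum(t[(f, e)] for e in es)      # normaliser over this pair
            for e in es:
                c = t[(f, e)] / z               # soft count, like E(z_ij)
                count[(f, e)] += c
                total[e] += c
    for f, e in t:                         # M-step: renormalise, like p_jk
        t[(f, e)] = count[(f, e)] / total[e]

print(round(t[("maison", "house")], 2))    # → 0.5
```

On this tiny corpus the mass for “house” splits evenly between “la” and “maison” (they always co-occur with it), illustrating that the alignment sharpens only as far as the corpus can disambiguate.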

     (c) Use the expressions in (a) to derive the formulae of the Baum-Welch algorithm for training an HMM. (10)

In this case, the output sequence of length N takes the place of X. If there are M arcs in the HMM, they take the place of the M sources. If the output alphabet has K symbols, they take the place of the K outcomes. The pjks then give the required probability values, and the zijs act as indicator variables.
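Concretely, with α/β the forward/backward probabilities for an output sequence O = o1 … oN, the standard Baum-Welch re-estimation can be sketched as follows (outputs taken on arcs, matching the arcs-as-sources analogy above):

```latex
% Expected count of traversing arc (u -> v) at time t: plays the role of E(z_ij)
\xi_t(u,v) = \frac{\alpha_t(u)\, a_{uv}\, b_{uv}(o_{t+1})\, \beta_{t+1}(v)}{P(O)}

% Re-estimated arc probability
\hat{a}_{uv} = \frac{\sum_t \xi_t(u,v)}{\sum_t \sum_{v'} \xi_t(u,v')}

% Re-estimated probability of symbol k on arc (u -> v): plays the role of p_jk
\hat{b}_{uv}(k) = \frac{\sum_{t:\, o_{t+1}=k} \xi_t(u,v)}{\sum_t \xi_t(u,v)}
```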

  3. Do as directed:

     (a) If you close your nose and try to pronounce the following words, how will they sound to the listener? Explain (no marks without the explanation; also no marks for an unclear and long-winded answer):

Sing, Finger, Tinted, Gambit, Laundry, Pamper, Sanjay, Kandahar, Chanchal, Panther (10 X 1= 10)

Sol:-

Since the nasal pathway is blocked, the vocal tract through the mouth will be used. The place of articulation is maintained, with the choice between a voiceless and a voiced stop decided by the consonant following the nasal. A technical answer making use of phonetic principles is expected. The changes in the words’ pronunciation are shown using ARPAbet phonemes.

Sing → [s ih ng] → [s ih k] (velar) (voiceless)

Finger → [f ih ng g er] → [f ih g g er] (velar) (voiced)

Tinted → [t ih n t ih d] → [t ih t t ih d] (alveolar) (voiceless)

Gambit → [g ae m b ih t] → [g ae b b ih t] (bilabial) (voiced)

Laundry → [l ao n d r iy] → [l ao d d r iy] (alveolar) (voiced)

Pamper → [p ae m p er] → [p ae p p er] (bilabial) (voiceless)

Sanjay → [s ae n jh ay] → [s ae jh jh ay] (palatal) (voiced)

Kandahar → [k ae n d ah hh aa r] → [k ae d d ah hh aa r] (dental) (voiced)

Chanchal → [ch a n ch a l] → [ch a ch ch a l] (palatal) (voiceless)

Panther → [p ae n th er] → [p ae th th er] (dental) (voiceless)

     (b) A toothless person tries to pronounce the following words. How will they sound to the listener? Explain (no marks without the explanation; also no marks for an unclear and long-winded answer):

Therapy, Brother, fill, Samovar (4 X 1.5= 6)

Sol:-

If the place of articulation is dental, the tongue touches the teeth. Since the teeth are absent, the tongue tries to stop the airflow by touching the alveolar ridge, or the lips take over the closure. So articulation of the dental type is replaced with alveolar or aspirated-bilabial articulation. Hence “th” becomes “t” or “ph”.

Since “f” and “v” in “fill” and “samovar” are labio-dental fricatives, the lower lip presses against the upper teeth. In the absence of teeth, the closest approximation, viz. aspirated bilabial articulation, is produced. So “f” becomes “ph” and “v” becomes “bh”.

·  Therapy → [th eh r ah p iy] (dental) (voiceless) → [t eh r ah p iy] (alveolar) (voiceless) / [ph eh r ah p iy] (aspirated bilabial) (voiceless)

·  Brother → [b r ah dh er] (dental) (voiced) → [b r ah d er] or [b r ah t er] (alveolar) / [b r ah ph er] (aspirated bilabial) (voiced)

·  Fill → [f ih l] (labio-dental) (voiceless) → [ph ih l] (aspirated bilabial) (voiceless)

·  Samovar → [s ae m ow v aa r] (labio-dental) (voiced) → [s ae m ow bh aa r] (aspirated bilabial) (voiced), i.e., “samobhar”

     (c) “To understand the rise of Statistical NLP, one needs to look at speech processing, and see why speech is so data driven”. Elaborate on this with insightful discussion. Examples are a must. (8)

Speech recognition initially adopted the model-driven approach, whereby rules are created using principles of phonetics and phonology. However, the phonetic and phonological phenomena are too many, giving rise to an impossibly large number of rules. Moreover, it is difficult to have a hundred-percent accurate rule except in highly simplified situations. It is typical to have false positives and false negatives for most rules.

This situation caused the speech community to turn to the data, modelling recognition as

T* = argmax_T P(T | S) = argmax_T P(T) · P(S | T)

where S is the speech signal and T is the text. P(T) is the language model and P(S|T) is the acoustic model. P(T) elegantly captures how probable the output text is, unconditionally. The complexities of phonetics and phonology are all captured at one stroke in P(S|T), which is a representation of complex phonological phenomena without knowing what they actually are, why they occur and how to handle them.

In NLP too there are many situations similar to the above. For example, the task of POS tagging, if attempted through rules, can quickly get out of hand. A rule like “after an adjective comes a noun” can have many exceptions. So NLP too took a data-driven approach, at least at the lower layers. The inspiration for this came from speech, especially through the work of Frederick Jelinek, who applied argmax computation to POS tagging and showed the effectiveness of HMMs for this problem.

Speech has thus been the harbinger of statistical NLP.
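Jelinek's argmax formulation can be illustrated with a toy Viterbi decoder for HMM POS tagging (all probabilities below are hand-made for illustration, not estimated from any corpus):

```python
import math

TAGS = ["DT", "JJ", "NN"]
# Hand-made transition P(tag | previous tag) and emission P(word | tag).
trans = {("<s>", "DT"): 0.8, ("<s>", "JJ"): 0.1, ("<s>", "NN"): 0.1,
         ("DT", "DT"): 0.1, ("DT", "JJ"): 0.4, ("DT", "NN"): 0.5,
         ("JJ", "DT"): 0.1, ("JJ", "JJ"): 0.1, ("JJ", "NN"): 0.8,
         ("NN", "DT"): 0.4, ("NN", "JJ"): 0.4, ("NN", "NN"): 0.2}
emit = {("DT", "the"): 0.9, ("JJ", "old"): 0.5, ("NN", "old"): 0.1,
        ("NN", "man"): 0.6, ("JJ", "man"): 0.01}

def viterbi(words):
    # v[tag] = best log-probability of any tag sequence ending in tag.
    v = {t: math.log(trans[("<s>", t)] * emit.get((t, words[0]), 1e-6))
         for t in TAGS}
    back = []
    for w in words[1:]:
        nv, bp = {}, {}
        for t in TAGS:
            best = max(TAGS, key=lambda p: v[p] + math.log(trans[(p, t)]))
            nv[t] = (v[best] + math.log(trans[(best, t)])
                     + math.log(emit.get((t, w), 1e-6)))
            bp[t] = best
        v, back = nv, back + [bp]
    tag = max(TAGS, key=v.get)               # best final tag
    seq = [tag]
    for bp in reversed(back):                # follow back-pointers
        tag = bp[tag]
        seq.append(tag)
    return seq[::-1]

print(viterbi(["the", "old", "man"]))        # → ['DT', 'JJ', 'NN']
```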

  4. Do as directed:

     (a) Draw the correct parse tree for the sentence: Buffalo buffaloes Buffalo buffaloes buffalo buffalo Buffalo buffaloes. (6)

Soln:-

(S
  (NP (NP (NNP Buffalo) (NNS buffaloes))
      (Sbar (NP (NNP Buffalo) (NNS buffaloes))
            (VB buffalo)))
  (VP (VB buffalo)
      (NP (NNP Buffalo) (NNS buffaloes))))

Reading: “Buffalo bison [whom] Buffalo bison bully, bully Buffalo bison.”

     (b) A probabilistic parser wrongly produces the following parse structure. Try to explain this behaviour of the parser, considering language phenomena and their frequency in the training corpus. Be very thorough and insightful. (10)

(S

(NP (NNP Buffalo) (NNP buffaloes) (NNP Buffalo))

(VP (VBZ buffaloes)

(NP (JJ buffalo) (JJ buffalo) (NNP Buffalo)

(NNS buffaloes)))

(. .))


Soln:-

This output is from the Stanford parser. There are probabilistic parsers which omit the verb completely for this sentence!

The following facts may be kept in mind to analyse the behaviour of the probabilistic parser as depicted in the current output:

a)  “Buffalo” in sentence-initial position is most likely either NNP (proper noun) or NN; we cannot say which evidence is stronger. However, since probabilistic parsers are almost always trained on the Penn Treebank, which is the bracketed Wall Street Journal corpus, i.e., a financial-domain corpus, “Buffalo” is most likely the place Buffalo near Niagara.

That is why the POS tag for every “Buffalo” is correctly NNP.

b)  “buffalo” is unlikely to have any evidence of being used as a verb in the training corpus. This explains why neither of the two “buffalo” tokens has been given a verb tag.

c)  It is difficult to explain why the “buffaloes” in the fourth position gets a VBZ tag; most likely this is caused by the formal identity of English plural nouns with third-person singular present-tense verb forms (both end in -s/-es).

d)  The second NP, with JJ JJ NNP NNS, is like “large black Buffalo buffaloes” and comes from a set of rules of the form

NP → ADJP NP

NP → NP NNS | NNP

ADJP → JJ ADJP | JJ

An NNP NNS combination preceded by a string of adjectives is quite common in texts.

e)  The first NP composed of three NNPs is most likely due to multiword combinations. Financial corpora are likely to have such multi-words, for example “European Economic Council”.

Information needed for Q3: consonant table and vocal tract diagram (figures not reproduced here).

======Paper Ends======