Practicum Taal- en Spraaktechnologie 2003-2004, Lab: StOT with GLA in Praat

Computer lab: Learning Stochastic OT Grammars with GLA and Praat

Goals of Lab:

- to learn how to train a Stochastic OT grammar in Praat

- to learn how different parameters affect the learning stages and the final result

- to learn how to compare grammar outputs to expected outputs and check for significant differences

Praat

Praat is a free program that can be downloaded from . There is a tutorial for learning OT grammars in Praat which shows all the different features; it can be found in the Praat manual or at .

Start the program. You should immediately see two different windows: an object window and a picture window. The object window has another window inside it and a number of gray buttons below this internal window. This internal window shows a list of all the files and other types of objects that have either been loaded into the program (with the Read command) or been created by the program when it is run on some data. The picture window is where tableaux and other data will be presented.

NoCoda grammar

We’ll begin with a very simple grammar. In the object window select

New, Optimality Theory, Create NoCoda Grammar

In the list of Praat Objects, OTGrammar NoCoda should appear. This means that this grammar is loaded into the program. Also, a number of new buttons will appear on the right-hand side of the internal window, and the text in the buttons below the internal window will have changed from grey to black, indicating that they are now active.

To see what the grammar looks like choose Edit. A new window will pop up. This window first lists the two constraints in this grammar, NoCoda and Parse, followed by a column for the ranking value and one for the disharmony, which is the selection point for evaluation. The constraints mean roughly the following:

NoCoda: Incur one violation mark for each syllable that ends with a coda (i.e. a consonant). Thus, if V = vowel and C = consonant, the syllables CV and V do not violate this constraint, but the syllables VC and CVC do. This is a markedness constraint: it prefers less marked forms over more marked forms, and its effect can be evaluated without reference to the input. A syllable with a coda is considered a more marked form for several reasons, e.g. it is articulatorily more difficult to pronounce, and languages that allow codas are less frequent among the languages of the world than languages that disallow codas. Note that to evaluate a markedness constraint like NoCoda it is not necessary to know what the input is.

Question 1: What is the syllabic structure of [pa] and [pat]?

Parse: Incur one violation mark for each phoneme in the input that doesn't get realized in the output. This is a faithfulness constraint. It relates the input form to the output form, so in order to evaluate it you need both the input form and the output form.
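To make these two definitions concrete, here is a minimal Python sketch (not Praat code; it assumes the candidate's syllables are separated by dots and uses a toy vowel inventory) that reproduces the violation counts in the /pat/ tableau shown further below:

def nocoda_violations(candidate):
    """One mark per syllable (syllables separated by '.') that ends in a consonant."""
    vowels = set("aeiou")
    return sum(1 for syll in candidate.split(".") if syll and syll[-1] not in vowels)

def parse_violations(underlying, candidate):
    """One mark per input phoneme that is not realized in the output.
    Simplification: compares segment counts only, ignoring order."""
    surface = candidate.replace(".", "")
    return max(0, len(underlying) - len(surface))

for cand in ["pa", "pat"]:
    print(cand, "NoCoda:", nocoda_violations(cand),
          "Parse:", parse_violations("pat", cand))
# pa  NoCoda: 0  Parse: 1
# pat NoCoda: 1  Parse: 0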

Question 2: According to the current ranking values, which constraint is the highest ranked?

Below the two constraints you will find two tableaux for two input forms, /pat/ and /pa/, and their evaluation with respect to a number of output forms and the two constraints.

Go to the Edit menu and choose Evaluate (zero noise). What happens? Now try evaluating with noise 2.0. The ranking values remain the same, but the disharmony values change. Does this affect the candidate form that is chosen as the most optimal? What does the noise value of 2.0 stand for?

Question 3: If you choose Evaluate… a window will pop up and ask you for the amount of noise with which you want the evaluation to be done. Evaluate with a noise of 10 several times and examine the results. What happens? Why does this happen? Is a noise value of 10 at all plausible? How would it affect the output forms of speakers?
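The evaluation procedure itself can be sketched in a few lines of Python (an illustration of the idea, not Praat's implementation): each constraint's disharmony is its ranking value plus Gaussian noise with the standard deviation you enter, the constraints are ordered by disharmony, and the candidate whose violation profile is best under that ordering is the output.

import random

def evaluate(ranking, tableau, noise=2.0):
    """One stochastic evaluation.
    ranking: dict constraint name -> ranking value
    tableau: dict candidate -> dict constraint name -> number of violations"""
    # Disharmony = ranking value + Gaussian noise (the selection point).
    disharmony = {c: r + random.gauss(0.0, noise) for c, r in ranking.items()}
    order = sorted(disharmony, key=disharmony.get, reverse=True)
    # The winner has the lexicographically smallest violation profile
    # when the constraints are read off in disharmony order.
    return min(tableau, key=lambda cand: [tableau[cand].get(c, 0) for c in order])

ranking = {"NoCoda": 100.0, "Parse": 90.0}
tableau_pat = {"pa": {"Parse": 1}, "pat": {"NoCoda": 1}}

# With noise 2.0 the 10-unit gap almost always keeps NoCoda on top, so /pat/
# surfaces as [pa]; with noise 10.0 the rankings overlap and [pat] sometimes wins.
print([evaluate(ranking, tableau_pat, noise=10.0) for _ in range(10)])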

Let’s look at the OT grammar itself. Choose Write from the Praat Objects window and then choose Write to file (make sure your OTGrammar is selected, otherwise this won’t work). The program will create a file whose name ends in .OTGrammar. Open this file and examine it (in e.g. Notepad, Word or Emacs). Below is a printout of the exact same grammar. The file starts with a header stating what type of file it is, followed by the number of constraints. This is followed by a list of the constraints, each with its name, its ranking value, and the last disharmony value that was used. At the end of the file are the tableaux with the input and output data.

Question 4: How are violations and non-violations of OT constraints expressed in the grammar file?

File type = "ooTextFile"

Object class = "OTGrammar"

2 constraints

constraint [1]: "N\s{O}C\s{ODA}" 100 100.8595808321457 ! NOCODA

constraint [2]: "P\s{ARSE}" 90 90.13208528615732 ! PARSE

0 fixed rankings

2 tableaus

input [1]: "pat" 2

candidate [1]: "pa" 0 1

candidate [2]: "pat" 1 0

input [2]: "pa" 1

candidate [1]: "pa" 0 0

The NoCoda grammar won’t show variation when evaluation is done with a noise value of 2.0. This is because the distance between the two constraints is 10 units, which is considered to be nearly categorical. But Stochastic OT can also model variation, and Praat includes an implementation of the Gradual Learning Algorithm (GLA) and can thus be used to learn a grammar with variation. We are going to look at the Ilokano data that was discussed in the lecture and is described in Boersma and Hayes (2001). In order to replicate the learning that they did we need:

1. A grammar for the data with constraints, and information about how relevant input-output pairs are evaluated with respect to the constraints.

2. Information about how frequent different outputs are given a certain input in the language.

You can download both of these files at the following site:

After downloading you have to save each of the files as a text file in order to be able to load them into Praat. The grammar file is reproduced below; examine it and answer the questions that follow.

"ooTextFile"
"OTGrammar" !Ilokano

18
"O\s{NSET}" 100 0 ! Onset
"*C\s{OMPLEX}O\s{NSET}" 100 0 ! *ComplexOnset
"*__\si_[\?gC" 100 0 ! *syll[?C
"*C\s{ODA}" 100 0 ! *Coda
"*\?g]__\si_" 100 0 ! *?Coda
"M\s{AX}(\?g)" 100 0 ! Max (?)
"M\s{AX}(V)" 100 0 ! Max (V)
"L\s{INEARITY}" 100 0 ! Linearity
"I\s{DENT}-IO(syllabic)" 100 0 ! Ident-IO (syllabic)
"M\s{AX}-OO(\?g)" 100 0 ! Max-OO (?)
"D\s{EP}(\?g)" 100 0 ! Dep (?)
"I\s{DENT}-BR(syllabic)" 100 0 ! Ident-BR (syllabic)
"M\s{AX}-BR" 100 0 ! Max-BR
"*L\s{OW} G\s{LIDE}" 100 0 ! *LowGlide
"A\s{LIGN}" 100 0 ! Align (stem, L, syll, L)
"C\s{ONTIG}" 100 0 ! Contiguity
"I\s{DENT}-IO (low)" 100 0 ! Ident-IO (low)
"I\s{DENT}-BR (long)" 100 0 ! Ident-BR (long)

0 fixed rankings

7 inputs ! ON O2 ?C CD ?] M? MV LI IS OO D? IB MB LG AL GU IW IG
"ta?o-en" 7
"taw.?en" 0 0 0 2 0 0 0 1 1 0 0 0 0 0 0 0 0 0
"ta?.wen" 0 0 0 2 1 0 0 0 1 0 0 0 0 0 0 0 0 0
"ta.wen" 0 0 0 1 0 1 0 0 1 1 0 0 0 0 0 0 0 0
"ta.?en" 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0
"ta.?o.en" 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
"ta.?o.?en" 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0
"ta.?wen" 0 1 1 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0
"pa?lak" 3
"pa.lak" 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0
"pa?.lak" 0 0 0 2 1 0 0 0 0 0 0 0 0 0 0 0 0 0
"pa.?lak" 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
"lab?aj" 2
"lab.?aj" 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0
"la.baj" 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0
"trabaho" 2
"tra.ba.ho" 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
"tar.ba.ho" 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0
"?ajo-en" 5
"?aj.wen" 0 0 0 2 0 0 0 0 1 0 0 0 0 0 0 0 0 0
"?a.jen" 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0
"?a.jo.en" 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
"?a.jo.?en" 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0
"?a.jwen" 0 1 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0
"HRED-bwaja" 5
"bu:.bwa.ja" 0 1 0 0 0 0 0 0 0 0 0 1 3 0 0 0 0 1
"bwaj.bwa.ja" 0 2 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0
"bub.wa.ja" 0 0 0 1 0 0 0 0 0 0 0 1 3 0 1 0 0 0
"bwa:.bwa.ja" 0 2 0 0 0 0 0 0 0 0 0 0 20 0 0 0 1
"ba:.bwa.ja" 0 1 0 0 0 0 0 0 0 0 0 0 3 0 0 1 0 1
"basa-en" 5
"ba.sa.?en" 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0
"ba.sen" 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0
"ba.sa.en" 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
"bas.Aen" 0 0 0 2 0 0 0 0 1 0 0 0 0 1 0 0 0 0
"bas.wen" 0 0 0 2 0 0 0 0 1 0 0 0 0 0 0 0 1 0

Learning a grammar with variation

Read the grammar and the pair distributions into the program; they should both show up in the Praat Objects window. Select both of them and you will get a menu with just two buttons, Learn and Get Fraction Correct… A new window will pop up asking for the parameters for learning. Evaluation noise is the standard deviation of the normal distribution from which the selection value (here: the disharmony) is drawn. The reranking strategy is a menu that offers a number of possible learning strategies; Symmetric all is the default and also the standard GLA. The number of simulations is 100000 by default; leave this for now. Choose OK and wait. It may take a while (you are doing 100000 input-output analyses, with reranking as needed). When it is done the learning window will disappear. You can then select the grammar in the Praat Objects window again and choose Edit. There you’ll see a new version of the grammar with different ranking values. Unfortunately, Praat shows you the tableaux for the previous ranking, not for the one you just learned; to remedy this, evaluate once with zero noise in order to rearrange the tableaux.
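The Symmetric all strategy itself can be sketched by extending the earlier Python illustration (again only a sketch of the idea, reusing evaluate() from the sketch above, not Praat's implementation): whenever the learner's own winner differs from the observed learning datum, every constraint that the learner's form violates more than the datum is promoted by the plasticity, and every constraint that the datum violates more is demoted by it.

def gla_update(ranking, tableau, datum, plasticity=2.0, noise=2.0):
    """One GLA learning step with the 'symmetric all' strategy (sketch only).
    datum: the adult surface form observed for this tableau's input."""
    learner_form = evaluate(ranking, tableau, noise)  # evaluate() as sketched earlier
    if learner_form == datum:
        return                                        # no error, no reranking
    learner_viol, datum_viol = tableau[learner_form], tableau[datum]
    for c in ranking:
        l, d = learner_viol.get(c, 0), datum_viol.get(c, 0)
        if l > d:
            ranking[c] += plasticity   # should have ruled out the learner's error
        elif d > l:
            ranking[c] -= plasticity   # penalizes the observed form, so demote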

Question 6: How similar are the ranking values to Boersma & Hayes's (2001) results, i.e. the ones given in table (21) on page 18 of the paper? Will the grammar you have learned differ in the forms it generates from the one that Boersma & Hayes learned? How did the training schedule used by Boersma & Hayes (see page 35 of the paper for their training schedules) differ from the method you just used?

Now we are going to see how well the learned grammar generates forms. Does the distribution of the outputs produced match the distribution in the pair distributions? First you need to generate forms for each input form.

Select the grammar, then choose To output distributions and give 10000 as the value. You’ll see a new distribution object appear in the object window. Select it and then choose Draw as numbers. In the Praat picture window you will see, for each input form, the input-output pairs and the number of times each output occurred over the 10000 runs. From these figures you can calculate whether there is a statistically significant difference between these predictions and the pair-distribution frequency data that was fed into the algorithm. You can check this with a χ² test.[1]
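What To output distributions does can be mimicked with the evaluation sketch from earlier: run the noisy evaluation many times per input and tally how often each candidate wins. An illustration only, reusing evaluate(), ranking and tableau_pat from the sketch above:

from collections import Counter

def output_distribution(ranking, tableau, trials=10000, noise=2.0):
    """Tally which candidate wins over many noisy evaluations of one tableau."""
    return Counter(evaluate(ranking, tableau, noise) for _ in range(trials))

print(output_distribution(ranking, tableau_pat, trials=10000, noise=2.0))
# e.g. Counter({'pa': 9997, 'pat': 3}) -- nearly categorical at a 10-unit distance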

The χ² test

This test is one of a number of sampling distribution models that are often used in linguistics. It compares observed counts with expected counts, to see if differences are merely a matter of chance, or if there really is a difference between the two samples. Thus, with the test we look at the difference between the observed values and the expected values, i.e. Observed Value − Expected Value. Because subtracting like this may result in a negative value, we work with the squared differences, i.e. (Observed Value − Expected Value)². Because we don’t want the differences found to be affected by the absolute number of counts, we also normalize the values by dividing by the Expected Value. We then sum these terms over all cells. This gives the following equation:

χ² = Σ (Observed − Expected)² / Expected

The observed values will be the counts generated by the program. The expected values will be the values in the grammar we are attempting to approximate, i.e. the counts that would be generated by the pair-distribution ratios.
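The formula above translates directly into a few lines of Python if you prefer to compute the statistic yourself (a sketch; observed and expected are matching lists of counts for one input form):

def chi_square(observed, expected):
    """Pearson's chi-square statistic over two matching lists of counts."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# observed: the counts your grammar generated for one input (from Draw as numbers)
# expected: 10000 times the pair-distribution proportion of each output candidate
# With k output candidates the statistic has k - 1 degrees of freedom; if SciPy is
# available, scipy.stats.chisquare(observed, f_exp=expected) also gives the p-value.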

Limitations of the test

You can only calculate χ² if you have actual counts, i.e. you cannot use it on percentages. Additionally, the test requires that every cell has 5 or more occurrences; a cell with no occurrences makes the test unusable. This means that we can only use the test to compare observed values with expected values for the input-output forms that show variation (there are two in our grammar), because for the other forms some expected outputs have zero values. Depending on how learning proceeded, you may also have zero values, or values below 5, for the two input-output forms that show variation. If you use the actual counts, then you won’t be able to use the test. To work around this you can move counts into the output forms with fewer than 5 occurrences until every cell has at least 5 (don’t forget to lower the number in the “giving” cell as well as raising it in the receiving cell, so that the total counts don’t change!). This isn’t actually correct, but there’s no way to use the test otherwise. Note that some of the problems we have with employing the test have to do with the artificial nature of the observed frequencies we are using, which are an idealization. We might expect a large corpus study to give us different, non-zero values, which would mean we could compare more input-output pairs. Additionally, it seems to be a matter of chance whether learning produces zero cells or not, so if we did more learning and then tested, the results might not be as dramatic, but we would not have this problem with the assumptions of the test not being met.

Implementing the test

You can calculate 2 by hand but its much faster (and easier!) to use an online calculator. Go to As your observed value you give the different outputs for a given input, and as the expected value you give what values youwould expect if the pair-distribution file ratios were followed exactly for generating 10000 forms. If p < 0.5 the difference is significant. Remember: we are looking for a non-significant difference.

Question 7: For which input-output pairs is the difference significant? What problems do you encounter? Why? How do you solve them?

Let’s try learning some more, that is, expose the grammar to more data. Do the same thing as above, i.e. learn from 100000 examples and then once again compare the output distribution produced by your grammar with the pair distributions given to the grammar.

Question 8: Is there a statistically significant difference between the observed output and the expected output?

Varying the parameters for learning

Now explore what effect varying the parameters of learning has on how learning progresses. You’ll need to start each new comparison by first reloading the OTGrammar from the file, because the loaded grammar will have been altered by the program during learning, and in order to compare learning schedules we must always start from the same initial ranking.
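In Praat you do this by running the Learn command once per phase with the plasticity and noise of that phase, reloading the grammar before each new schedule. Conceptually the whole procedure is just a loop over (number of data, plasticity, noise) phases; a rough Python sketch, reusing the gla_update() illustration above (the pair_distribution argument stands for whatever relative frequencies your pair-distribution file contains; none of this is Praat's actual code):

import random

# A hypothetical three-phase schedule: (number of data, plasticity, noise).
schedule = [(7000, 2.0, 10.0),
            (7000, 0.2, 2.0),
            (7000, 0.02, 2.0)]

def train(ranking, tableaux, pair_distribution, schedule):
    """tableaux: input form -> its tableau (as in the earlier sketches)
       pair_distribution: list of (input form, surface form, relative frequency)"""
    pairs = [(inp, out) for inp, out, _ in pair_distribution]
    weights = [freq for _, _, freq in pair_distribution]
    for n, plasticity, noise in schedule:
        for _ in range(n):
            inp, out = random.choices(pairs, weights=weights)[0]  # sample one learning datum
            gla_update(ranking, tableaux[inp], out, plasticity, noise)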

Plasticity

On page 35 of the article Boersma and Hayes give the training schedule they used with Ilokano, repeated below:

Data / Plasticity / Noise
First 7000 / 2 / 10
Second 7000 / 0.2 / 2
Last 7000 / 0.02 / 2

Instead, follow the training schedule below, where plasticity is held constant at 2. Look at the grammar after each run.

Data / Plasticity / Noise
First 7000 / 2 / 10
Second 7000 / 2 / 2
Last 7000 / 2 / 2

Question 9: Is the result significantly different from the expected values?

Now try the following learning scheme:[2]

Data / Plasticity / Noise
First 7000 / 7 / 10
Second 7000 / 7 / 2
Last 7000 / 7 / 2

Question 10: Is there a difference in how the grammar is being learned? What is the explanation for this difference, if you observe one? Are the values produced significantly different from the pair-distribution data? What conclusion can you draw about the value of plasticity for learning?

Noise

What effect does the noise value have on learning? Boersma and Hayes begin with a high noise value (10) and then lower it to 2. Relearn with their original plasticity schedule, but keep the noise value fixed at 10, i.e. follow this schedule:

Data / Plasticity / Noise
First 7000 / 2 / 10
Second 7000 / 0.2 / 10
Last 7000 / 0.02 / 10

Question 11: How does the grammar look? Are the forms it generates significantly different from the forms in the pair distributions? Give an explanation for the results you find. If you continue learning with the same noise level for another 21,000 inputs (i.e. doubling the amount of data), what is the result then? Why?

[1] The χ is pronounced like the “ky” in “sky”, so χ² is the “ky-square” test.

[2] Don’t forget to reload the grammar!