Statistical Studies of Natural Syntactic Development:
An On-going KISS Project
Dr. Ed Vavra, the Developer of KISS Grammar
El Greco’s The Fifth Seal of the Apocalypse or The Vision of Saint John (1608–1614)
(I love El Greco’s elongated figures.)
© 2014
Last updated, October 19, 2015
Introduction
Interested in Helping?
Individual Studies
Students’ Writing
Grade 3
Grade 4
Grade 5
Grade 6
Grade 7
Grade 8
Grade 9
Grade 10
Grade 11
Grade 12
In-class Writing of College Freshmen
Oral Language
Lewis, B. Roland (Benjamin Roland), Contemporary one-act plays.
Professional Writers—Children
Potter, Beatrix
Burgess, Thornton
George Macdonald, At the Back of the North Wind
Professional Writers—Young Adult
Henty, George A.
Alcott, Louisa May
Professional Writers—Adult
The Brontes
The Openings of Six Major Novels
Modern Essays. Selected by Christopher Morley
Previous Studies upon Which This Project Builds
Hunt’s Studies
Grammatical Structures Written at Three Grade Levels. (1965)
“Early Blooming and Late Blooming Syntactic Structures” (1977)
The Horse-Race Studies
Introduction
As explained at the end of this book, this project is heavily indebted to studies done in the 1960s and 1970s, primarily by Kellogg Hunt, Roy O’Donnell, and Walter Loban. These studies convincingly demonstrated that statistical analysis can provide useful information about how sentences grow. Unfortunately, they were rarely read, and their conclusions were misinterpreted as grounds for a failed horse-race competition to make students write longer sentences. As a result, to my knowledge, little has been done to further their work. This project is an attempt to do so.
For information about the statistical analysis and the analytical codes, see “The ‘Style Machine’ and its Codes.” As I explain in that document, I cannot share the “Style Machine,” but I have transferred much of the information that comes from it to Excel worksheets. Click here to get them. The first worksheet is a summary of the data in the others. The tabs at the bottom will take you to the statistics for each of the documents that have been analyzed.
This document primarily includes links to the databooks for individual studies. Perhaps I should note that these samples were originally collected to make KISS instructional exercises for different grade levels based on the writing of students in those grades. In working with these, I realized that the samples reinforce—and go beyond—the studies done in the 1960s and 1970s. The number of texts analyzed here comes close to the number analyzed by any of those researchers, and they had a lot of help. One could spend a lifetime on this statistical work, but my primary focus is the practical application—the KISS Instructional Workbooks.
The section (below) on “Previous Studies” is fairly technical, but it does suggest many of the problems with statistical studies of natural syntactic development—and how KISS addresses them.
Interested in Helping?
Finding, transcribing, and doing the statistical analysis take substantial time. If you would like to volunteer to help, see “Links to Samples of Students’ Writing.”
Individual Studies
One of the major problems with the original studies is that the texts that were analyzed were not made available for verification and further study. In an attempt to avoid this problem, in 1986 I was able to get permission to use samples of the in-class writing of fourth, seventh, eighth, and ninth graders. These are labeled “1986,” for example, “G04 1986.” To find additional data, I have requested permission to use samples from various states’ assessment documents. These are identified by the state abbreviation, for example, “G03 AZ2001.” Samples from professional writing are taken from various public domain texts.
Following the researchers from the 1970s, samples from professional writers generally consist of the first 250 words of the text, extended to the end of the sentence in which the 250th word falls. Samples from students’ writing, however, include the entire piece. I decided on this because there appears to be a correlation between the amount of text written and the complexity of the sentence structures.
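The sampling rule for professional texts can be sketched in Python. This is only an illustration under stated assumptions, not the procedure actually used in this project: the function name `professional_sample`, the `min_words` parameter, and the punctuation-based test for the end of a sentence are all hypothetical, and the naive test will stop too early at abbreviations such as “Mr.”

```python
import re

# Assumption: a sentence ends at a word that finishes with ., !, or ?,
# optionally followed by closing quotation marks or brackets.
SENTENCE_END = re.compile(r"[.!?][\"'”’)\]]*$")


def professional_sample(text: str, min_words: int = 250) -> str:
    """Return the opening of `text`: the first `min_words` words,
    extended to the end of the sentence in which that word falls."""
    words = list(re.finditer(r"\S+", text))
    if len(words) <= min_words:
        # The text is shorter than the target, so keep the whole piece.
        return text.strip()
    # Start checking at the min_words-th word and stop at the first
    # word that appears to end a sentence.
    for match in words[min_words - 1:]:
        if SENTENCE_END.search(match.group()):
            return text[:match.end()].strip()
    # No sentence end found after the target word: keep everything.
    return text.strip()
```

Because the sample always runs on to a sentence boundary, samples will usually be somewhat longer than 250 words, which is why the word counts in the databooks are not uniform.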
Two final notes. First, we need to keep in mind a distinction that goes back to Ferdinand de Saussure, who is often considered the father of modern linguistics. De Saussure distinguished langue (the language system) from parole (actual speech); the terms used here, competence (what a person can do) and performance (what is actually done in specific cases), are Noam Chomsky’s later version of a similar distinction. Statistical studies like ours attempt to determine competence, but do so by measuring specific performances. These are usually single passages by different writers. For example, the fact that a writer (or group of writers) does not use appositives in these samples does not mean that the writer is incapable of using them. To determine the competence of a specific writer, we would need at a minimum dozens of examples of her or his writing.
Second, the results that we currently have do, I would argue, clearly indicate specific trends in natural syntactic development, but graphs by the grade levels have downs as well as ups. The assumption is that as more samples of the writing of students at each grade level are added, the bumps in graphs would slowly decrease.
Students’ Writing
As noted above, the samples of students’ writing come from two sources, each of which has its advantages and disadvantages.
For our purposes, the samples from state standards have two primary advantages. First, they are evaluated (scored) by the state’s Department of Education. They were chosen to illustrate a range from strong to weak student writing, but the collectors had no idea that the samples would be evaluated for sentence structure. Second, they include a very wide range of writing quality—from a state-wide set of examples.
But that creates one of their disadvantages. In essence, they are a flat presentation of a bell curve—the middle is no more represented than are the extremes. Interestingly, however, thus far in the statistical analysis, the results generally fall into their expected place in the overall study. Perhaps the top and the bottom average out to the middle. Their other major disadvantage is that the states appear to choose different things to include—how many good papers, how many poor, and how weak. (Pennsylvania, for example, sometimes includes two or more “non-scorable” examples.)
The primary advantage of the samples (like those from 1986) of students in specific classes is that they are more likely to represent what a teacher in a classroom may be facing. In essence, the highs and lows of the state standards documents will probably be absent. The disadvantage of these samples is that it is difficult to tell how much pre-writing and other help the students received before they wrote the papers.
For each of the databooks linked to below, I have provided a quick table here for some of the basic data from the spreadsheet.
“W/MC” = words per main clause. This is basically Hunt’s “T-Unit.” (For more on this see “Previous Studies” below.)
“TSC/MC” = the total number of subordinate clauses per 100 main clauses. I have included the numbers from the studies of Hunt, O’Donnell, and Loban.
“L1/MC” through “L4/MC” = the number of subordinate clauses per 100 main clauses embedded at different levels. The previous researchers did not give this much attention, but it is definitely relevant to natural syntactic development, especially since the data indicate that many weak writers have a high but unwieldy level of embedding. A subordinate clause embedded in a main clause is Level 1, a clause embedded in that clause is Level 2, etc. The following example was written by an eleventh grader:
And I would also change the justice system in the court room [Adv. L1 because a lot of girls think [DO L2 that they can get away with abuse [Adv. L3 because they think [DO L4 that guys are more powerful]]]] /R/ that is a lie . . . [Note that it is followed by a run-on.]
Professional writers rarely go four levels deep.
“Give/MC” = gerundives per 100 main clauses. A gerundive is a participle that functions as an adjective, for example, “Watching the ballgame, he forgot about the pizza in the oven.” Hunt claims, with some justification, that these are “late-blooming.” See below. Gerundives raise a number of problems that cannot be discussed in this overview.
“App/MC” = appositives per 100 main clauses. Hunt also claims, with some justification, that these are late-blooming. Some people claim to see appositives in the writing of very young students, but, as O’Donnell suggested, these are better seen as “lists”—“There are four people in our house, my father, my mother, my brother, and me.”
“PPA/MC” = post-positioned adjectives per 100 main clauses. Like appositives and gerundives, these probably develop as reductions of main clauses. “They saw nothing [that is wrong with [what he did]],” becomes “They saw nothing wrong with [what he did].”
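To make the counting behind these columns concrete, here is a minimal Python sketch. It assumes, purely for illustration, that every subordinate clause in an analyzed text is enclosed in square brackets, as in the eleventh grader’s sentence above, so that nesting depth gives the level of embedding. It is not a description of how the “Style Machine” works; the function names are hypothetical, and the count of one main clause for the example is an assumption that ignores the run-on that follows it.

```python
from collections import Counter


def clauses_by_level(analyzed: str) -> Counter:
    """Count bracketed subordinate clauses at each level of embedding.

    Assumption: each '[' opens one subordinate clause and each ']'
    closes one, so the current nesting depth is the clause's level.
    """
    depth = 0
    counts = Counter()
    for ch in analyzed:
        if ch == "[":
            depth += 1
            counts[depth] += 1
        elif ch == "]":
            if depth == 0:
                raise ValueError("closing bracket without an opening one")
            depth -= 1
    if depth != 0:
        raise ValueError("unbalanced brackets")
    return counts


def per_100_main_clauses(count: int, main_clauses: int) -> float:
    """Express a raw count as a rate per 100 main clauses."""
    return 100.0 * count / main_clauses


def words_per_main_clause(total_words: int, main_clauses: int) -> float:
    """W/MC: average number of words per main clause."""
    return total_words / main_clauses


# The eleventh grader's sentence, without the editorial note.
EXAMPLE = (
    "And I would also change the justice system in the court room "
    "[Adv. L1 because a lot of girls think [DO L2 that they can get "
    "away with abuse [Adv. L3 because they think [DO L4 that guys are "
    "more powerful]]]]"
)

levels = clauses_by_level(EXAMPLE)      # one clause at each of Levels 1-4
total_subordinate = sum(levels.values())
# Assumed: this sentence counts as a single main clause.
tsc_rate = per_100_main_clauses(total_subordinate, main_clauses=1)
```

For this single sentence the sketch finds one clause at each of the four levels. In a real sample the counts would be summed over all of a writer’s main clauses before the rates are calculated.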
Grade 3
Source / W/MC / TSC/MC / L1/MC / L2/MC / L3/MC / L4/MC / Give/MC / App/MC / PPA/MC
AZ2001 / 6.0 / 13.3 / 13.1 / 0 / 0 / 0 / 0 / 0 / 0
Oregon / 9.8 / 40.5 / 36.8 / 2.5 / 1.2 / 0 / 0.2 / 6.1 / 0.6
Loban / 7.6
O’Donnell / 7.7 / 18
Five samples from the 2001 Student Guide for Arizona’s Instrument to Measure Standards [Get Databook.]
Although this is a very small sample, these papers are interesting (as are the other 2001 samples from Arizona) in that the students who scored highest for “Sentence Fluency” and “Conventions” also scored highest for “Content,” “Organization,” “Voice,” and “Word Choice.” In the samples from some states, this is difficult to determine, but the implication may be that students who cannot control sentence fluency and conventions have more trouble communicating the content, organization, and voice that are in their heads as they write.
Fifteen samples from Oregon. [Get Databook.] For comments on the differences between the Arizona and Oregon samples, see the Databook.
Grade 4
Source / W/MC / TSC/MC / L1/MC / L2/MC / L3/MC / L4/MC / Give/MC / App/MC / PPA/MC
1986 / 7.7 / 22 / 20.2 / 1.8 / 0 / 0 / 0.9 / 5.0 / 0
Loban / 8.0 / 19
Hunt / 8.5 / 29
Ten samples from the 1986 study. [Get Databook.]
Grade 5
Source / W/MC / TSC/MC / L1/MC / L2/MC / L3/MC / L4/MC / Give/MC / App/MC / PPA/MC
AZ2001 / 9.0 / 30 / 23.1 / 6.9 / 0 / 0 / 2.0 / 0.7 / 0
Loban / 8.8 / 21
O’Donnell / 9.3 / 27
Four samples from Arizona. [Get Databook.] Each sample is given in its unedited form, followed by the evaluation from the Arizona DoE. These are followed by the statistical analysis key, then by an edited version, and a typical analysis key for KISS exercises. The latter includes explanations.
Grade 6
Source / W/MC / TSC/MC / L1/MC / L2/MC / L3/MC / L4/MC / Give/MC / App/MC / PPA/MC
PA2000 / 9.8 / 58 / 42.2 / 12.0 / 3.6 / 0 / 1.4 / 1.8 / 0.7
Loban / 9.0 / 29
Twenty-nine samples from the Pennsylvania 2000-2001 Writing Assessment Handbook Supplement. [Get Databook.] These include responses for two prompts, and are scored for “Focus,” “Content,” “Organization,” “Style,” and “Conventions.” The numbers for subordinate clauses in the PA study are high because one weak writer has 4 subordinate clauses for every main clause.
Grade 7
Source / W/MC / TSC/MC / L1/MC / L2/MC / L3/MC / L4/MC / Give/MC / App/MC / PPA/MC
1986 / 9.4 / 42 / 35.0 / 6.1 / 0.4 / 0 / 1.9 / 3.9 / 0.4
Loban / 8.9 / 28
O’Donnell / 10.0 / 30
Thirty-one samples from the 1986 collection. [Get Databook.]
Grade 8
Source / W/MC / TSC/MC / L1/MC / L2/MC / L3/MC / L4/MC / Give/MC / App/MC / PPA/MC
AZ2001 / 11.3 / 54 / 45.0 / 7.8 / 1.1 / 0 / 0 / 3.2 / 0
Loban / 10.4 / 50
Hunt / 11.3 / 42
Four samples from the 2001 Student Guide for Arizona’s Instrument to Measure Standards. [Get Databook.]
Grade 9
Source / W/MC / TSC/MC / L1/MC / L2/MC / L3/MC / L4/MC / Give/MC / App/MC / PPA/MC
PA2001 / 13.2 / 71 / 55.8 / 13.2 / 1.7 / 0 / 2.1 / 2.9 / 0.9
Loban / 10.1 / 47
Forty-one samples from the 2000-2001 Pennsylvania Assessment Guide. [Get Databook.]
Grade 10
Source / W/MC / TSC/MC / L1/MC / L2/MC / L3/MC / L4/MC / Give/MC / App/MC / PPA/MC
MA2010 / 14.9 / 58 / 46.9 / 11.2 / 0.2 / 0 / 6.9 / 13.8 / 0.9
MA2010 (responses to a text) / 19.3 / 109 / 82.8 / 24.7 / 1.7 / 0 / 4.3 / 1.1 / 0.9
Loban / 11.8 / 52
Eleven samples from the Massachusetts 2010 writing samples, plus twenty-four analyzed samples that involve responding to a text. Because the latter samples include quoting and paraphrasing from a text, they are a separate sub-study. The row directly below MA2010 in the table (highlighted in yellow in the original) gives their results, and the difference in the numbers shows why they are not counted in the general study. [Get Databook.] Because each of these samples is a separate web page, I collected them in a databook. [Click here.]
Grade 11
Source / W/MC / TSC/MC / L1/MC / L2/MC / L3/MC / L4/MC / Give/MC / App/MC / PPA/MC
PA2001 / 14.4 / 79 / 62.7 / 13.3 / 2.5 / 0.4 / 2.3 / 1.7 / 0.7
Loban / 10.7 / 45
Thirty-eight samples from the 2000-2001 Pennsylvania Assessment Guide. [Get Databook.]
Grade 12
Source / W/MC / TSC/MC / L1/MC / L2/MC / L3/MC / L4/MC / Give/MC / App/MC / PPA/MC
Loban / 13.3 / 60
Hunt / 14.4 / 68
Currently empty
