General Information About List

ReadMeforGSL.doc

Notes relating to organisation of West: A General Service List of English Words (GSL) (compiled and edited by Michael West, Longman 1953)


Column A: Line numbering. Consecutively numbered lines according to the order in the GSL.

Column B: GSL head. Head as given in the GSL, i.e. the headword category under which a particular Alphabetical head (Column D) appears. This may be different from the Alphabetical head (Column D): e.g. the word (i.e. alphabetical head) ‘storeroom’ appears under the GSL head ‘store’.

Column C: McArthur category. Semantic-field category or categories of the word, according to the categories given in the Longman Lexicon of Contemporary English by Tom McArthur (Longman 1981), except [X] = not classifiable according to McArthur categories; [Y] = headword, therefore not a word in sense. This information is not in the original GSL. It has been included to allow material to be organised according to semantic fields.

Column D: Alphabetical head. This is the head for sorting purposes, i.e. the specific word whose frequency is being calculated.

Column E: Part of speech. Where this column has x, this means either that (i) the part of speech is undetermined, or (ii) the record does not concern a part of speech at all, but gives an overall word frequency (as opposed to a word-in-sense frequency). In this latter case, Column H has a percentage score of 100%.

Column F: Word count 1. Score as given in the GSL, including any additional information indicated by a letter after the number (e.g. a final e). In some cases the GSL does not give a word score. These are noted as X in this column.

Column G: Word count 2. ‘Raw’ word count, without additional information (i.e. letters such as e not included after number).
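
The relationship between Column F and Column G can be sketched in Python as follows. This is only an illustration under assumptions: the function name raw_count is hypothetical, Column F values are assumed to be digits optionally followed by letters (or X where the GSL gives no score), and a missing score is assumed to leave Column G empty.

```python
import re
from typing import Optional


def raw_count(word_count_1: str) -> Optional[int]:
    """Derive a Column G style raw count from a Column F style score.

    Assumed input shapes (not confirmed by the list itself):
    '358e' -> 358, '42' -> 42, 'X' -> None (no score given in the GSL).
    """
    text = word_count_1.strip()
    if text.upper() == "X":
        return None  # the GSL gives no word score for this record
    match = re.fullmatch(r"(\d+)[A-Za-z]*", text)
    if match is None:
        raise ValueError(f"Unrecognised Column F value: {word_count_1!r}")
    return int(match.group(1))


# Invented example values, for illustration only.
print(raw_count("358e"))
print(raw_count("X"))
```

If the real Column F values contain thousands separators or other annotations, the pattern would need adjusting.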

Column H: Percentage. In most cases this refers to the percentage of occurrences of a word accounted for by a particular sense. Where the score is 100%, however, either (i) this is a word-frequency record (i.e. containing 100% in this column, and x in Column E), or (ii) this is the sole sense of this word given in the GSL; i.e. this single sense covers 100% of occurrences. NOTE: IN ORDER TO EXTRACT HEADWORDS FROM THIS LIST, simply ‘match’ 100% in this column. This will give a mixture of records of type (i) and type (ii) immediately above. For reasons of economy, where a word has only a single sense covering 100% of occurrences, this has been treated as both the headword and the word-in-sense frequency record (i.e. these words have no separate overall headword record).
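
The headword extraction described in the NOTE above might look like this in Python. It is a minimal sketch, not part of the list itself: the dictionary keys (alpha_head, pos, percentage) and the sample rows are invented for illustration, and Column H is assumed to hold either a number or a string such as 100%.

```python
from typing import Dict, List


def parse_percentage(value) -> float:
    """Accept '100%', '100' or 100 and return the number as a float."""
    return float(str(value).strip().rstrip("%"))


def headword_records(rows: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep every record whose Column H percentage is 100%."""
    return [row for row in rows if parse_percentage(row["percentage"]) == 100.0]


# Invented sample rows; the percentages are NOT taken from the GSL.
rows = [
    {"alpha_head": "store", "pos": "x", "percentage": "100%"},
    {"alpha_head": "store", "pos": "n", "percentage": "40%"},
    {"alpha_head": "storeroom", "pos": "n", "percentage": "100%"},
]

heads = headword_records(rows)

# Records with x in Column E are candidates for type (i). This is only
# approximate, because x can also mean an undetermined part of speech.
overall_records = [row for row in heads if row["pos"].lower() == "x"]

print([row["alpha_head"] for row in heads])
```

The result mixes type (i) and type (ii) records, as the NOTE says; separating them reliably would need more than the x marker in Column E.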

Column I: Word-in-sense frequency. The frequency of the word in this sense per 5 million occurrences (i.e. the Column G Word count 2 score multiplied by the Column H Percentage score). Where there is no word count in Column G, I have given a notional score in this column: either 5,000,000, where I think a word in a particular sense is extremely common, or 0, where I think it is not common. When a descending sort is carried out on the Word-in-sense frequency column, this causes all words-in-senses of unknown frequency which are deemed to be common to be sorted at the beginning of the list, and all those deemed not to be common to be sorted at the end.
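
The Column I calculation and the effect of the notional scores on sorting can be sketched as follows. The function name, the deemed_common flag and the sample figures are all hypothetical; the flag stands in for the compiler's judgement about words with no GSL count, which cannot be derived from the data.

```python
from typing import Optional

NOTIONAL_COMMON = 5_000_000  # no count given, sense judged extremely common
NOTIONAL_RARE = 0            # no count given, sense judged not common


def word_in_sense_frequency(
    word_count_2: Optional[int],
    percentage: float,
    deemed_common: bool = False,
) -> float:
    """Column G count multiplied by the Column H percentage.

    `percentage` is given as a number out of 100 (e.g. 40 for 40%).
    """
    if word_count_2 is None:
        return NOTIONAL_COMMON if deemed_common else NOTIONAL_RARE
    return word_count_2 * percentage / 100


# Invented records: (label, Column G count, Column H percentage, deemed common?)
records = [
    ("sense with a count", 200, 40, False),
    ("no count, judged common", None, 40, True),
    ("no count, judged not common", None, 40, False),
]

scored = [
    (label, word_in_sense_frequency(count, pct, common))
    for label, count, pct, common in records
]

# A descending sort puts the notional 5,000,000 first and the notional 0 last.
ranked = sorted(scored, key=lambda item: item[1], reverse=True)
for label, score in ranked:
    print(label, score)
```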