Poster presentation at the Statistical Society of Canada Conference, Halifax, 7-12 June 2003

A Stylometric Analysis of King Alfred’s Literary Works

Paramjit Gill

Department of Mathematics & Statistics

and

Michael Treschow

Department of English

Okanagan University College, Kelowna, BC

King Alfred the Great (848-899)

Abstract

For many centuries Alfred the Great was judged to have translated several Latin texts into Old English. Many scholars, however, have expressed doubt whether Alfred could have done all this work. With the availability of the Old English Corpus in electronic form, it is feasible to subject the texts to statistical stylometric analysis. We use multivariate techniques for an exploratory analysis of the use of “function” words in various Alfredian and non-Alfredian texts. We find that three translations that have been attributed to Alfred (Pastoral Care, The Consolation of Philosophy and The Soliloquies) do indeed cluster together on the frequency of usage of function words. However, one translation still attributed to him, The First Fifty Prose Psalms, tends to stay away from the Alfredian texts.

Introduction

After King Alfred the Great defeated the Vikings at the battle of Edington in 878, he turned to strengthening his English kingdom of Wessex, which had suffered so greatly under the Viking invasions. Most famous is his program for educational reform. Alfred depicts himself as a philosopher-king, taking up scholarship in his own right and making English translations of Latin patristic texts to serve as the basis of education in the English language. Seven translations are associated with his reign. The following three internally identify themselves as Alfred’s work:

1. Gregory the Great's Pastoral Care

2. Boethius's The Consolation of Philosophy

3. Augustine's The Soliloquies

The other four translations are:

4. Gregory the Great's Dialogues

5. Bede's Ecclesiastical History of the English People

6. Orosius's Histories against the Pagans

7. The First Fifty Prose Psalms

Of these four, only Gregory’s Dialogues clearly identifies itself as not Alfred's work. Alfred himself wrote its preface, explaining that he directed his friends to make it. The other three do not identify any translator, but tradition long held that they were the work of King Alfred. William of Malmesbury, a twelfth-century historian, listed Bede's History and Orosius's History among Alfred's translations and also stated that Alfred was working on a translation of the Psalms at the time of his death. Old English scholars, however, have come to accept that Alfred could not have translated Bede's History because, like the translation of the Dialogues, it shows traces of the Mercian dialect. Alfred's authorship of the Orosius has also recently been overthrown. Bately (1982), however, has argued that the translation of the First Fifty Prose Psalms is Alfred's.

Bately assessed the authorship of the Orosius and the Prose Psalms by analysing how they translated certain Latin words. She noted that the Prose Psalms usually used the same Old English words to translate corresponding Latin words as did the three Alfredian texts, but that the Orosius showed greater differentiation. Stylometry allows for a much more refined and extensive analysis, not only of contextual words but also, and more importantly, of non-contextual words. The question now arises whether a more thorough stylometric analysis would confirm Bately's conclusions.

As we are dealing with Old English translations from the original Latin, we face a special challenge that does not arise in standard stylometric analysis where the problem is the authorship assignment of original work. It is important to note, however, that the work of translation is itself a kind of authorship that can be subjected to stylistic analysis. The translations considered in this study all stand at the beginning of Old English prose writing. They show the initial development of English prose style. The proem to the translation of Boethius states that Alfred's strategy of translation was variable, sometimes rendering "word for word, sometimes sense for sense." All these translations exhibit an authorial voice that forms the text into the Old English language.

Data

The raw data for this study were generated through the Dictionary of Old English Corpus, available on CD from the University of Toronto. We copied the seven documents as plain text along with various tags (line numbers etc.). We divided the texts into blocks of about 50 lines, each accounting for about 1200 words on average. These blocks are the unit of statistical analysis.
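The blocking step can be illustrated with a minimal Python sketch. It assumes the text is already available as a list of lines; the function names, the toy input and the treatment of a short final block are illustrative assumptions, not a description of the procedure actually used for the poster.

```python
def split_into_blocks(lines, lines_per_block=50):
    """Group consecutive lines into blocks of at most `lines_per_block` lines.

    Assumption: a short final block is kept as it is; the source does not
    say how leftover lines were handled.
    """
    return [lines[i:i + lines_per_block]
            for i in range(0, len(lines), lines_per_block)]


def block_word_count(block):
    """Count whitespace-separated tokens in one block."""
    return sum(len(line.split()) for line in block)


# Toy stand-in for a text: 120 lines of 24 tokens each (hypothetical data).
toy_lines = ["word " * 24 for _ in range(120)]
blocks = split_into_blocks(toy_lines)
sizes = [block_word_count(b) for b in blocks]
print(len(blocks), sizes)
```

With 24 tokens per line, a full 50-line block holds 1200 tokens, which matches the order of magnitude quoted above; real corpus lines vary in length, and any tags would need to be stripped before counting.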

Table 1. Sizes of the Texts


Function Words

An underlying principle of statistical stylometric analysis is that writers use some common high-frequency words unreflectively in their writing. These words are called function words, and occur regardless of context. They can be prepositions, conjunctions, articles, and common verbs. Different authors, however, use them at different rates. Therefore, stylometric analysis can exploit differential rates of function words to distinguish authorship.

For analysing the seven texts, we generated a list of the 100 most frequent words common to all seven texts. We refined the list by omitting all contextual words. We further omitted all words that might depend on the original Latin text and chose those words that were distinctively English and expressive of English style. Table 2 shows the list of the 17 individual function words (with the modern English meaning in parentheses) that we used for stylometric analysis. Multiple spellings for many of these words were accounted for and combined, as in the case of ÞEAH, ÐEAH, ÞÆAH, ÐÆAH, ÞEH, ÐEH.

Table 2. Function Words

AC (but); AND (and); BIÐ (is); EAC (also); HIT (it); IS (is); MIÐ (with); OF (of); SWA (so); TO (to); ÐA (those, then); ÐÆS (of the); ÐÆT (that); WÆS (was); WIÐ (against); ÐONNE (then); ÐEAH (although)

We used WordSmith Tools (Scott, 1998) to count the frequency at which these function words occur in text blocks. The count was then converted to frequency per 100 words in the block. Our dataset then consists of 284 rows of text blocks and 17 columns of function words. As we see in Table 3, there is a wide variation in the frequencies of the words over the seven texts.
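The counting itself was done in WordSmith Tools; purely as an illustration of the arithmetic, the following Python sketch merges spelling variants and converts counts to a rate per 100 words. The variant map, the function name and the toy block are assumptions made for the example, and only the ÐEAH spellings listed above are taken from the text.

```python
from collections import Counter

# Hypothetical variant map: every spelling points to one canonical form.
# Only three function words are included to keep the example short.
VARIANTS = {
    "þeah": "ðeah", "ðeah": "ðeah", "þæah": "ðeah",
    "ðæah": "ðeah", "þeh": "ðeah", "ðeh": "ðeah",
    "hit": "hit",
    "ac": "ac",
}


def rates_per_100(tokens, variants):
    """Return occurrences per 100 tokens for each canonical function word."""
    if not tokens:
        return {w: 0.0 for w in set(variants.values())}
    lowered = (tok.lower() for tok in tokens)
    counts = Counter(variants[t] for t in lowered if t in variants)
    return {w: 100.0 * counts[w] / len(tokens)
            for w in set(variants.values())}


# Toy block of ten tokens (not a real Old English passage).
block = "þeah hit is swa ðeh hit wæs ac þeh god".split()
rates = rates_per_100(block, VARIANTS)
print(sorted(rates.items()))
```

Applying such a function to every block and stacking the results would give a matrix with one row per text block and one column per function word, the same shape as the dataset described above.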


Table 3. Mean Frequency of Function Words in 7 Texts

Principal Component Analysis

The first five principal components (PCs) explain about 82% of the variability, and the most prominent function words in these PCs are the 10 words AND, HIT, IS, MIÐ, SWA, TO, ÐA, ÐÆT, WÆS and ÐONNE. More importantly, the first two PCs clearly show the separation of Alfred’s work from Bede, Gregory’s Dialogues and Orosius (Fig 2). The most interesting revelation from Figure 2 is that the Prose Psalms stay away from the Alfredian texts. This casts doubt on Bately’s conclusion that the Prose Psalms are Alfred’s translation. Of course, more detailed confirmatory statistical analysis is needed to investigate this further.

Fig 1. Factor Loadings for the Most Prominent Words

Fig 2. First two Principal Components
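For readers who want to reproduce this kind of analysis, the principal component step can be sketched in Python with NumPy. The data below are random numbers with the same shape as our dataset (284 blocks by 17 words), not the actual frequencies, and the choice to centre but not standardise the columns is an assumption of the sketch; the function and variable names are likewise illustrative.

```python
import numpy as np


def pca(X, n_components=2):
    """Principal components of the column-centred data via the SVD.

    Returns block scores, component directions (one per row) and the
    proportion of total variance explained by each retained component.
    """
    Xc = X - X.mean(axis=0)                      # centre each word column
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = S**2 / np.sum(S**2)              # variance proportions
    components = Vt[:n_components]
    scores = Xc @ components.T                   # project blocks onto PCs
    return scores, components, explained[:n_components]


# Synthetic stand-in for the 284 x 17 frequency matrix (assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(284, 17))
scores, components, explained = pca(X, n_components=5)
print(scores.shape, components.shape, round(float(explained.sum()), 3))
```

Plotting the first two columns of `scores`, with one plotting symbol per text, gives a display of the kind shown in Fig 2.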

Cluster Analysis

To get an idea of how close the texts are in their usage of function words, we ran a k-means cluster analysis on the data for the 17 function words. When asked to produce three clusters, the 284 text blocks were divided as shown in Table 4. We see that most of the text blocks from Boethius, Pastoral Care and Soliloquies cluster together (cluster 3) and the majority of the Bede, Gregory’s Dialogues and Orosius blocks go to cluster 2. Cluster analysis confirms our suspicion about the Prose Psalms, with all 17 of its text blocks falling in a single cluster (cluster 1). However, about one-third of the Bede blocks and one-third of the Orosius blocks also fall into cluster 1 along with the Prose Psalms.


Table 4. Cluster Membership using K-Means Clustering
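A cross-tabulation of this kind can be produced along the following lines. This is a plain NumPy sketch of Lloyd's k-means algorithm on synthetic data with three hypothetical texts; the initialisation, the seed and all names are assumptions, and the software used for the poster may differ in these details.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm with k distinct random data points as start."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Distance from every block to every centre, then nearest centre.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute centres; keep the old one if a cluster is empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


# Synthetic stand-in: 20 blocks from each of three hypothetical texts.
rng = np.random.default_rng(1)
texts = np.repeat(["A", "B", "C"], 20)
means = {"A": 0.0, "B": 5.0, "C": 10.0}
X = np.vstack([rng.normal(means[t], 0.3, size=17) for t in texts])

labels, _ = kmeans(X, k=3)
# Rows: texts; columns: number of blocks assigned to clusters 0, 1, 2.
table = {t: np.bincount(labels[texts == t], minlength=3).tolist()
         for t in ["A", "B", "C"]}
print(table)
```

Because k-means depends on its starting centres, the cluster numbering, and occasionally the partition itself, can change with the seed; running several starts and keeping the best solution is a common precaution.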

Fig 3. Cluster Analysis of Boethius, Pastoral Care, Soliloquies, Orosius, and Prose Psalms using 17 Function Words


Figure 3 shows hierarchical clustering where we used text blocks only from Boethius, Pastoral Care, Soliloquies, Orosius and Prose Psalms. Here also, we see that the Prose Psalms do not cluster with the Alfredian texts but instead tend to stay close to Orosius.
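A hierarchical analysis of this type can be sketched with SciPy as follows. The linkage method (average) and the distance (Euclidean) are assumptions chosen for the example, since they are not stated above, and the data are two synthetic groups of blocks rather than the real frequencies.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Synthetic stand-in: two well-separated groups of 10 blocks, 17 words each.
rng = np.random.default_rng(2)
X = np.vstack([
    rng.normal(0.0, 0.3, size=(10, 17)),
    rng.normal(5.0, 0.3, size=(10, 17)),
])

# Agglomerative clustering; method and metric are illustrative choices.
Z = linkage(X, method="average", metric="euclidean")

# Cut the tree into two groups to inspect which blocks go together.
groups = fcluster(Z, t=2, criterion="maxclust")
print(groups)
```

Passing `Z` to `scipy.cluster.hierarchy.dendrogram` would draw the corresponding tree, which is the usual way to present a result like Fig 3.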

Bibliography

Bately, J. (1982) Lexical evidence for the authorship of the prose psalms in the Paris Psalter. Anglo-Saxon England, 10, 69-95.

Scott, M. (1998) WordSmith Tools Manual, version 3.0, Oxford University Press.

Acknowledgements

This research is being supported by a grant in aid of research at OUC and a grant from the Natural Sciences and Engineering Research Council (NSERC) of Canada.