The effect of non-structural variables on the realisation of intonational plateaux

Chapter 3 The effect of non-structural variables on the realisation of intonational plateaux

3.1 Introduction

The results reported in Chapter 2 demonstrate that plateaux are a feature of the intonation contour for five speakers. Furthermore, details of the alignment of the plateau fit well with the large body of literature on peak alignment. Several studies suggest that peak alignment is affected by the lengthening of prosodic structures such as the syllable or the foot. Different causes of lengthening have different effects on peak alignment. For example, studies by Steele (1986) and Silverman and Pierrehumbert (1990), dealing with nuclear and prenuclear accents respectively, suggest that when a structural unit is lengthened by prosodic context, such as an upcoming word boundary, stress clash or utterance-final lengthening, the peak is aligned earlier in the unit under consideration. In Chapter 2 this was also found to be the case for the start and end of the plateau, which are aligned earlier in the syllable when the foot is monosyllabic. Before a word boundary, the female speakers align only the end of the plateau earlier within a lengthened foot.

In addition to these structural, prosodic effects, peak alignment is also affected by lengthening induced by non-structural factors such as intrinsically long vowels and speech rate. For example, Steele (1986) and Silverman and Pierrehumbert (1990) note that peaks move later in a syllable that is lengthened due to a slower tempo. Other work by Ladd and Morton (1997) suggests similarly that peaks are aligned later in the syllable when the syllable is lengthened in an expanded pitch span.

The aim of the present study is to investigate the effect of non-structural variables on the realisation, both duration and alignment, of the plateau. This chapter will focus on the effects of pitch range, tonal environment and sex of the speaker. Unlike the variables in the previous chapter, the three variables discussed here do not affect the segmental or prosodic constituency of the text. Rather they affect the pitch contour itself by changing either the scaling or identity of tones. The following section discusses why these variables might be important and the predictions that can be made about how the plateau’s realisation will alter under their influence.

3.1.1 Pitch Range

Pitch range effects have received a great deal of attention in the literature, which is reviewed by Ladd (1996, ch. 7). As Ladd points out, pitch range is not a single variable. Rather, it is something of a catch-all term. There are both between-speaker effects, for example each speaker’s natural range or tessitura, and within-speaker effects, that is, ways in which a speaker may use pitch range for communication, or unintentional effects due to physical or emotional states. Ladd also distinguishes between ‘pitch level’ and ‘pitch span’.

Pitch level is defined as the overall level of the voice. In a higher level, both high and low targets in the utterance move upwards within the speaker’s tessitura. This is the type of phenomenon expected when speakers are in highly charged emotional states such as anger or when ‘speaking up’ against noise (Ladd and Terken 1995).

Pitch span is defined as the range of frequencies used at a particular time. When pitch span increases, high targets are higher than in a normal span while low targets remain at the same level or are slightly lower than usual. Thus, in expanded spans, the distance between high and low targets increases. Speakers generally increase their pitch span when they are more interested and emotionally involved with their subject matter.

It has also been suggested (Francis Nolan, personal communication) that it may be useful to have a term such as ‘pitch excursion’ that would refer to the local pitch span. This term would be helpful to describe intonation in situations where words are spoken with more or less emphasis. Although different emphasis would produce a different pitch excursion on the word in question the overall pitch span would change very little.

The present study concentrates on pitch span rather than pitch level. There is some disagreement about the status of pitch span as either a prosodic or paralinguistic variable. In his summary of factors that affect peak alignment, Bruce (1990) classes pitch span as a prosodic variable along with factors such as word boundaries and the number of component syllables in a foot or other unit. Nevertheless, it is also possible to think of pitch span as a paralinguistic or extralinguistic variable. Ladd (1996: 36) defines paralinguistic variables as those which are gradient rather than categorical in nature, and Crystal and Quirk (1964) consider variations in pitch span to be paralinguistic when they convey something about the speaker’s attitude to what they are saying. These two views may not actually be compatible, however, as it seems that gradient contrasts may indeed perform a linguistic function. Asu (2002), for example, demonstrates that the slope of a pitch contour in Estonian signals different sentence types. There is a trading relation between morphosyntactic and intonational marking, so that questions marked with an interrogative particle show a slope similar to statements whilst tag and unmarked questions are spoken with shallower slopes. It is possible that this is a gradient effect of slope which does indeed perform a linguistic function.

As the perceptual correlate of an increase in pitch span is an increase in emphasis, which is a gradient relationship (Ladd and Morton 1997), pitch span is considered to be a paralinguistic variable for the purposes of this study. In any event, pitch span does not alter the structure of the text but does alter the realisation of the contour itself, and so can be included as a non-structural variable under the definition given in section 3.1.

There are a number of hypotheses that can be made about how plateau realisation will be affected by changes in pitch span. It is expected that both the duration and alignment of the plateau will be affected. It is hypothesised that the absolute duration of plateaux will be shorter in wider pitch spans due to physiological causes. In expanded pitch spans the speaker must reach more extreme values in pitch and it is likely that these more extreme values will take longer to reach (Xu 2002: 92). It is probable therefore that there will be less time available to remain at the high level and realise a plateau in expanded spans.

In terms of alignment two alternative hypotheses are suggested. Firstly, it is possible that the whole plateau will be aligned later in the syllable in more expanded spans. As the plateau is defined as the range of frequencies that fall within 4% of any absolute peak, it is possible that each end of the plateau will be affected in the same way as the peak itself. As discussed above, peaks are aligned later when syllables are lengthened by non-structural factors such as speech rate. In particular Ladd and Morton (1997: 322) state that peaks are later in syllables with a wider pitch span and on this basis it is hypothesised that the plateau’s alignment may be similarly affected.
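The 4% criterion mentioned above can be illustrated with a minimal sketch. It assumes an F0 contour sampled at known time points; the function name `find_plateau`, the sample values and the decision to take only the contiguous run of samples around the peak are illustrative assumptions, not a description of the measurement procedure used in Chapter 2.

```python
def find_plateau(times_ms, f0_hz, tolerance=0.04):
    """Return (plateau_start, plateau_end, peak_time) in ms.

    Assumption: the plateau is the contiguous run of samples around the
    absolute F0 peak whose values lie within `tolerance` (4%) of that peak.
    """
    if len(times_ms) != len(f0_hz) or not f0_hz:
        raise ValueError("times_ms and f0_hz must be non-empty and equal in length")

    # Index of the absolute peak (first occurrence if there are ties).
    peak_index = max(range(len(f0_hz)), key=lambda i: f0_hz[i])
    threshold = f0_hz[peak_index] * (1.0 - tolerance)

    # Walk leftwards from the peak while samples stay above the threshold.
    start = peak_index
    while start > 0 and f0_hz[start - 1] >= threshold:
        start -= 1

    # Walk rightwards from the peak in the same way.
    end = peak_index
    while end < len(f0_hz) - 1 and f0_hz[end + 1] >= threshold:
        end += 1

    return times_ms[start], times_ms[end], times_ms[peak_index]


# Hypothetical contour, sampled every 10 ms (values invented for illustration).
times = [0, 10, 20, 30, 40, 50, 60, 70]
f0 = [150.0, 180.0, 195.0, 200.0, 198.0, 193.0, 170.0, 150.0]
plateau_start, plateau_end, peak_time = find_plateau(times, f0)
```

Under this sketch, a later alignment of the whole plateau would appear as an increase in both returned edge times, whereas a contraction around the peak would appear as a later start combined with an earlier end.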

Alternatively, given the above hypothesis concerning duration, the plateau may contract around the peak in order to allow the speaker more time in which to reach the more extreme frequencies characteristic of wider spans. This may result in the later alignment of the plateau’s beginning, to allow extra time for the rise to the peak, and the earlier alignment of the plateau’s end to allow time for the fall to the low tone.

3.1.2 Tonal Environment

Another factor that is known to affect the alignment of peaks is the composition of the tonal string. For example it has been observed that if more than one tone has to be realised on the same tone-bearing unit, tonal repulsion may occur so that the first tone is anticipated and the second is delayed. Tonal repulsion has been observed in many languages (e.g. Grice et al. (2002) for Italian, Prieto (2002) for Catalan, Bruce (1990) for Swedish) although, in general, studies on tonal repulsion in English have focused on the upcoming prosodic, rather than tonal context. For example, Silverman and Pierrehumbert (1990) are concerned with effects on alignment caused by the proximity of two tones with the same identity rather than the effects of tones of different identities.

Nevertheless, work in other languages has focused on alignment differences when tonal identity is varied. D’Imperio (2001) demonstrates that the H of an L*+H pitch accent is aligned earlier when there is an upcoming falling phrase accent. Tonal repulsion or lack thereof has also been used to help determine tonal structure. D’Imperio (2002b) cites the absence of tonal repulsion between a H and a L tone in broad focus declaratives in Neapolitan Italian to be evidence for an H+L* pitch accent rather than an H* pitch accent followed by an L phrase accent. Thus it is not only the identity or number of the tones that is crucial but also their status as pitch or phrase accents.

It is hypothesised that, in the present experiment, the plateau would respond to instances of tonal crowding in the same way as the peak, being temporally reorganised to separate it from other tones, for example those occurring as boundary or phrase tones.

3.1.3 Sex of speaker

It is, of course, well known that in general women’s voices are higher than men’s. Women have smaller larynxes with smaller and lighter vocal folds resulting in a higher fundamental frequency (Gussenhoven 2002). It is also the case that women use wider pitch spans than men (McConnell-Ginet 1978). Thus, it is expected that the sex of the speaker will affect the scaling of the plateau. Women’s voices will be higher so their peaks (and therefore their plateaux) and troughs will be higher than men’s and their spans may be wider.

In addition to this purely physical difference in pitch scaling, it is also possible that there may be effects caused by sociolinguistic differences between men’s and women’s speech. There are many examples of variables that are treated differently by men and women, although these studies have generally focused on segmental material. For example, Trudgill’s (1983) studies of variation in Norwich show that in all social groups men are more likely than women to use the [] form of the progressive morpheme. This replicates the often-found pattern that women tend to use more standard forms than men whilst men tend to use more non-standard forms than women.

Some studies (beginning with McConnell-Ginet 1978) have, however, focused on the way intonation patterns may vary as a result of sociolinguistic factors. Much of this work has focused on the use of High Rising Terminals (HRTs) in New Zealand English. Britain and Newman (1992) demonstrate that women use more HRTs than men and Ainsworth (1994) found this same difference to be even more pronounced in the speech of New Zealand 4-year-olds even before their absolute pitch range distinguished them.

Warren and Daly (2000) demonstrate that male and female speakers of New Zealand English differ in the size and rate of excursion of their HRTs, with women using larger and faster excursions. In addition, they demonstrate that there are differences in alignment, as women start to rise later than men. One result from the last chapter, the earlier EP alignment before word boundaries found only for women, may well be an example of this type of phenomenon. Therefore, we can expect there to be a difference in scaling between men’s and women’s plateaux, but there may also be other differences, for example in alignment.

A complicating factor in this experiment is that subjects will replicate both male and female speakers. It will be interesting to see whether the differences caused by sex are maintained in the replications.

What follows is an analysis of recordings made by the author in collaboration with other members of the Phonetics Lab at the University of Cambridge. The data was originally recorded for another purpose (see Nolan 2003) but provides an excellent opportunity to quantitatively investigate the effects of pitch span, tonal composition and sex of speaker on the realisation of the plateau.

3.2 Method

3.2.1 Stimulus Materials

Two utterances composed of entirely sonorant material were used as stimuli in order to minimise microprosodic effects. These are (with tonetic stress marks and autosegmental notation):

A)We were re  lying on a \ milliner

H% L*+HH*+L 0%

B)A \/ milliner

H*+L H%

Two phoneticians, one male and one female, recorded template utterances. The male speaker would produce a token of utterance A in one of three impressionistically defined pitch spans, hereafter referred to as neutral, compressed (narrower than the neutral span) and expanded (wider than the neutral span), and the female speaker would attempt an exact replication of the token in her own tessitura. The male speaker would then produce a token of utterance B in the same pitch span as utterance A and the female speaker would again attempt a replication. This procedure was repeated an additional two times for each pitch span and the most accurate replication in each pitch span was chosen as the template.

Three tokens of each template utterance were used in the experiment giving 36 tokens (3 tokens x 3 pitch spans x 2 utterances x 2 speakers) in total. These tokens were combined to form 18 conversational dyads. Within each dyad the tokens were randomised with the restrictions that utterance A always preceded utterance B, and that the speaker and the pitch span were different for each token. The order in which the pairs were presented was also randomised. Several measurements were made from each token to ensure that the speakers were in fact using different pitch spans and for use later in interpreting the results proper. Means for each variable are shown below in Table 3.1. It can be seen that peaks get higher and the difference between the peak and L increases as span increases. Troughs are lower in the expanded range. In general durations of the foot and syllable increase in wider spans but measurements for the neutral span are shorter than would be expected from this generalisation.

Measure / Male C / Male N / Male E / Female C / Female N / Female E
Frequency of Peak (Hz) / 104 / 133 / 192 / 222 / 263 / 363
Frequency of L (Hz) / 77 / 77 / 70 / 166 / 168 / 148
Span (Hz) / 27 / 56 / 123 / 56 / 95 / 215
Rate of change (Hz/ms) / 0.234 / 0.422 / 0.790 / 0.384 / 0.596 / 1.04
Syllable duration (ms) / 205 / 230 / 265 / 245 / 215 / 270
Foot duration (ms) / 560 / 520 / 645 / 500 / 515 / 630

Table 3.1 Measurements from template utterances of the male and female phonetician
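The relation between the peak, L and span rows of Table 3.1 can be checked with a short sketch. It assumes that span is simply the peak frequency minus the frequency of L; the names `TEMPLATE_MEANS`, `span_hz`, `spans_for` and `increases_with_span` are hypothetical and introduced only for illustration. Computed from the tabulated means, the male expanded span comes out at 122 Hz rather than the 123 Hz reported, presumably because the tabulated span is a mean of per-token spans rather than a difference of means (an assumption, as the text does not say).

```python
# Mean peak and L frequencies (Hz) copied from Table 3.1.
TEMPLATE_MEANS = {
    "male": {"compressed": (104, 77), "neutral": (133, 77), "expanded": (192, 70)},
    "female": {"compressed": (222, 166), "neutral": (263, 168), "expanded": (363, 148)},
}
SPAN_ORDER = ("compressed", "neutral", "expanded")


def span_hz(peak_hz, low_hz):
    """Pitch span in Hz, assumed here to be peak minus following low."""
    return peak_hz - low_hz


def spans_for(speaker):
    """Spans for one template speaker, ordered compressed, neutral, expanded."""
    return [span_hz(*TEMPLATE_MEANS[speaker][label]) for label in SPAN_ORDER]


def increases_with_span(values):
    """True if each value is strictly larger than the one before it."""
    return all(a < b for a, b in zip(values, values[1:]))


male_spans = spans_for("male")
female_spans = spans_for("female")
```

Both speakers show a strictly increasing span from compressed to expanded, which is the property the template measurements were intended to confirm.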

3.2.2 Subjects

Twelve native speakers of British English participated in the experiment. Eight were female and four were male. Their ages ranged from 21 to 29. All were students at the University of Cambridge with some training in phonetics and intonation.

3.2.3 Procedure

Stimuli were presented through headphones in a sound-treated booth. Subjects were instructed to repeat each stimulus exactly, aiming to produce an intonationally equivalent utterance in their own voice. After each utterance had been played, an on-screen message prompted subjects to record a replication in their own time. Their speech was recorded directly onto the hard drive of a Silicon Graphics workstation via a high-quality microphone. Unsatisfactory recordings were repeated during the same session.

3.2.4 Method of analysis

All the measurements taken for the experiments in Chapter 2 were also taken from the present data. In this experiment, the foot of interest consists of the word ‘milliner’ and the accented syllable is //. The elbow in the contour representing the low tone following the plateau was also located for comparison with the high reference points. In addition the durations between SP and the peak and the peak and EP were measured. Several measurements of scale were also made. These were the frequencies of the peak and the trough and the difference between them.

Analyses used a repeated measures design (MANOVA). In cases where Mauchly’s test indicated that sphericity could not be assumed, a Greenhouse-Geisser correction was used. Independent factors were utterance (2), sex of speaker replicated (2) and pitch span (3). As pitch span has three levels, when it was found to be a significant factor in the MANOVA, planned comparisons were conducted using paired t-tests on the means of results for each level. Analyses of timing and alignment give largely similar results whether made in relation to the syllable or to the foot, and therefore alignment data are presented only in relation to the syllable.
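As an illustration of the planned comparisons, the paired t statistic can be sketched as follows. This is a minimal sketch using only the Python standard library, not the statistical package used for the reported analyses; the function name `paired_t` and the input values are hypothetical. With twelve subjects the test has 11 degrees of freedom, which corresponds to the t(11) values reported in the results.

```python
import math
import statistics


def paired_t(values_a, values_b):
    """Return (t, degrees_of_freedom) for a paired t-test of b against a.

    Each position in the two lists must hold the per-subject mean for the
    same subject in the two conditions being compared.
    """
    if len(values_a) != len(values_b) or len(values_a) < 2:
        raise ValueError("need two equal-length lists with at least two pairs")

    # Work on the per-subject differences between the two conditions.
    diffs = [b - a for a, b in zip(values_a, values_b)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_diff == 0:
        raise ValueError("all differences are identical; t is undefined")

    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


# Hypothetical per-subject values for two levels of a factor (not real data).
t_value, df = paired_t([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

The significance level would then be obtained from the t distribution with the returned degrees of freedom, a step omitted here because it requires a statistics library.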

3.3 Results

3.3.1 Pitch height and span

Three tests were conducted to check whether subjects were in fact using different pitch spans. The results are shown in Figure 3.1, Figure 3.2, and Figure 3.3. As expected, the maximum frequency (the peak) was significantly higher in the more expanded pitch spans (F(1.02, 11.18) = 127.2, p<0.01). Planned comparisons indicate that the frequency is lower in the compressed than in the neutral range (t(11) = 8.3, p<0.01) and lower in the neutral than in the expanded range (t(11) = 12.5, p<0.01). The maximum frequency is higher in utterance type B than in utterance type A (F(1, 11) = 23.47, p<0.01). Subjects also produce higher maximum frequencies when replicating utterances produced by the female speaker (F(1, 11) = 15.82, p<0.01). There is a significant interaction of utterance and range (F(1.365, 15.013) = 19.681, p<0.01). This interaction occurs because in the neutral range there is no difference in peak height between utterances A and B.