Pulse-Finding in Contrapuntal Music

How Two Voices Make a Whole:

Contrapuntal Competition for Attention in Human and Machine Pulse-Finding


Joel Snyder

Department of Psychology

Cornell University

Petri Toiviainen

Department of Music

University of Jyväskylä


While listening to music, people often form a simplified representation of the temporal structure in the music. This simplified representation often takes the form of an approximately isochronous series of pulses with inter-pulse intervals near 600 ms (Clarke, 1999; Fraisse, 1982; Parncutt, 1994; Snyder & Krumhansl, 1999; van Noorden & Moelants, 1999). These pulses not only provide the basis of our perceptual representations of rhythm and meter, but may also underlie our ability to produce rhythmically appropriate actions (Schubotz, Friederici, & Cramon, 2000) such as playing music in an ensemble, dancing in time with music, or simply tapping our feet or fingers to the perceived pulse.

While the output of the human pulse-finding system is relatively simple (i.e., a quasi-isochronous series of impulses), the computational ability to reduce the temporal complexity of music to a pattern of pulses is extremely impressive. In this sense, pulse-finding is analogous to an equally impressive ability to reduce improvised melodies to a simple theme (Large, Palmer, & Pollack, 1995). Whereas in melodic reduction subjects identify which notes are most essential, in pulse-finding subjects must find the primary durational unit, the period, and its onset position with respect to musical events, the phase. Furthermore, the representation of pulse is dynamic in that the period and phase can change incrementally to compensate for small-scale timing perturbations (Semjen, Vorberg, & Schulze, 1998; Thaut, Miller, & Schauer, 1998), and categorically from one global state to another (e.g., from a quarter-note period to an eighth-note period, or from a down-beat phase to an up-beat phase).

In many polyphonic musical styles, certain instruments are designated to provide rhythmic and harmonic structure supporting the melody. Examples include the rhythm section in jazz, parts of the orchestra during a symphony or concerto, the left hand in many styles of piano music, and the thumb in many styles of guitar music. Therefore, in a recent study on pulse-finding in piano ragtime (Snyder & Krumhansl, 1999), it was not surprising that removing the left-hand part led to degraded pulse-finding performance. However, the situation is potentially different in contrapuntal music, in which no one voice constantly provides the rhythmic structure. This democratic quality of contrapuntal music gives rise to a potential problem in identifying how the perception of pulse arises. By simply examining scores of such music, it is not always apparent what information is available for pulse-finding. It seems likely that pulse-finding in contrapuntal music is generally more difficult than in music with a rhythm section because of the attentional demands associated with searching for a voice carrying pulse cues. Contrapuntal music raises other interesting questions for students of pulse-finding, such as the extent to which one voice will dominate attention at any given point in time, what musical features determine the voice to which attention is directed, and whether the perception of pulse can be influenced by multiple voices at once.

In addition to behavioral studies of pulse-finding, many researchers have proposed computational models to account for this ability. Before describing these models, it is useful to keep in mind what properties we desire in a model of pulse-finding. Firstly, a model of pulse-finding must exhibit the ability to find the pulse that humans hear for a given excerpt of music, and in a similar amount of time as humans. Secondly, the model must exhibit robust performance in the presence of the normal timing deviations found in human musical performance (Palmer, 1997). Thirdly, the model should show instabilities of period and phase when humans do. And lastly, before making any strong psychological or biological claims about the model, one must show that the model relies on mechanisms similar to those used by people. While this last test of a model is crucial from the standpoint of experimental psychology, we will provide preliminary data establishing the mathematical validity of a model of pulse-finding, using contrapuntal music.

Currently, the predominant class of models used for musical pulse-finding relies on oscillatory units that entrain in a 1:1 fashion to periodic components in musical stimuli (Gasser, Eck, & Port, 1999; Large, 1994; Large & Jones, 1999; Scheirer, 1998; Toiviainen, 1998; for a detailed examination of rule-based models, see Desain & Honing, 1999). Oscillatory modelers have proposed adaptive oscillation as a mechanism for dynamically tracking the pulse in real musical performances. In other words, the oscillatory units in these models are able to adjust their period and phase to compensate for temporal deviations from isochrony in the music. In addition, they give a simple quasi-periodic response to complex musical patterns, corresponding well to the behavioral output of human pulse-finding. This is attractive because the proposed oscillatory mechanism could not only give rise to the sensation of pulse but could also directly drive rhythmic sensory-motor output such as tapping. Oscillatory models bear important similarities to an influential hypothesis of rhythmic cognition, Dynamic Attending Theory (Jones, 1976; Jones & Boltz, 1989; Large & Jones, 1999). This theory assumes that internal attentional rhythms underlie our ability to track and find structure in time-varying patterns. Such a mechanism may be used to represent time intervals, perceive metrical structure in speech and music, and produce rhythmic motor patterns.
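The adaptive-oscillation idea can be sketched in miniature. The following is an illustrative phase-and-period correction loop, not the Toiviainen (1998) model or any published model's equations; the coupling rule and gain values are our own assumptions chosen only to show how an oscillator can absorb small deviations from isochrony.

```python
# Illustrative adaptive oscillator: at each note onset, nudge the oscillator's
# phase (next expected beat time) and period toward the observed onset.
# Gains and coupling rule are assumptions, not a published model.

def entrain(onsets_ms, period_ms=600.0, phase_gain=0.5, period_gain=0.2):
    """Track a sequence of onset times (ms), returning beat times and final period."""
    next_beat = onsets_ms[0]            # start in phase with the first onset
    beats = []
    for onset in onsets_ms:
        # advance the oscillator to the expected beat nearest this onset
        while next_beat + period_ms / 2 < onset:
            beats.append(next_beat)
            next_beat += period_ms
        error = onset - next_beat       # asynchrony between onset and expected beat
        next_beat += phase_gain * error     # phase correction
        period_ms += period_gain * error    # period correction
    beats.append(next_beat)
    return beats, period_ms

# With onsets slightly slower than the initial 600 ms period,
# the period gradually adapts toward the input interval (620 ms).
beats, final_period = entrain([0, 620, 1240, 1860, 2480])
```

With larger gains the oscillator locks on faster but transmits more of the performance jitter to its output; real models tune this trade-off carefully.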

Past work shows that adaptive oscillator models can dynamically track musical and non-musical patterns (Gasser, Eck, & Port, 1999; Large, 1994; Large & Jones, 1999; Scheirer, 1998; Toiviainen, 1998). However, experimentalists have yet to thoroughly compare the models to human performance on musical pulse-finding. Therefore, the first goal of this study is to collect new behavioral data on human pulse-finding and to determine whether an oscillator model of pulse performs similarly to humans on this task. The particular model we test uses oscillating units similar to the one described by Toiviainen (1998; see Toiviainen & Snyder, 2000 in these proceedings for new developments).

For the behavioral experiment and for testing the model, we chose a single contrapuntal organ duet composed by J.S. Bach (1685-1750), BWV 805. It is the final work in a series of four organ duets for a single performer. The left hand of the organist plays the bottom voice, while the right hand plays the top voice. We refer to the two imitative voices as the right-hand part (RH) and the left-hand part (LH), and to the two parts together as both. In the experiment we present the RH, LH, and both versions of short excerpts to subjects, who tap the perceived pulse of the music. We then attempt to model human performance with the adaptive oscillator system described above, and analyze musical features that possibly influence performance.

EXPERIMENT

For this experiment, we selected eight eight-measure excerpts from a MIDI version of BWV 805. For each excerpt, LH, RH, and both versions were presented to subjects on separate tapping trials. We will focus on determining the relative influence of the two voices on performance, whether the starting position of the excerpts influences tapping performance, and whether spontaneous tapping tempo predicts the period of musical tapping.

Method

Subjects

Fourteen musically experienced Cornell University undergraduates (5 male, 9 female) with a mean age of 19.8 years have participated so far, for extra credit in psychology courses. Our criterion for inclusion was at least eight years' experience singing or playing an instrument, at least some of it within the two years prior to testing. The mean number of years playing music was 12.4, and mean familiarity with BWV 805 on a scale from 1 to 7 was 1.6. No subject reported any hearing loss or motor disorders.

Materials and Stimuli

Subjects were tested using a software-based MIDI system running on a Power Macintosh 350 MHz G4 machine. We prepared stimuli using the Digital Performer sequencing software. During the experiment, the experimenter operated a MAX interface that stored presentation orders, played the stimuli, and collected tapping responses from a Yamaha KX88 keyboard while subjects listened to stimuli and percussive auditory feedback with AKG K-141 headphones. The MAX interface sent MIDI information internally to Unity DS-1 software, converting the MIDI code to audio information. The computer sent audio information to a Yamaha 1204 mixing console that connected to the headphones.

BWV 805 is in 2/2 meter, and the tempo was set to an inter-beat interval of 800 ms, or a metronome marking of 75 half-note beats per minute. The piece had a church organ timbre selected from the Unity DS-1 timbre library and was metronomic, with a constant inter-beat interval throughout. We chose four pairs of excerpts from the piece such that within each pair the LH and RH parts play similar patterns. Half of the excerpts begin on the first eighth-note of the measure and the other half begin on the third eighth-note of the measure; within each pair, both excerpts have the same starting point. Three versions of each excerpt were created: one with both voices (stimuli 1-8), one with only the LH (stimuli 9-16), and one with only the RH (stimuli 17-24).

Procedure

Four orders of the twenty-four stimuli were randomly selected without replacement. To counterbalance for order of presentation, we created an additional four orders by reversing the four selected orders. Each subject tapped to either the four selected orders or their reverses. We constructed four Latin squares, yielding sixteen orders for presenting the four blocks to subjects; the first two Latin squares covered the four random orders and the second two covered the reversed orders. The final two stimuli of each block were also presented at the beginning of the block and were not used in any analyses. Thus, each subject tapped to each stimulus once in each of four blocks, plus the two un-analyzed stimuli at the beginning of each block. Between stimulus presentations, monotonic piano tones with randomly selected inter-onset intervals served as memory washouts. The interval between the end of the piano tones and each new stimulus presentation was 1600-3200 ms.
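The order construction above can be sketched as follows. This is an illustrative reconstruction, not the authors' actual scripts; the particular Latin square and the random seed are arbitrary assumptions.

```python
# Sketch: four random orders of the 24 stimuli, their reverses, and a
# 4x4 Latin square for assigning the four blocks to serial positions.
import random

STIMULI = list(range(1, 25))  # stimulus codes 1-24

def make_presentation_orders(seed=0):
    """Return four random stimulus orders and their reversed counterparts."""
    rng = random.Random(seed)
    forward = [rng.sample(STIMULI, len(STIMULI)) for _ in range(4)]
    reverse = [list(reversed(order)) for order in forward]
    return forward, reverse

# Any 4x4 Latin square works: each block index (0-3) appears exactly once
# in every row (subject) and every column (serial position).
LATIN_SQUARE = [[0, 1, 2, 3],
                [1, 2, 3, 0],
                [2, 3, 0, 1],
                [3, 0, 1, 2]]

forward_orders, reversed_orders = make_presentation_orders()
```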

For each subject, testing occurred in a single one-hour session. First, each subject tapped spontaneously for two 30 sec trials to determine preferred tempo. Second, each subject tapped to two sets of practice stimuli, each set consisting of a LH, RH, and both versions of an excerpt from another organ duet by Bach, BWV 803. These trials were as described for the experimental blocks. Third, subjects tapped to the four blocks of BWV 805 stimuli. For the musical tapping trials, the instructions told each subject to begin tapping with the dominant hand on the keyboard after finding the beat mentally, and to tap consistently through the end of the music. Lastly, each subject filled out a questionnaire pertaining to past music and dance experience.

Results and Discussion

We analyzed tapping behavior for each trial individually using several measures, indexing tapping period and phase, switches in period and phase, and the time taken to start tapping periodically. We calculated all performance measures off-line, using only the tap onset times. For graphical explanations of tapping period and phase, see Figure 1. The first step in analyzing these data was to determine whether the period associated with each tap corresponded to 2, 4, or 8 eighth-notes. If a tap period was 100 ms or more from all of these periods, the tap was classified as aperiodic. Next, for each periodic tap, we determined the phase as follows. For a tap period of 2 eighth-notes, taps at eighth-note metrical positions 1, 3, 5, or 7 counted as phase 1, and taps at positions 2, 4, 6, or 8 as phase 2. Similarly, for a tap period of 4 eighth-notes, taps at positions 1 or 5 counted as phase 1, positions 2 or 6 as phase 2, positions 3 or 7 as phase 3, and positions 4 or 8 as phase 4. Lastly, for tap periods of 8, the tap phase simply corresponded to the eighth-note metrical position of the tap. Together, the tap period and phase define the mode of tapping, as shown in Figure 1. For example, mode 2_1 denotes tapping with a period of 2 eighth-notes and a phase of 1. Results for mode of tapping will be expressed as proportions of total taps in each mode.
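The classification rule above can be written compactly: a tap is periodic if its inter-tap interval falls within 100 ms of 2, 4, or 8 eighth-notes, and its phase folds the eight within-measure positions down to the period. The sketch below uses hypothetical helper names (it is not the authors' analysis code), with the 200 ms eighth-note duration from the stimulus tempo.

```python
# Sketch of tap classification into (period, phase) modes.
EIGHTH_MS = 200       # eighth-note duration at the 800 ms half-note interval
TOLERANCE_MS = 100    # taps 100 ms or more from 2, 4, and 8 eighths are aperiodic

def classify_tap(tap_period_ms, metrical_position):
    """Return (period, phase) in eighth-notes, or None if the tap is aperiodic.

    tap_period_ms: interval since the previous tap, in ms.
    metrical_position: eighth-note position of the tap in the measure (1-8).
    """
    for period in (2, 4, 8):
        if abs(tap_period_ms - period * EIGHTH_MS) < TOLERANCE_MS:
            # Fold the 8 positions down to the period: e.g. for period 2,
            # positions 1,3,5,7 -> phase 1 and positions 2,4,6,8 -> phase 2.
            phase = (metrical_position - 1) % period + 1
            return (period, phase)
    return None  # aperiodic tap

# Example: a 410 ms inter-tap interval at position 3 -> mode 2_1
```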

Period switches occurred when the current tap period did not match the previous tap period but did match the following tap period; these included switches to or from aperiodic tapping. A phase switch occurred when the previous tap period matched the following tap period but the current tap phase did not match the previous tap phase; thus a phase switch could only occur when the period did not change. Finally, beats to start tapping (BST) was the time in ms from the first note to the first periodic tap, divided by the eighth-note duration, 200 ms.
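These switch and BST definitions translate directly into code. The sketch below is our own illustration (hypothetical function and variable names), operating on per-tap period and phase sequences as produced by the classification step, with aperiodic taps represented as None.

```python
# Sketch of the period-switch, phase-switch, and BST measures.

def count_switches(periods, phases):
    """Count period and phase switches given per-tap period/phase sequences.

    periods[i] and phases[i] describe tap i; aperiodic taps carry None.
    """
    period_switches = phase_switches = 0
    for i in range(1, len(periods) - 1):
        if periods[i] != periods[i - 1] and periods[i] == periods[i + 1]:
            period_switches += 1   # includes switches to/from aperiodic tapping
        elif (periods[i - 1] == periods[i] == periods[i + 1]
              and phases[i] != phases[i - 1]):
            phase_switches += 1    # phase changes while the period holds steady
    return period_switches, phase_switches

def beats_to_start(first_note_ms, first_periodic_tap_ms, eighth_ms=200):
    """Beats to start tapping (BST), in eighth-note units."""
    return (first_periodic_tap_ms - first_note_ms) / eighth_ms
```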

Mode (Period_Phase)  1/8-Note Metrical Positions ([n] Indicates Tap)
2_1  [1] 2 [3] 4 [5] 6 [7] 8   [1] 2 [3] 4 [5] 6 [7] 8   [1] 2 [3] 4 [5] 6 [7] 8
2_2   1 [2] 3 [4] 5 [6] 7 [8]   1 [2] 3 [4] 5 [6] 7 [8]   1 [2] 3 [4] 5 [6] 7 [8]
4_1  [1] 2  3  4 [5] 6  7  8   [1] 2  3  4 [5] 6  7  8   [1] 2  3  4 [5] 6  7  8
4_2   1 [2] 3  4  5 [6] 7  8    1 [2] 3  4  5 [6] 7  8    1 [2] 3  4  5 [6] 7  8
4_3   1  2 [3] 4  5  6 [7] 8    1  2 [3] 4  5  6 [7] 8    1  2 [3] 4  5  6 [7] 8
4_4   1  2  3 [4] 5  6  7 [8]   1  2  3 [4] 5  6  7 [8]   1  2  3 [4] 5  6  7 [8]
8_1  [1] 2  3  4  5  6  7  8   [1] 2  3  4  5  6  7  8   [1] 2  3  4  5  6  7  8
8_2   1 [2] 3  4  5  6  7  8    1 [2] 3  4  5  6  7  8    1 [2] 3  4  5  6  7  8
8_3   1  2 [3] 4  5  6  7  8    1  2 [3] 4  5  6  7  8    1  2 [3] 4  5  6  7  8
8_4   1  2  3 [4] 5  6  7  8    1  2  3 [4] 5  6  7  8    1  2  3 [4] 5  6  7  8
8_5   1  2  3  4 [5] 6  7  8    1  2  3  4 [5] 6  7  8    1  2  3  4 [5] 6  7  8
8_6   1  2  3  4  5 [6] 7  8    1  2  3  4  5 [6] 7  8    1  2  3  4  5 [6] 7  8
8_7   1  2  3  4  5  6 [7] 8    1  2  3  4  5  6 [7] 8    1  2  3  4  5  6 [7] 8
8_8   1  2  3  4  5  6  7 [8]   1  2  3  4  5  6  7 [8]   1  2  3  4  5  6  7 [8]

Figure 1. Three schematic measures depicted at the eighth-note level in 2/2 meter. Taps are indicated by bracketed metrical positions. Fourteen possible periodic modes of tapping are shown, each characterized by a period (2, 4, or 8 eighth-notes) and a phase. Aperiodic tapping is tapping with a period other than 2, 4, or 8 eighth-notes.