Signal Processing Techniques for Detecting Quasi-Repetition

In signal processing, the term periodicity estimation refers to automatic techniques for detecting quasi-repetition. Two such techniques are well known in the computer music literature: autocorrelation and harmonic spectral product. Both were originally applied to fundamental frequency estimation in the context of pitch perception, and more recently have been applied to the analysis of rhythmic and metric structure.

Autocorrelation

Autocorrelation finds the correlation[1] of a signal against different versions of itself time-shifted by various amounts. Each time-shift amount is called a lag time. The output of an autocorrelation is the correlation amount[2] as a function of lag time. The maximum value will always be at a lag of zero, since a signal is always perfectly correlated with an exact copy of itself.[3] Other peaks in the autocorrelation indicate lag times at which the signal is relatively highly correlated with itself; these can be interpreted as periods at which the signal quasi-repeats. In other words, autocorrelation is based on the idea that a quasi-periodic signal will resemble itself in the time domain when time-shifted by a duration (nearly) equal to the period.
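As a concrete illustration (my own sketch in Python/NumPy, not the original analysis code, which used Matlab’s xcorr as noted in footnote 2), the following computes the biased autocorrelation just described, normalized so the lag-zero peak equals one:

```python
import numpy as np

def autocorrelation(x):
    """Normalized autocorrelation of a 1-D signal.

    Returns r[lag] for lag = 0 .. len(x)-1, scaled so that r[0] == 1
    (analogous to the 'coeff' normalization mentioned in footnote 2).
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Biased autocorrelation: sum of pointwise products of the signal
    # with a copy of itself shifted by `lag` samples.
    r = np.array([np.dot(x[:n - lag], x[lag:]) for lag in range(n)])
    return r / r[0]   # the lag-zero peak becomes exactly one
```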

Figure 1: Autocorrelation of three short signals

Autocorrelation of three short signals: noise (top), sine wave (middle), and a primitive idealized “metric” signal (bottom)

Figure 1 shows three signals alongside their autocorrelation functions. For the noise signal, other than the peak at lag zero, there doesn’t seem to be any structure to the autocorrelation. The sine signal changes gradually, so its autocorrelation also changes gradually; this helps explain why autocorrelation is good at finding not-quite-exact repetition when the signal is somewhat smooth. The third example is a “metric” signal of a loud impulse alternating with a quieter impulse, each separated by three units of silence. In this case, a lag of 4 yields a peak, since it makes the impulses all line up with each other, while any lag that is not a multiple of 4 yields a zero, because it makes the impulses line up with silences. A lag of 8 has an even higher correlation than a lag of 4, because in addition to making all the impulses line up with impulses, it also makes the loud impulses line up with each other (and the quiet impulses with each other), as the sketch below confirms.
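The following short sketch (my own reconstruction; the amplitudes 1.0 and 0.5 are illustrative guesses rather than values read off the figure) builds such a metric signal and checks that the normalized autocorrelation at a lag of 8 exceeds that at a lag of 4:

```python
import numpy as np

# Reconstruction of the idealized "metric" signal from Figure 1: a loud
# impulse and a quieter impulse in alternation, each followed by three
# units of silence.
pattern = [1.0, 0.0, 0.0, 0.0, 0.5, 0.0, 0.0, 0.0]
metric = np.tile(pattern, 6)      # a few repetitions of the pattern

acf = autocorrelation(metric)     # helper sketched above
print(acf[4], acf[8])             # lag 8 exceeds lag 4, as argued in the text
```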

Here is a far-from-comprehensive set of references on use of autocorrelation for rhythmic analysis: {Alonso, 2004 #145; Scheirer, 1997 #154; Frieler, 2004 #164; Brown, 1993 #165; Tzanetakis, 2001 #166; Brossier, 2006 #167; Davies, 2005 #168; Paulus, 2002 #169; Toiviainen, 2005 #170; Peeters, 2005 #171}.

A recent development in the use of autocorrelation is the Autocorrelation Phase Matrix {Eck, 2007 #180; Eck, 2005 #1}, which outputs a two-dimensional (2D) matrix showing the correlation amount as a function of both lag time and phase. The distribution of autocorrelation energy in this space can reveal rhythmic structure even in cases where the autocorrelation alone provides no insight.
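The exact formulation is given in the cited papers; as a rough sketch of the idea only (my own simplified reading, not Eck’s definition), one can decompose each lag’s autocorrelation sum by the phase, i.e., the time index modulo the lag, at which each pointwise product occurs, so that summing a row over phase recovers the ordinary autocorrelation at that lag:

```python
import numpy as np

def autocorrelation_phase_matrix(x, max_lag):
    """Rough sketch of an autocorrelation phase matrix (a simplified
    reading of the idea, not the exact formulation in the cited papers).

    apm[lag, phase] accumulates the products x[i] * x[i + lag] for all i
    with i % lag == phase, so summing row `lag` over phase recovers the
    ordinary (biased) autocorrelation at that lag.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    apm = np.zeros((max_lag + 1, max_lag))
    for lag in range(1, max_lag + 1):
        for i in range(n - lag):
            apm[lag, i % lag] += x[i] * x[i + lag]
    return apm
```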

Related to autocorrelation is a method based on comb filtering {Scheirer, 1998 #96}, in which an input signal passes through a collection of recirculating feedback delay lines. For example, the output of the one-second delay line is equal to the input plus a quieter version of the input from exactly one second ago, plus an even quieter version of the input from exactly two seconds ago, etc. So if the input contains periodicity at or near the one Hertz frequency, the amount of energy in the one-second delay line will tend to increase.
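A minimal sketch of one such recirculating delay line follows (my own illustration, not code from Scheirer’s model; the feedback gain of 0.8 and the test signal are arbitrary illustrative choices):

```python
import numpy as np

def comb_filter_energy(x, delay_samples, gain=0.8):
    """Feedback comb filter y[n] = x[n] + gain * y[n - delay_samples].

    Its impulse response is the input plus progressively quieter copies
    delayed by 1, 2, 3, ... multiples of the delay.  Returns the output
    energy; in a bank of such filters, the delay matching a periodicity
    in the input tends to accumulate the most energy.
    """
    y = np.zeros(len(x))
    for n in range(len(x)):
        fb = gain * y[n - delay_samples] if n >= delay_samples else 0.0
        y[n] = x[n] + fb
    return np.sum(y ** 2)

# Example: an impulse train with a period of 100 samples should favor
# the delay of 100 samples over nearby delays.
pulse = np.zeros(1000)
pulse[::100] = 1.0
print(max(range(50, 200), key=lambda d: comb_filter_energy(pulse, d)))
```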

(Harmonic) Spectral Product

The magnitude spectrum of a quasi-repeating signal should have a peak corresponding to the frequency of repetition. The harmonic spectral product method (sometimes called just “spectral product”) is based on the assumption that the spectrum of a quasi-repeating signal will also have relatively strong peaks at frequencies corresponding to the first few harmonics of the frequency of repetition. This method works by first finding the magnitude spectrum (for example, with an FFT), then successively compressing that spectrum by factors of 2, 3, and so on up to M, and then multiplying together all M spectra (the original plus the M−1 compressed versions).[4]
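A minimal sketch of this procedure (my own illustration, not code from the cited work), where “compressing by a factor of m” is implemented as sampling every m-th bin of the magnitude spectrum:

```python
import numpy as np

def harmonic_spectral_product(x, M=3):
    """Harmonic spectral product as described in the text: take the
    magnitude spectrum, compress it by factors 1..M (sample bin m*k for
    factor m), and multiply the M compressed spectra together.
    """
    mag = np.abs(np.fft.rfft(x))
    n_out = len(mag) // M            # highest bin for which m*k stays in range
    hsp = np.ones(n_out)
    for m in range(1, M + 1):
        hsp *= mag[np.arange(n_out) * m]
    return hsp                       # its peak indicates the fundamental's bin
```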

Figure 2: Harmonic spectral product of three short signals

Noise (top), sawtooth wave (middle), and a primitive idealized “metric” signal (bottom)

Figure 2 shows three short signals, their magnitude spectra, and their harmonic spectral products with M=3. For the noise signal, the magnitude spectrum is basically flat and any structure in the spectral product is random.[5] The sawtooth wave has a harmonic spectrum exactly like what this method expects to see, and indeed the spectral product has a huge peak at the sawtooth’s fundamental frequency. The “metric” signal is perfectly periodic with a harmonic spectrum, so again the spectral product technique easily finds the fundamental frequency.
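As a quick check of the sawtooth case, reusing the harmonic_spectral_product sketch above (the frame length and fundamental bin here are arbitrary illustrative choices, not the parameters used for Figure 2):

```python
import numpy as np

# A sawtooth with exactly 5 cycles in a 480-sample frame, so its
# fundamental falls in FFT bin 5; the spectral product should peak there.
n, f0_bin = 480, 5
t = np.arange(n)
saw = 2.0 * ((f0_bin * t / n) % 1.0) - 1.0

hsp = harmonic_spectral_product(saw, M=3)   # helper sketched above
print(np.argmax(hsp))                       # expect 5
```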

Harmonic spectral product has been used by {Alonso, 2004 #145}.

[1] In this sense, correlation between two signals x and y simply means the sum of the pointwise products: $\sum_i x_i y_i$.

[2] I’ve scaled the values so that the maximum correlation, i.e., the correlation between two copies of the same signal, is one (by using the ‘coeff’ argument to Matlab’s xcorr function). A correlation of zero means that the two signals have nothing in common.

[3] For some signals the (unbiased) correlation amount at other lags might be equal to that at lag zero. For example, a completely constant signal will have equal correlation amounts at all lag times, and a perfectly repeating signal’s autocorrelation at a lag equal to the period will be the same as that at lag zero. An unbiased correlation corrects for the fact that higher lag times correspond to shorter durations of overlap between the original and time-shifted versions of the signal; the graphs in Figure 1 show the regular, biased autocorrelation.

[4] Alonso {Alonso, 2004 #145} writes this formula for the spectral product, where f is normalized frequency and $P(e^{j2\pi f})$ is one bin of the FFT of the input signal: $S(f) = \prod_{m=1}^{M} \left| P(e^{j2\pi m f}) \right|$.

[5] The apparent structure in this example is due to the short duration (only 48 samples) of noise. As the number of noise samples increases, the magnitude spectrum, and therefore the spectral product, becomes flat.