Spring 2008    Mike Jones    Example 1

AR Example

“A Method for Improved Modeling of Input Data Using Stochastic-Deterministic Decomposition and Synthesis”

Submitted: April 1, 2008

Prepared for:

Dr. Rick McKenzie

Dr. Michael Bailey

Dr. Ghaith Rabadi

Dr. Zia-ur Rahman

Old Dominion University

Norfolk, VA

Prepared by:

Michael C. Jones

Old Dominion University

Norfolk, VA


0.0: Introduction:

This paper is submitted as an introduction to the work I am proposing. It presents more questions than answers. The goal is not to provide a finished product, but rather to demonstrate one approach I hope to explore. This example uses an autoregressive (AR) model to determine the periodic components of a data stream, dissects the stream to isolate one of the components, modifies the component, and synthesizes a new data stream. To put this in perspective, consider a restaurant where the flow of customers was observed to fluctuate over both a weekly and daily cycle. This method could be used to model the impact of a reduction in the daily cycle, perhaps due to the closing of a nearby business.

1.0: Overview:

A two-step process is employed to reproduce the data stream. Since the order of observations is critical to capturing the spectral content of the data stream, the data stream is considered a time series with an index, n, used to annotate the relative positions of the values. The first step, the analysis step, consists of determining and recording the signal parameters. These parameters are used in the second step, the synthesis step, to create a data stream which is similar to, but distinct from, the original data stream. The parameters used are the linear prediction filter parameters (Therrien, 1992; Haykin, 1991).

Linear prediction filters use weighted multiples of the previous values in the data stream to estimate the next value. Each previous value is multiplied by its associated filter weight, and the weighted values are summed to form the estimate.

The error of the estimate is defined as the difference between the estimate of the current value and the observed value. The objective of the linear filter is to determine the values of the weights which minimize this error, in the least squares sense. The error is a function of the number of weights used in the filter, P. The error is named the forward prediction error and is defined in equation form as:
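One common way to write this is given here. The symbols are assumptions, since the text above fixes only the index n and the filter order P: $x[n]$ is the data stream, $a_i$ is the i-th filter weight, and $\hat{x}[n]$ is the estimate of the current value. The sign of the error is also a convention and does not affect the least squares solution.

```latex
\hat{x}[n] = \sum_{i=1}^{P} a_i \, x[n-i],
\qquad
e_P^{f}[n] = x[n] - \hat{x}[n] = x[n] - \sum_{i=1}^{P} a_i \, x[n-i]
```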

The paradigm of analyzing observed data streams implies that the entire data stream is known before the analysis phase begins. Mathematically, there is no reason to limit the analysis to progressing forward in time. This paradigm provides an opportunity to use future values to predict the current value in the same manner, resulting in a second estimate of the current value. The error of this second estimate is named the backward prediction error and is defined in equation form as:
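A form consistent with the description above is given here, with assumed notation: $x[n]$ is the data stream, $a_i$ is the i-th filter weight, and P is the number of weights. Some texts index the backward error differently, predicting $x[n-P]$ from the P values that follow it. That is the same quantity shifted in time.

```latex
e_P^{b}[n] = x[n] - \sum_{i=1}^{P} a_i \, x[n+i]
```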

The optimal prediction filter weights for the forward and backward cases are identical; that is, the i-th forward weight equals the i-th backward weight for all values of i. The forward and backward prediction errors, however, are not the same. The filter weights may be determined by minimizing either the forward or backward prediction errors. This determination is an optimization problem in P-dimensional space. Like most optimization methods, this consists of estimating the optimal solution, determining the slope of the error surface at that point, and updating the estimate based on the slope. This process is repeated until a minimum is reached, as indicated by a slope of zero. The slopes of the error surfaces for the forward and backward prediction methods are not the same. Therefore, the optimal weights may be determined most efficiently by minimizing the sum of the forward and backward errors.
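The minimization of the summed forward and backward errors can be sketched as follows. This is a minimal illustration, not the routine used for this example: it swaps the iterative slope-following search for a direct least squares solve, which reaches the same minimum when the problem is well conditioned. The function name `fit_ar_forward_backward` and the test signal are hypothetical.

```python
import numpy as np

def fit_ar_forward_backward(x, order):
    """Estimate AR weights by minimizing forward + backward squared error.

    Assumed convention: x[k] is predicted as sum_i a_i * x[k-i] (forward)
    and as sum_i a_i * x[k+i] (backward), with the same weights a_i.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Forward rows: predict x[k] from x[k-1], ..., x[k-order].
    fwd_rows = np.array([x[k - order:k][::-1] for k in range(order, n)])
    fwd_target = x[order:]
    # Backward rows: predict x[k] from x[k+1], ..., x[k+order].
    bwd_rows = np.array([x[k + 1:k + order + 1] for k in range(n - order)])
    bwd_target = x[:n - order]
    # Stack both sets of equations and solve them jointly.
    A = np.vstack([fwd_rows, bwd_rows])
    b = np.concatenate([fwd_target, bwd_target])
    weights, *_ = np.linalg.lstsq(A, b, rcond=None)
    return weights

# Hypothetical check: a noise-free cosine obeys x[k] = 2cos(w) x[k-1] - x[k-2],
# so the two estimated weights should be close to [2*cos(0.3), -1].
x = np.cos(0.3 * np.arange(200))
w = fit_ar_forward_backward(x, 2)
```

Stacking the forward and backward equations into one system is what makes the solution minimize the sum of the two errors rather than either one alone.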

The data stream may be modeled as the output of an all-pole filter, which has an Infinite Impulse Response (IIR), excited by random noise. The corresponding prediction error filter is a Finite Impulse Response (FIR) filter. The filter weights are related to the spectral makeup of the data stream. If the weights are viewed as the coefficients of a polynomial in the z-domain, the polynomial can be factored. The roots of the polynomial are the poles of the transfer function of the all-pole model. A set of roots will correspond to each spectral component. Since the data stream is made up of real values, the roots will either be real numbers or pairs of complex conjugates. Each periodic spectral component will be represented by a complex conjugate pair. The angle of the root in the upper half of the complex plane will be between 0 and pi, corresponding to a frequency between zero and the Nyquist frequency.
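The mapping from filter weights to roots and frequencies can be sketched as follows. The sign convention is an assumption: the weights $a_i$ predict $x[n]$ as $\sum_i a_i x[n-i]$, so the polynomial to factor is $z^P - a_1 z^{P-1} - \dots - a_P$. The function names `ar_roots` and `root_frequencies` are hypothetical.

```python
import numpy as np

def ar_roots(weights):
    """Roots of z^P - a_1 z^(P-1) - ... - a_P (poles of the AR model)."""
    poly = np.concatenate([[1.0], -np.asarray(weights, dtype=float)])
    return np.roots(poly)

def root_frequencies(roots, sample_rate=1.0):
    """Frequency of each conjugate pair, in cycles per unit of sample_rate."""
    # Keep one root from each conjugate pair: the one with positive angle.
    upper = roots[np.imag(roots) > 0]
    # An angle of pi corresponds to the Nyquist frequency, sample_rate / 2.
    return np.sort(np.angle(upper)) * sample_rate / (2 * np.pi)

# Hypothetical weights for a single component at 0.3 radians per sample.
roots = ar_roots([2 * np.cos(0.3), -1.0])
freqs = root_frequencies(roots)
```

Real roots have an angle of 0 or pi and are dropped by the positive-angle filter, since they do not describe an oscillating component.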

In the analysis phase, the filter weights are determined. In the synthesis phase, the roots may be modified to change the spectral content of the data stream. The factors formed from the new roots are then multiplied together to obtain a new polynomial, whose coefficients are the new set of filter weights. The new filter can be excited with random noise to produce a new data stream with the new spectral content.

2.0: Example:

A data stream was constructed based on three components: a slowly changing component which would be analogous to the weekly fluctuation in customers at the restaurant, a rapidly changing component which would be analogous to the daily fluctuation in customers at the restaurant, and a purely random component. The data stream and each component are depicted in the figure below. The bottom graph is the spectrogram of the composite data stream. Notice the two spikes in the spectrogram, one corresponding to each of the observed fluctuations.
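A data stream of this kind can be constructed along the following lines. Every numeric value here is a hypothetical stand-in, since the periods, amplitudes, and noise level used for the figure are not stated: the slow and fast periods are 56 and 8 samples (a seven-to-one ratio, like a week to a day), and the spectrum is estimated with a plain FFT.

```python
import numpy as np

def make_stream(n_samples=448, slow_period=56.0, fast_period=8.0,
                noise_std=0.5, seed=0):
    """Build a stream from a slow cycle, a fast cycle, and random noise."""
    rng = np.random.default_rng(seed)
    n = np.arange(n_samples)
    slow = np.sin(2 * np.pi * n / slow_period)        # "weekly" fluctuation
    fast = 0.5 * np.sin(2 * np.pi * n / fast_period)  # "daily" fluctuation
    noise = noise_std * rng.standard_normal(n_samples)
    return slow + fast + noise, (slow, fast, noise)

stream, components = make_stream()

# Power spectrum of the composite stream; bin k is k cycles per record.
power = np.abs(np.fft.rfft(stream)) ** 2
power[0] = 0.0  # ignore the mean (zero-frequency) term
peak_bins = sorted(np.argsort(power)[-2:].tolist())
```

With these values the record holds exactly 8 slow cycles and 56 fast cycles, so the two largest spectral peaks fall on bins 8 and 56.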

The filter weights for this data stream were determined using a MATLAB function, and plotted on the following graph. Since the spectrogram revealed two distinct components, and each component requires one complex conjugate pair of roots, a fourth-order AR model was selected. The four filter weights were used as the coefficients of a fourth-order polynomial, which was factored to produce four roots.

The angle of each root was converted to a frequency to show that the roots accurately represent the spectral components observed in the spectrogram. The calculated frequencies are at the top of the graph.

To ensure a reasonable model has been obtained before manipulating the roots, a data stream was created from the existing roots and presented below. By comparison, the synthetic data stream is very similar to the original data stream.
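Creating a data stream from an AR model can be sketched as follows: the filter is excited with white noise and each output value is formed from the previous outputs. The weight convention (each value is $\sum_i a_i y[n-i]$ plus noise), the function name `synthesize`, and the burn-in length are assumptions made for illustration.

```python
import numpy as np

def synthesize(weights, n_samples, noise_std=1.0, seed=0, burn_in=200):
    """Excite an AR filter with white noise and return n_samples outputs."""
    rng = np.random.default_rng(seed)
    weights = np.asarray(weights, dtype=float)
    order = len(weights)
    total = n_samples + burn_in
    excitation = noise_std * rng.standard_normal(total)
    # The first `order` entries are zeros and act as initial conditions.
    y = np.zeros(total + order)
    for k in range(total):
        past = y[k:k + order][::-1]  # most recent output first
        y[k + order] = np.dot(weights, past) + excitation[k]
    # Drop the initial conditions and the start-up transient.
    return y[order + burn_in:]

# Hypothetical second-order model with a stable pair of roots.
synthetic = synthesize([1.5, -0.9], n_samples=300)
```

The burn-in samples are discarded so that the returned stream does not carry the transient caused by starting the filter from zeros.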

Since the model is satisfactory, the analysis phase is complete. For the synthesis phase, the model will be manipulated to reduce the higher frequency component. To accomplish this, only the roots corresponding to the higher frequency component must be manipulated. A new model was created with the roots corresponding to the high frequency component replaced with new values having the same angle, but smaller magnitude. The new roots were multiplied together to derive new filter weights. The new model was then excited to produce a new synthetic data stream depicted below.
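The root manipulation described above can be sketched as follows. This is an illustration under assumptions, not the code used for the figures: the weights follow the convention that each value is predicted as $\sum_i a_i x[n-i]$, frequencies are in cycles per sample, and the function name `damp_component`, the tolerance, and the example roots are hypothetical.

```python
import numpy as np

def damp_component(weights, target_freq, scale, tol=0.02):
    """Shrink the magnitude of the roots near target_freq, keeping their angle."""
    poly = np.concatenate([[1.0], -np.asarray(weights, dtype=float)])
    roots = np.roots(poly)
    # Both members of a conjugate pair share the same absolute angle.
    freqs = np.abs(np.angle(roots)) / (2 * np.pi)
    selected = np.abs(freqs - target_freq) < tol
    # Multiplying by a real scale changes the magnitude but not the angle.
    new_roots = np.where(selected, roots * scale, roots)
    # Multiply the factors (z - root) back into polynomial coefficients.
    new_poly = np.real(np.poly(new_roots))
    return -new_poly[1:]

# Hypothetical fourth-order model: pairs at 0.02 and 0.125 cycles per sample.
angles = 2 * np.pi * np.array([0.02, -0.02, 0.125, -0.125])
old_weights = -np.real(np.poly(0.98 * np.exp(1j * angles)))[1:]
new_weights = damp_component(old_weights, target_freq=0.125, scale=0.5)
```

Moving a pair of roots toward the origin broadens and lowers the corresponding spectral peak, which is why the high frequency component is reduced rather than shifted.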

As expected, the high frequency component has been reduced. The new data stream can be used to model the arrival of customers at the restaurant if the lunch hour rush were reduced.

3.0: The Way Ahead:

This example used a stationary signal. The next effort will be to implement a block routine which will work on nonstationary signals. I have some ideas on how to do that and estimate that it will take a week to get it working. I intend to apply that routine to solar flare data associated with Wolfer numbers, which have been collected for nearly three hundred years.

The AR model is simple and straightforward. I intend to attempt a variety of other methods, including the matching pursuit method used in one of the references. These methods yield the individual fluctuations as a natural byproduct of the analysis phase. I also intend to explore various methods of manipulating the data stream to introduce variations.

4.0: References:

Haykin, S. (1991). Adaptive Filter Theory (2nd ed.). Prentice Hall.

Therrien, C. W. (1992). Discrete Random Signals and Statistical Signal Processing. Prentice Hall.

Version 1.1