Harmonic Analysis
Even though spectral analysis, which is based upon the discrete Fourier transform, is a readily used technique for analyzing the spectral content of a signal, one might only be interested in the spectral content for particular frequencies that are not included in the discrete set of frequencies provided by the Fourier transform. This can happen when the number of samples is large and the number of frequencies of interest is much smaller. In this section we address the approximation of our signal, or time series, to a sum over a small set of frequency components. The frequencies of interest are known but the associated amplitude is not known. This will be accomplished using the method of least squares.
We begin by reviewing the method of least squares for determining the best fit of a line. The equation of a line only has two unknown parameters, the slope and the intercept. An understanding of the derivation for this simpler approximation should make that for the harmonic analysis more transparent.
We begin with a set of data, presented in the usual pairs for We are interested in finding the best approximation, in some sense, of this data by some function. In the case of linear regression, we seek a linear relationship of the form Though this line is not expected to agree with the data, it is expected to be as “close as possible”.
What does one mean by “as close as possible”? We could mean that the total distance between the known data points and the line is as small as possible. Though there are many ways we could quantify this, the most natural would be to sum over the standard distance between the points for all data points. Thus, we would look at an expression like
However, since the first term under the square root vanishes and the square root only returns a positive number, we could instead just consider the expression Making the later as small as possible only gives the same result as for the first expression, but is easier to compute. Thus, this leads to the phrase “least square” regression.
We are interested in minimizing this quantity, which could be thought of as a variance about the straight line mean. We minimize this “error” by varying a and b. Thus, we have a two variable minimization problem. In order to minimize a function of one variable, we differentiate the function with respect to the variable and set it equal to zero to determine that value of the independent variable that yields a minimum. In this case, we need to simultaneously set the derivatives with respect to a and b to zero and find the values of a and b that solve the resulting equations.
Differentiating with respect to a and b separately, gives
Regrouping, we find that these are simultaneous equations for the unknowns:
Solving this system of equations gives expressions for a and b in terms of various sums over expressions the involving the data. This is the basis of the so called “best fit line” that is used in many programs, such as those in calculators and in MS Excel.
However, we are interested in fitting data to a more complicated expression than that of a straight line. In particular, we want to fit our data to a sum over sines and cosines of arguments involving particular frequencies.
We will consider a set of data consisting of N values at equally spaced times, We are interested in finding the best approximation to a function consisting of M particular frequencies, Namely, we wish to match the data to the function
.
The unknown parameters in this case are the ’s and ’s. We will not determine a function that exactly fits the data, as in the DFT case, but only seek the best fit curve to the data. [Note: for simplicity of the results, we have redefined the constant term
In the following we will pick times . Then In the text, the authors let making For now we will hold off from this notation.
To determine the unknowns, we need to minimize the “variance”
.
To do so, we differentiate this expression with respect to all of the parameters. Namely, for a particular q, we have
As with the previous example of a best fit line, we have here a system of linear equations for the unknowns. We can rewrite this system in a more apparent form and then make use of matrix theory for solving systems of equations.
and
for .
Finally, we need to consider the equations for In this case we have
which reduces to
.
We now note that we can write this system of 2M+1 equations in matrix form. We first define the matrices C and S with elements
.
The above sums over n can be written as the matrix products with the entry in the qth row and kth column as
.
Here is the transpose of C. Thus, and
Inserting these expressions into the system of equations, we have
and
Finally, these equations can be combined by defining ,
and .
Note that Y and Z are 2M+1 dimensional column vectors, c and s are M dimensional column vectors and D is a matrix.
The system of equations in matrix form looks like Solving for the unknowns in the column vector Z, we have
Implementation in Matlab
%
% Harmonic Analysis
%
% Enter Data in y
y=[7.6 7.4 8.2 9.2 10.2 11.5 12.4 13.4 13.7 11.8 10.1 ...
9.0 8.9 9.5 10.6 11.4 12.9 12.7 13.9 14.2 13.5 11.4 10.9 8.1];
N=length(y);
% Number of Harmonics Desired and frequency dt
M=2;
f=1/12*(1:M);
T=24;
alpha=f*T;
% Compute the matrices of trigonometric functions
n=1:N;
C=cos(2*pi*alpha'*n/N);
S=sin(2*pi*alpha'*n/N);
c_row=ones(1,N)*C';
s_row=ones(1,N)*S';
D(1,1)=N;
D(1,2:M+1)=c_row;
D(1,M+2:2*M+1)=s_row;
D(2:M+1,1)=c_row';
D(M+2:2*M+1,1)=s_row';
D(2:M+1,2:M+1)=C*C';
D(M+2:2*M+1,2:M+1)=S*C';
D(2:M+1,M+2:2*M+1)=C*S';
D(M+2:2*M+1,M+2:2*M+1)=S*S';
yy(1,1)=sum(y);
yy(2:M+1)=y*C';
yy(M+2:2*M+1)=y*S';
z=D^(-1)*yy';