1 Basis
1.1 Distribution function
1.2 Statistical stationarity
1.3 White noise
1.4 Autocovariance and autocorrelation
1.5 PACF
1.6 Polynomial lag-processes
1.7 Random walk
1.8 Forecasting function
1.9 Periodogram
2 AR processes
2.1 The AR(1) process
2.2 The AR(2) process
2.3 The AR(p) process
3 MA processes
3.1 The MA(1) process
3.2 The MA(2) process
3.3 The MA(q) process
4 ARMA
4.1 The ARMA(1,1) process
4.2 The ARMA(p,q) process
5 Wold's decomposition theorem
6 Non-stationary time series
7 Differencing (Nabla and B operator)
8 The behavior of non-stationary time series
9 Inverse autocorrelations
10 Unit root tests
Symbols and equations
1 Basis
First we define some important concepts. A stochastic process (i.e. a probabilistic process) is defined by a T-dimensional distribution function.
1.1 Distribution function
Before analyzing the structure of a time series model one must make sure that the time series is stationary with respect to both the mean and the variance. First, we will assume statistical stationarity of all time series (later on, this restriction will be relaxed).
1.2 Statistical stationarity
Statistical stationarity of a time series implies that the marginal probability distribution is time-independent, which means that:
- the expected values and variances are constant, where T is the number of observations in the time series;
- the autocovariances (and autocorrelations) must be constant, where k is an integer time-lag;
- the variable has a joint normal distribution f(X1, X2, ..., XT) with a marginal normal distribution in each dimension.
Remark: if only this last condition is not met, the process is called weakly stationary.
1.3 White noise
Now it is possible to define white noise as a (statistically stationary) stochastic process defined by the marginal distribution function (V.I.1-1), where all Xt are independent variables (with zero covariances), with a joint normal distribution f(X1, X2, ..., XT), and with

E(Xt) = 0 and V(Xt) = σ² (a constant) for all t.

It is obvious from this definition that for any white noise process the probability function can be written as

f(X1, X2, ..., XT) = f(X1) f(X2) ... f(XT).
1.4 Autocovariance and autocorrelation
Define the autocovariance as

γ(k) = E[(Xt - μ)(Xt+k - μ)]

or

γ(k) = cov(Xt, Xt+k)    (V.I.1-7)

whereas the autocorrelation is defined as

ρ(k) = γ(k) / γ(0).

In practice however, we only have the sample observations at our disposal. Therefore we use the sample autocorrelations

r(k) = Σ(t=1..T-k) (Xt - X̄)(Xt+k - X̄) / Σ(t=1..T) (Xt - X̄)²    (V.I.1-10)

for any integer k.
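The sample autocorrelations can be computed directly. A minimal Python sketch (assuming the biased estimator with divisor T, the usual Box-Jenkins convention; the function name is ours):

```python
def sample_acf(x, max_lag):
    """Sample autocorrelations r(k) = c(k)/c(0), where c(k) is the
    sample autocovariance with divisor T (the usual biased estimator)."""
    T = len(x)
    mean = sum(x) / T
    c0 = sum((v - mean) ** 2 for v in x) / T
    r = []
    for k in range(1, max_lag + 1):
        ck = sum((x[t] - mean) * (x[t + k] - mean) for t in range(T - k)) / T
        r.append(ck / c0)
    return r

# An alternating series has a strongly negative lag-1 autocorrelation:
r = sample_acf([1, -1, 1, -1, 1, -1, 1, -1], 2)  # r[0] = -0.875, r[1] = 0.75
```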
Remark that the autocovariance matrix and the autocorrelation matrix associated with a stationary stochastic process
are always positive definite, which can easily be shown since a linear combination of the stochastic variable
has a variance of
which is always positive.
This implies for instance for T=3 that
or
Bartlett proved that the variance of autocorrelation of a stationary normal stochastic process can be formulated as
This expression can be shown to be reduced to
if the autocorrelation coefficients decrease exponentially like
Since the autocorrelations for i > q (a natural number) are equal to zero, expression (V.I.1-17) can be shown to be reformulated as
which is the so called large-lag variance. Now it is possible to vary q from 1 to any desired integer number of autocorrelations, replace the theoretical correlations by their sample estimates, and compute the square root of (V.I.1-20) to find the standard deviation of the sample autocorrelation.
Note that the standard deviation of one autocorrelation coefficient is almost always approximated by 1/√T.
The covariances between autocorrelation coefficients have also been deduced by Bartlett
which is a good indicator for dependencies between autocorrelations. Bear in mind, therefore, that inter-correlated autocorrelations can seriously distort the picture of the autocorrelation function (ACF; the autocorrelations as a function of the time-lag).
1.5 PACF
It is however possible to remove the intervening correlations between Xt and Xt-k by defining a partial autocorrelation function (PACF).
The partial autocorrelation coefficients are defined as the last coefficient of a partial autoregression equation of order k.
Obviously there exists a relationship between the PACF and the ACF, since the partial autoregression equation can be rewritten as
or (on taking expectations and dividing by the variance)
Sometimes this is written in matrix form according to the Yule-Walker relations
or simply
(V.I.1-27)
Solving (V.I.1-27) according to Cramer's Rule yields
Note that the determinant of the numerator contains the same elements as the determinant of the denominator, except for the last column that has been replaced.
A practical numerical estimation algorithm for the PACF is given by Durbin
with
The standard error of a partial autocorrelation coefficient for k > p (where p is the order of the autoregressive data generating process; see later) is given by
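Durbin's recursion can be sketched as follows. This is a standard implementation outline (the function name is ours; rho holds the autocorrelations ρ(1), ρ(2), ...):

```python
def pacf_durbin(rho):
    """Durbin(-Levinson) recursion for the PACF.
    rho[i] is the autocorrelation at lag i+1; returns [phi_11, phi_22, ...]."""
    phi_prev = [rho[0]]           # order-1 coefficient: phi_11 = rho_1
    pacf = [rho[0]]
    for k in range(2, len(rho) + 1):
        num = rho[k - 1] - sum(phi_prev[j] * rho[k - 2 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * rho[j] for j in range(k - 1))
        phi_kk = num / den        # last coefficient of the order-k autoregression
        phi_prev = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j]
                    for j in range(k - 1)] + [phi_kk]
        pacf.append(phi_kk)
    return pacf

# For an AR(1) with rho(k) = 0.5**k the PACF cuts off after lag 1:
p = pacf_durbin([0.5, 0.25, 0.125])  # [0.5, 0.0, 0.0]
```

This illustrates the identification value of the PACF: for a pure AR(p) process, the partial autocorrelations vanish beyond lag p.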
1.6 Polynomial lag-processes
Finally, we define the following polynomial lag-processes
where B is the backshift operator (i.e. B^i Yt = Yt-i) and where
These polynomial expressions are used to define linear filters. By definition a linear filter
generates a stochastic process
where at is a white noise variable.
1.7 Random walk

Xt = Xt-1 + at    (V.I.1-36)

for which the following is obvious

Xt = X0 + a1 + a2 + ... + at.

We call eq. (V.I.1-36) the random-walk model: a model that describes time series that fluctuate around X0 in the short and in the long run (since at is white noise).
It is interesting to note that a random-walk is normally distributed. This can be proved by using the definition of white noise and computing the moment generating function of the random-walk
from which we deduce
(Q.E.D.).
A deterministic trend is generated by a random-walk model with an added constant

Xt = c + Xt-1 + at.

The trend can be illustrated by re-expressing

Xt = X0 + ct + (a1 + a2 + ... + at)

where ct (= c times t) is a linear deterministic trend (as a function of time).
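A short simulation illustrates both cases (a sketch; the parameter names are ours): c = 0 gives a pure random walk around X0, while c ≠ 0 superimposes the deterministic trend c·t.

```python
import random

def random_walk(T, x0=0.0, c=0.0, sigma=1.0, seed=42):
    """X_t = c + X_{t-1} + a_t with a_t ~ N(0, sigma^2) white noise.
    With c != 0 the expected path is x0 + c*t, a deterministic trend."""
    rng = random.Random(seed)
    x = [x0]
    for _ in range(T):
        x.append(c + x[-1] + rng.gauss(0.0, sigma))
    return x

walk = random_walk(200, c=0.5)   # drifts upward around the line 0.5*t
```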
The linear filter (V.I.1-35) is normally distributed with
due to the additivity property of eq. (I.III-33), (I.III-34), and (I.III-35) applied to at.
Now the autocorrelation of a linear filter can be quite easily computed as
since
and
Now it is quite evident that, if the linear filter (V.I.1-35) generates the variable Xt, then Xt is a stationary stochastic process ((V.I.1-1) - (V.I.1-3)) defined by a normal distribution (V.I.1-4) (and therefore strongly stationary), and an autocovariance function (V.I.1-45) which depends only on the time-lag k.
1.8Forecasting function
The set of equations resulting from a linear filter (V.I.1-35) with ACF (V.I.1-44) are sometimes called stochastic difference equations. These stochastic difference equations can be used in practice to forecast (economic) time series. The forecasting function is given by
The density of the forecasting function is
where
is known, and therefore equal to a constant term. Therefore it is obvious that
The concepts defined and described above are all time-related. This implies for instance that autocorrelations are defined as a function of time. Historically, this time-domain viewpoint is preceded by the frequency-domain viewpoint where it is assumed that time series consist of sine and cosine waves at different frequencies.
In practice there are advantages and disadvantages to both viewpoints; nevertheless, they should be seen as complementary to each other.
1.9 Periodogram
The periodogram is defined for the Fourier series model (V.I.1-52).
In the last equation we define
The least squares estimates of the parameters in (V.I.1-52) are computed by
In case of a time series with an even number of observations T = 2q the same definitions are applicable except for
It can furthermore be shown that
such that
Obviously
It is also possible to show that
If
(V.I.1-63)
then
and
and
and
and
which state the orthogonality properties of sinusoids and which can be proved. Remark that (V.I.1-67) is a special case of (V.I.1-64) and (V.I.1-68) is a special case of (V.I.1-66). In particular eq. (V.I.1-66) is interesting for our discussion with regard to (V.I.1-60) and (V.I.1-53), since it states that sinusoids are independent.
If (V.I.1-52) is redefined as
then I(f) is called the sample spectrum.
The sample spectrum is in fact a Fourier cosine transformation of the autocovariance function estimate. Denote the covariance estimate of (V.I.1-7) by the sample covariance (i.e. the numerator of (V.I.1-10)), the complex number by i, and the frequency by f; then
(V.I.1-70)
On using (V.I.1-55) and (V.I.1-70) it follows that
which can be substituted into (V.I.1-70) yielding
Now from (V.I.1-10) it follows
and if (t - t') is substituted by k then (V.I.1-72) becomes
which proves the link between the sample spectrum and the estimated autocovariance function.
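This link can be checked numerically. In the sketch below (our function names, assuming the mean-corrected definitions used throughout), the direct periodogram and the cosine transform of the sample autocovariances agree exactly at any frequency f:

```python
import math

def autocov(x, k):
    """Sample autocovariance c(k) with divisor T (cf. the numerator of (V.I.1-10))."""
    T = len(x)
    m = sum(x) / T
    return sum((x[t] - m) * (x[t + k] - m) for t in range(T - k)) / T

def spectrum_from_acov(x, f):
    """Sample spectrum as a Fourier cosine transform of the sample autocovariances."""
    T = len(x)
    return 2.0 * (autocov(x, 0)
                  + 2.0 * sum(autocov(x, k) * math.cos(2 * math.pi * f * k)
                              for k in range(1, T)))

def periodogram(x, f):
    """Direct sample spectrum I(f) = (2/T) |sum_t (x_t - xbar) e^{-2*pi*i*f*t}|^2."""
    T = len(x)
    m = sum(x) / T
    a = sum((x[t] - m) * math.cos(2 * math.pi * f * t) for t in range(T))
    b = sum((x[t] - m) * math.sin(2 * math.pi * f * t) for t in range(T))
    return (2.0 / T) * (a * a + b * b)

x = [1.0, 2.0, 0.5, 3.0, 1.5, 2.5, 0.0, 2.0]
diff = abs(spectrum_from_acov(x, 0.3) - periodogram(x, 0.3))  # ~0: both agree
```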
On taking expectations of the spectrum we obtain
for which it can be shown that
(V.I.1-76)
On combining (V.I.1-75) and (V.I.1-76) and on defining the power spectrum as p(f) we find
It is quite obvious that
so that it follows that the power spectrum converges if the covariance decreases rather quickly. The power spectrum is a Fourier cosine transformation of the (population) autocovariance function. This implies that for any theoretical autocovariance function (cfr. the following sections) a respective theoretical power spectrum can be formulated.
Of course the power spectrum can be reformulated with respect to autocorrelations instead of autocovariances
which is the so-called spectral density function.
Since
it follows that
and since g(f) > 0 the properties of g(f) are quite similar to those of a frequency distribution function.
Since it can be shown that the sample spectrum fluctuates wildly around the theoretical power spectrum, a modified (i.e. smoothed) estimate of the power spectrum is suggested as
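One common smoothing scheme truncates the sample autocovariances at a lag M much smaller than T and tapers them with a lag window. The Tukey-Hanning window below is our illustrative choice; the text does not fix a particular window:

```python
import math

def autocov(x, k):
    """Sample autocovariance c(k) with divisor T."""
    T = len(x)
    m = sum(x) / T
    return sum((x[t] - m) * (x[t + k] - m) for t in range(T - k)) / T

def smoothed_spectrum(x, f, M):
    """Lag-window estimate of the power spectrum: truncate at lag M and
    weight c(k) with the Tukey-Hanning window 0.5*(1 + cos(pi*k/M))."""
    lam = lambda k: 0.5 * (1.0 + math.cos(math.pi * k / M))
    return 2.0 * (autocov(x, 0)
                  + 2.0 * sum(lam(k) * autocov(x, k) * math.cos(2 * math.pi * f * k)
                              for k in range(1, M + 1)))

# A period-2 series concentrates spectral mass at frequency f = 1/2:
val = smoothed_spectrum([1, 2, 1, 2, 1, 2, 1, 2], 0.5, 2)
```

Truncating and tapering trades a little bias for a large variance reduction, which is exactly the purpose of the smoothed estimate.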
2 AR processes
2.1 The AR(1) process
The AR(1) process is defined as

Wt = φ1 Wt-1 + et = Ft + et    (V.I.1-83)

where Wt is a stationary time series, et is a white noise error term, and Ft = φ1 Wt-1 is called the forecasting function. Now we derive the theoretical pattern of the ACF of an AR(1) process for identification purposes.
First, we note that (V.I.1-83) may alternatively be written in the form

Wt = (1 - φ1 B)^(-1) et = et + φ1 et-1 + φ1² et-2 + ...    (V.I.1-84)
Second, we multiply the AR(1) process in (V.I.1-83) by Wt-k and take expectations

E(Wt-k Wt) = φ1 E(Wt-k Wt-1) + E(Wt-k et).    (V.I.1-85)
Since we know that for k = 0 the RHS of eq. (V.I.1-85) may be rewritten as

φ1 γ(1) + σe²    (V.I.1-86)

and that for k > 0 the RHS of eq. (V.I.1-85) is

φ1 γ(k-1)  (since E(Wt-k et) = 0)    (V.I.1-87)

we may write the LHS of (V.I.1-85) as

γ(0) = φ1 γ(1) + σe²  and  γ(k) = φ1 γ(k-1)  for k > 0.    (V.I.1-88)
From (V.I.1-88) we deduce

ρ(k) = φ1^k,  k = 0, 1, 2, ...    (V.I.1-89)

and

γ(0) = σe² / (1 - φ1²).    (V.I.1-90)
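The geometric decay of the AR(1) autocorrelations, ρ(k) = φ1^k, shows up clearly in simulation. A sketch (our setup, with σe = 1 and a burn-in to remove the influence of the starting value):

```python
import random

# Simulate W_t = phi * W_{t-1} + e_t and compare the sample ACF
# with the theoretical pattern rho(k) = phi**k.
rng = random.Random(1)
phi, T = 0.7, 20000
w, prev = [], 0.0
for _ in range(T + 200):            # includes 200 burn-in draws
    prev = phi * prev + rng.gauss(0.0, 1.0)
    w.append(prev)
w = w[200:]

def r(x, k):
    """Sample autocorrelation at lag k."""
    m = sum(x) / len(x)
    c0 = sum((v - m) ** 2 for v in x)
    return sum((x[t] - m) * (x[t + k] - m) for t in range(len(x) - k)) / c0

r1, r2 = r(w, 1), r(w, 2)           # close to 0.7 and 0.49
```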
2.2 The AR(2) process
The AR(2) process is defined as

Wt = φ1 Wt-1 + φ2 Wt-2 + et = Ft + et    (V.I.1-94)
where Wt is a stationary time series, et is a white noise error term, and Ft is the forecasting function.
The process defined in (V.I.1-94) can be written in the form
(V.I.1-95)
and therefore
(V.I.1-96)
Now, for (V.I.1-96) to be valid, it easily follows that
(V.I.1-97)
and that
(V.I.1-98)
and that
(V.I.1-99)
and finally that
(V.I.1-100)
The model is stationary if the ψi weights converge. This is the case when certain conditions on φ1 and φ2 are imposed. These conditions can be found by using the solutions of the polynomial of the AR(2) model. The so-called characteristic equation is used to find these solutions

m² - φ1 m - φ2 = 0.    (V.I.1-101)

The solutions m1 and m2 are

m1,2 = (φ1 ± √(φ1² + 4 φ2)) / 2    (V.I.1-102)

which can be either real or complex. Notice that the roots are complex if φ1² + 4 φ2 < 0.
When these solutions, in absolute value, are smaller than 1, the AR(2) model is stationary.
Later, it will be shown that these conditions are satisfied if φ1 and φ2 lie in a (Stralkowski) triangular region restricted by

φ2 + φ1 < 1,  φ2 - φ1 < 1,  -1 < φ2 < 1.    (V.I.1-103)
The derivation of the theoretical ACF and PACF for an AR(2) model is described below.
On multiplying the AR(2) model by Wt-k and taking expectations we obtain

γ(k) = φ1 γ(k-1) + φ2 γ(k-2) + E(Wt-k et).    (V.I.1-104)

From (V.I.1-97) and (V.I.1-98) it follows that

E(Wt-k et) = σe²  for k = 0,  and  E(Wt-k et) = 0  for k > 0.    (V.I.1-105)

Now it is possible to combine (V.I.1-104) with (V.I.1-105) such that

γ(k) = φ1 γ(k-1) + φ2 γ(k-2)  for k > 0    (V.I.1-106)

from which it follows that

ρ(k) = φ1 ρ(k-1) + φ2 ρ(k-2)  for k > 0.    (V.I.1-107)

Therefore

ρ(1) = φ1 / (1 - φ2)  and  ρ(2) = φ2 + φ1² / (1 - φ2).    (V.I.1-108)
Eq. (V.I.1-106) can be rewritten (for k = 0) as

γ(0) = φ1 γ(1) + φ2 γ(2) + σe²    (V.I.1-109)

such that on using (V.I.1-108) it is obvious that

γ(0) = σe² / (1 - ρ(1) φ1 - ρ(2) φ2).    (V.I.1-110)

According to (V.I.1-107) the ACF is a second-order stochastic difference equation of the form

(1 - φ1 B - φ2 B²) ρ(k) = 0,  k > 0    (V.I.1-111)

where (due to (V.I.1-108))

ρ(0) = 1  and  ρ(1) = φ1 / (1 - φ2)    (V.I.1-112)
are starting values of the difference equation.
In general, the solution to the difference equation is, according to Box and Jenkins (1976), given by
(V.I.1-113)
In particular, three different cases can be worked out for the solutions of the difference equation
(V.I.1-114)
of (V.I.1-102). The general solution of eq. (V.I.1-113) can be written in the form
(V.I.1-115)
(V.I.1-116)
Remark that for the case the following stationarity conditions
(V.I.1-117)
(V.I.1-118)
has two solutions
due to (V.I.1-114) and
due to
(V.I.1-119)
Hence we find the general solution to the difference equation
(V.I.1-120)
In order to impose convergence the following must hold
(V.I.1-121)
Hence two conditions have to be satisfied
(V.I.1-122)
which describes part of a parabola consisting of acceptable parameter values for φ1 and φ2.
Remark that this parabola is the frontier between acceptable real-valued and acceptable complex roots (cf. the Stralkowski triangle).
(V.I.1-123)
in trigonometric notation.
The general solution for the second-order difference equation can be found by
(V.I.1-124)
On defining
(V.I.1-125)
the ACF can be shown to be real-valued since
(V.I.1-126)
On using the property
(V.I.1-127)
eq. (V.I.1-126) becomes
(V.I.1-128)
with
(V.I.1-129)
In eq. (V.I.1-128) it is shown that the ACF is oscillating with a period of 2π/f0 and a variable amplitude of
(V.I.1-130)
as a function of k.
A useful equation can be found to compute the period of the pseudo-periodic behavior of the time series as

cos(2π/P) = φ1 / (2 √(-φ2))    (V.I.1-131)
which must satisfy the convergence condition (i.e. the amplitude is exponentially decreasing)

√(-φ2) < 1.    (V.I.1-132)
The pattern of the theoretical PACF can be deduced from relations (V.I.1-25) - (V.I.1-28).
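The stationarity check and the pseudo-period can be computed from the roots. A sketch (our function name), using the convention that the roots of m² - φ1 m - φ2 = 0 must have modulus below 1 for stationarity:

```python
import cmath
import math

def ar2_analysis(phi1, phi2):
    """Roots of m^2 - phi1*m - phi2 = 0; the AR(2) is stationary when both
    roots have modulus < 1.  Complex roots (phi1**2 + 4*phi2 < 0) give a
    damped-sine ACF whose pseudo-period P solves
    cos(2*pi/P) = phi1 / (2*sqrt(-phi2))."""
    disc = phi1 ** 2 + 4.0 * phi2
    m1 = (phi1 + cmath.sqrt(disc)) / 2.0
    m2 = (phi1 - cmath.sqrt(disc)) / 2.0
    stationary = abs(m1) < 1.0 and abs(m2) < 1.0
    period = None
    if disc < 0:
        period = 2.0 * math.pi / math.acos(phi1 / (2.0 * math.sqrt(-phi2)))
    return stationary, period

stat, period = ar2_analysis(1.0, -0.5)   # stationary, pseudo-period = 8.0
```

For φ1 = 1, φ2 = -0.5 the roots are complex with modulus √2/2, so the ACF is a damped sine wave that repeats roughly every 8 observations.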
2.3 The AR(p) process
An AR(p) process is defined by

Wt = φ1 Wt-1 + φ2 Wt-2 + ... + φp Wt-p + et = Ft + et    (V.I.1-133)
where Wt is a stationary time series, et is a white noise error component, and Ft is the forecasting function.
As described above, the AR(p) process can be written as

φp(B) Wt = et,  with  φp(B) = 1 - φ1 B - φ2 B² - ... - φp B^p.    (V.I.1-134)

Hence

Wt = φp(B)^(-1) et.    (V.I.1-135)

The ψ weights converge if the roots of the characteristic equation

φp(B) = 0    (V.I.1-136)

satisfy the stationarity conditions (i.e. lie outside the unit circle).
The variance can be shown to be

γ(0) = σe² / (1 - ρ(1) φ1 - ρ(2) φ2 - ... - ρ(p) φp)    (V.I.1-137)

while the autocorrelations satisfy

ρ(k) = φ1 ρ(k-1) + φ2 ρ(k-2) + ... + φp ρ(k-p),  k > 0    (V.I.1-138)

which can be used to study the behavior of the theoretical ACF pattern.
Remember, that the Yule-Walker relations (V.I.1-26) and (V.I.1-27) hold for all AR(p) models. These can be used (together with the application of Cramer's Rule (V.I.1-28)) to derive the theoretical PACF pattern from the theoretical ACF function.
3 MA processes
3.1 The MA(1) process
The definition of the MA(1) process is given by

Wt = et - θ1 et-1 = Ft + et    (V.I.1-139)

where Wt is a stationary time series, et is a white noise error component, and Ft is the forecasting function.
On substituting (V.I.1-139) into eq. (V.I.1-46) and (V.I.1-45) we obtain

γ(0) = (1 + θ1²) σe²,  γ(1) = -θ1 σe²,  γ(k) = 0 for k > 1.    (V.I.1-140)

Therefore the pattern of the theoretical ACF is

ρ(1) = -θ1 / (1 + θ1²),  ρ(k) = 0 for k > 1.    (V.I.1-141)

Note that from eq. (V.I.1-141) it follows that

ρ(1) is the same for θ1 and for 1/θ1.    (V.I.1-142)
This implies that there exist at least two MA(1) processes which generate the same theoretical ACF.
Since an MA process consists of a finite number of ψ weights, it follows that the process is always stationary. However, it is necessary to impose the so-called invertibility restrictions such that the MA(q) process can be rewritten as an AR(∞) model whose π-weight series

(V.I.1-143)

converges.
On using the Yule-Walker equations and eq. (V.I.1-141) it can be shown that the theoretical PACF is

φkk = -θ1^k (1 - θ1²) / (1 - θ1^(2(k+1)))    (V.I.1-144)

Hence the theoretical PACF is dominated by a decreasing exponential function.
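The non-uniqueness of the MA(1) ACF is easy to verify numerically. The sketch assumes the parameterization Wt = et - θ1 et-1 (the sign of ρ(1) flips if the MA term is written with a plus):

```python
def ma1_rho1(theta):
    """Lag-1 autocorrelation of W_t = e_t - theta*e_{t-1}; rho(k) = 0 for k > 1."""
    return -theta / (1.0 + theta ** 2)

# theta and 1/theta generate the same ACF; only |theta| < 1 is invertible.
r_a = ma1_rho1(0.5)   # invertible representation
r_b = ma1_rho1(2.0)   # non-invertible twin, same rho(1) = -0.4
```

This is precisely why the invertibility restriction |θ1| < 1 is imposed: it singles out one of the two observationally equivalent representations.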
3.2 The MA(2) process
By definition the MA(2) process is

Wt = et - θ1 et-1 - θ2 et-2 = Ft + et    (V.I.1-145)

which can be rewritten, on using the backshift notation of (V.I.1-139), as

Wt = (1 - θ1 B - θ2 B²) et    (V.I.1-146)

where Wt is a stationary time series, et is a white noise error component, and Ft is the forecasting function.
On substituting (V.I.1-146) into eq. (V.I.1-46) and (V.I.1-45) we obtain

γ(0) = (1 + θ1² + θ2²) σe²,  γ(1) = -θ1 (1 - θ2) σe²,  γ(2) = -θ2 σe²,  γ(k) = 0 for k > 2.    (V.I.1-147)

Hence the theoretical ACF can be deduced

ρ(1) = -θ1 (1 - θ2) / (1 + θ1² + θ2²),  ρ(2) = -θ2 / (1 + θ1² + θ2²),  ρ(k) = 0 for k > 2.    (V.I.1-148)

The invertibility conditions can be shown to be

θ2 + θ1 < 1,  θ2 - θ1 < 1,  -1 < θ2 < 1    (V.I.1-149)
(compare with the stationarity conditions of the AR(2) process).
The deduction of the theoretical PACF is rather complicated but can be shown to be dominated by the sum of two exponentials (in case of real roots), or by decreasing sine waves (in case the roots are complex).
3.3 The MA(q) process
The MA(q) process is defined by

Wt = et - θ1 et-1 - θ2 et-2 - ... - θq et-q = Ft + et    (V.I.1-150)

where Wt is a stationary time series, et is a white noise error component, and Ft is the forecasting function.
Remark that this description of the MA(q) process is not straightforward to use for forecasting purposes due to its recursive character.
On substituting (V.I.1-150) into (V.I.1-46) and (V.I.1-45), the following autocovariances can be deduced

γ(k) = (-θk + θ1 θk+1 + θ2 θk+2 + ... + θq-k θq) σe²  for k = 1, ..., q,  and  γ(k) = 0 for k > q.    (V.I.1-151)

Hence the theoretical ACF is

ρ(k) = (-θk + θ1 θk+1 + ... + θq-k θq) / (1 + θ1² + ... + θq²)  for k = 1, ..., q,  and  ρ(k) = 0 for k > q.
The theoretical PACFs for higher-order MA(q) processes are extremely complicated and not extensively discussed in the literature.
4 ARMA
4.1 The ARMA(1,1) process
On combining an AR(1) and an MA(1) process one obtains an ARMA(1,1) model, which is defined as

Wt = φ1 Wt-1 + et - θ1 et-1 = Ft + et    (V.I.1-154)
where Wt is a stationary time series, et is a white noise error component, and Ft is the forecasting function.
Note that the model of (V.I.1-154) may alternatively be written as

(1 - φ1 B) Wt = (1 - θ1 B) et    (V.I.1-155)

such that

Wt = (1 - φ1 B)^(-1) (1 - θ1 B) et = ψ(B) et    (V.I.1-156)

in ψ-weight notation.
The ψ weights can be related to the ARMA parameters on using

(1 - φ1 B) ψ(B) = (1 - θ1 B)    (V.I.1-157)
such that the following is obtained

ψj = φ1^(j-1) (φ1 - θ1),  j ≥ 1.    (V.I.1-158)

Also the π weights can be related to the ARMA parameters on using

(1 - θ1 B) π(B) = (1 - φ1 B)    (V.I.1-159)

such that the following is obtained

πj = θ1^(j-1) (φ1 - θ1),  j ≥ 1.    (V.I.1-160)
From (V.I.1-158) and (V.I.1-160) it can clearly be seen that an ARMA(1,1) is in fact a parsimonious description of either an AR or an MA process with an infinite number of weights. This does not imply that all higher-order AR(p) or MA(q) processes may be written as an ARMA(1,1). Though, in practice an ARMA process (i.e. a mixed model) is quite frequently capable of capturing the ψ weights of a higher-order pure AR process or the π weights of a higher-order pure MA process.
On writing the ARMA(1,1) process as

Wt - φ1 Wt-1 = et - θ1 et-1    (V.I.1-161)

(which is a difference equation) we may multiply by Wt-k and take expectations. This gives

γ(k) = φ1 γ(k-1) + E(Wt-k et) - θ1 E(Wt-k et-1).    (V.I.1-162)

In case k > 1 the RHS of (V.I.1-162) is zero, thus

γ(k) = φ1 γ(k-1),  k > 1.    (V.I.1-163)

If k = 0 or if k = 1 then

γ(0) = φ1 γ(1) + σe² - θ1 (φ1 - θ1) σe²  and  γ(1) = φ1 γ(0) - θ1 σe².    (V.I.1-164)

Hence we obtain

γ(0) = (1 - 2 φ1 θ1 + θ1²) σe² / (1 - φ1²)  and  γ(1) = (1 - φ1 θ1)(φ1 - θ1) σe² / (1 - φ1²).    (V.I.1-165)

The theoretical ACF is therefore

ρ(1) = (1 - φ1 θ1)(φ1 - θ1) / (1 - 2 φ1 θ1 + θ1²),  ρ(k) = φ1 ρ(k-1) for k ≥ 2.    (V.I.1-166)
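This ACF pattern (a single starting value followed by geometric decay at rate φ1) can be sketched as follows, assuming the parameterization Wt = φ1 Wt-1 + et - θ1 et-1:

```python
def arma11_acf(phi, theta, max_lag):
    """Theoretical ACF of W_t = phi*W_{t-1} + e_t - theta*e_{t-1}:
    rho(1) = (1 - phi*theta)*(phi - theta) / (1 + theta**2 - 2*phi*theta),
    rho(k) = phi * rho(k-1) for k >= 2."""
    rho = [(1 - phi * theta) * (phi - theta) / (1 + theta ** 2 - 2 * phi * theta)]
    for _ in range(max_lag - 1):
        rho.append(phi * rho[-1])
    return rho

acf = arma11_acf(0.8, 0.3, 3)      # decays geometrically at rate 0.8 after lag 1
```

Setting θ1 = 0 recovers the pure AR(1) pattern ρ(k) = φ1^k, which is a quick sanity check on the formula.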
4.2 The ARMA(p,q) process
The general ARMA(p,q) process can be defined by

φp(B) Wt = θq(B) et    (V.I.1-167)

or alternatively in MA(∞) notation

Wt = φp(B)^(-1) θq(B) et = ψ(B) et    (V.I.1-168)

or in AR(∞) notation

π(B) Wt = et    (V.I.1-169)

where π(B) = θq(B)^(-1) φp(B).
The stationarity conditions depend only on the AR part: the roots of φp(B) = 0 must be larger than 1 in absolute value (i.e. lie outside the unit circle). The invertibility conditions depend only on the MA part: the roots of θq(B) = 0 must also be larger than 1 in absolute value.
The theoretical ACF and PACF patterns are deduced from the so-called difference equations

ρ(k) = φ1 ρ(k-1) + φ2 ρ(k-2) + ... + φp ρ(k-p),  k > q    (V.I.1-170)
and Cramer's rule applied to the Yule-Walker equations.
5 Wold's decomposition theorem
The most fundamental justification for time series analysis (as described in this text) is Wold's decomposition theorem, which proves that any (stationary) time series can be decomposed into two different parts: the first (deterministic) part can be exactly described by a linear combination of its own past; the second part is a (possibly infinite-order) MA component.
A slightly adapted version of Wold's decomposition theorem states that any real-valued stationary process Yt can be written as
(V.I.1-171)
where yt and zt are not correlated.
(V.I.1-172)
where yt is deterministic
(V.I.1-173)
and
(V.I.1-174)
(zt has an uncorrelated error with zero mean).
Because of its importance for time series analysis in general, and in practice, we will briefly discuss the proof of Wold's decomposition theorem.
Since there seems to be much confusion in the literature about this theorem, we will only discuss the proof of another version (than that described above) which can be proved quite easily. In order to make a distinction with the previous description, we will explicitly use other symbols.
Consider a stationary time series Wt with zero mean and finite variance.
In order to forecast the time series by means of a linear combination of its own past
(V.I.1-175)
a criterion is used to optimize the parameter values. This criterion is
(V.I.1-176)
which is called the sum of squared residuals (SSR).
The normal equations (i.e. eq. (V.I.1-176) differentiated w.r.t. the parameters) are easily found to be
(V.I.1-177)
In matrix notation eq. (V.I.1-177) becomes
(V.I.1-178)
with
(V.I.1-179)
which is symmetric about both diagonals due to the stationarity of Wt, and with
(V.I.1-180)
On adding an error component et,n to eq. (V.I.1-175) it can be shown that
(V.I.1-181)
The first part of (V.I.1-181) is almost trivial
(V.I.1-182)
The second part of (V.I.1-181) is
(V.I.1-183)
with a RHS equal to zero since the parameters satisfy (V.I.1-177) (Q.E.D.).
On repeating the previous procedure we obtain
(V.I.1-184)
Remark that the error components in (V.I.1-181) and (V.I.1-184) are uncorrelated due to (V.I.1-183), from which we obviously find
(V.I.1-185)
On substituting (V.I.1-184) into (V.I.1-175) it is obvious that
(V.I.1-186)
where xt,n(1) depends on the past of Wt-1
Also it is obvious from (V.I.1-183) that
(V.I.1-187)
and from (II.II.1-27) and the fact that et,n is independent from the other regressors of (V.I.1-186), that
(V.I.1-188)
On repeating the step described above it is easy to obtain
(V.I.1-189)