Introduction:

Science, sociology, engineering, economics and many other fields encounter a multitude of time series data: numerical values recorded at intervals of time.

Time series data are often analyzed in conjunction with regression techniques.

This paper summarizes some basic time series models and shows how regression analysis can be used to explain time series data, with examples on real-life data sets using the R software.

Time series theory introduction:

Time series analysis is the study of data collected over time, accounting for the fact that such data may exhibit patterns (e.g. seasonality, trend or autocorrelation).

First let us introduce some basic definitions and theorems.

Definition 1: a time series {X(t)} is said to be (weakly) stationary if:

i) E(X(t)) = µ for any t;

ii) Cov(X(t), X(t+h)) = c(h) for any t, where c(h) is a function of h alone, independent of t.

Definition 2: i) the autocovariance function (ACVF) of {X(t)} at lag h is c(h) = Cov(X(t+h), X(t));

ii) the autocorrelation function (ACF) of {X(t)} at lag h is ρ(h) = c(h)/c(0).

We can model a stationary time series by an ARMA model, where

AR: autoregressive;

MA: moving average.

An ARMA(p,q) model has the form

X(t) - φ1*X(t-1) - … - φp*X(t-p) = Є(t) + θ1*Є(t-1) + … + θq*Є(t-q),

where Є(t) is a white noise series, denoted WN.

e.g. an AR(1) model has the form X(t) - φ*X(t-1) = Є(t) (note that this model is also known as a Markov process).
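For illustration, one can simulate such an AR(1) series in R with arima.sim (a small sketch; the coefficient 0.6 and the length 200 are arbitrary choices):

set.seed(1); # for reproducibility

x <- arima.sim(model = list(ar = 0.6), n = 200); # simulate X(t) = 0.6*X(t-1) + Є(t)

ts.plot(x); # the simulated series should fluctuate around 0 with no trend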

There are two main objectives of time series analysis: the first is identifying the phenomenon underlying an observed sequence of data collected over time, and the second is forecasting future events based on past and present data. This creates the need for good prediction techniques.

Forecasting usually requires a good model that fits past observed data (which has to be equally spaced in time). The simplest case is a stationary series y(t) with E(y(t)) = 0, where we also assume that Є(t) ~ WN(0, σ^2) (note this is not the case in general practice).

By using the projection theorem (which uses Hilbert space theory to base the prediction of Y upon a linear function of {Y(t), Y(t-1), …, Y(t-p)}), we form the linear predictor

yhat(t+h) = φ1*y(t) + φ2*y(t-1) + … + φ(p+1)*y(t-p),

chosen to minimize the mean squared error of prediction

E{(y(t+h) - yhat(t+h))^2} = E{(y(t+h) - Σj=1..p+1 φj*y(t-j+1))^2}.

It can be shown (using the inner product and the projection theorem) that c(i) = Σj φj*c(i-j); these are the Yule-Walker equations.

In matrix notation:

| c(0)    c(1)    …  c(n-1) |   | φ1 |   | c(1) |
| c(1)    c(0)    …  c(n-2) | * | φ2 | = | c(2) |
|  …       …      …   …     |   | …  |   |  …   |
| c(n-1)  c(n-2)  …  c(0)   |   | φn |   | c(n) |

Now we can estimate the φ(i) by solving this system. However, this requires that we know the c(i); in practice the c(i) are replaced by the sample autocovariances.
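As an illustration, the Yule-Walker system can be solved directly in R from the sample autocovariances (a minimal sketch for a hypothetical series x and AR order p = 3):

p <- 3;

g <- acf(x, type = "covariance", lag.max = p, plot = FALSE)$acf; # sample c(0), ..., c(p)

Gamma <- toeplitz(g[1:p]); # matrix with entries c(|i-j|)

phi.hat <- solve(Gamma, g[2:(p + 1)]); # estimates of φ1, ..., φp

This is essentially what ar() does by default (its "yule-walker" method), as used in the example below.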

Example:

Analyzing an ARMA model in R:

library(stats);

data(lh);

ts.plot(lh); # gives us a time series plot.

# note that the series can be treated as stationary since we can't see any trends or seasonal effects.

acf(lh);

# note that the autocorrelations depend only on the lag h, not on the time t.

acf(lh, type="partial"); # computes the partial autocorrelation function (PACF)

Note: R uses the AIC criterion, based on the m.l.e. of the Gaussian likelihood for an ARMA process, to choose the order of an AR model.

The Gaussian likelihood function (in its simplest, conditional form) is

L(β, σ^2) = (2πσ^2)^(-T/2) * exp(-SSR/(2σ^2)),

where T is the number of observations and SSR is the sum of squared residuals.

ar(lh); #fit an AR model

Call:

ar(x = lh)

Coefficients:

1 2 3

0.6534 -0.0636 -0.2269

Order selected 3 sigma^2 estimated as 0.1959.

# note that ar() also reports estimates of the coefficients and the selected order.

R will also tabulate the residuals from the fitted model.

req<-ar(lh)$resid;
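We can check whether these residuals behave like white noise by examining their autocorrelations, for example:

acf(na.omit(req)); # no significant lags should remain if the fitted AR order is adequate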

Real-life data are not always conveniently stationary with normally distributed white noise. By plotting the acf one may notice a trend in the data, or the covariance between the autoregressive variables may depend on t. In 1976, Box and Jenkins developed the ARIMA model, which handles such series by differencing until the result is stationary (possibly after a log transformation to stabilize the variance).

The ARIMA(p,d,q) model (where p is the autoregressive order, d is the differencing order and q is the moving average order) is written as:

φ(B)(1-B)^d X(t) = θ(B)Z(t), where Z(t) ~ WN(0, σ^2). Although this method is very powerful and flexible, it has limitations, in particular because choosing the level of differencing can be complicated.
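A minimal sketch of the differencing step in R, for a hypothetical positive-valued series x:

lx <- log(x); # log transform to stabilize the variance, if needed

dx <- diff(lx); # first difference (d = 1)

acf(dx); # check whether the differenced series now looks stationary

arima(lx, order = c(1, 1, 1)); # fit an ARIMA(1,1,1) to the log series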

Regression Analysis for time series data:

Although powerful and flexible, ARIMA models have a number of limitations which can be addressed by regression. Some of these limitations are:

1- ARIMA does not allow interaction terms the way regression does.

2- ARIMA does not offer some useful diagnostics, such as tests for outliers.

3- ARIMA models are complicated to build and require a certain level of expertise to identify the orders.

Regression with ARMA errors:

Consider the model Y(t) = x'(t)β + W(t),

or in matrix form:

Y = Xβ + W,

where Y = (Y1, …, Yn)' is the vector of observations at times t = 1, …, n, X is the design matrix with rows x'(t) = (x(t1), …, x(tk)), and β = (β1, …, βk)' is the vector of regression coefficients. The components of W = (W1, …, Wn)' are values of an ARMA(p,q) process satisfying

φ(B)W(t) = θ(B)Z(t), where Z(t) ~ WN(0, σ^2). Note that the design matrix X could contain values of present and past data used to forecast future data, denoted by the response variable Y(t).

We can estimate β by ordinary least squares (OLS), minimizing

(Y - Xβ)'(Y - Xβ) = Σ(Y(t) - x'(t)β)^2,

which yields βhat(OLS) = (X'X)^(-1) X'Y.
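In R this estimator can be computed directly (a sketch for a hypothetical design matrix X and response vector Y):

beta.ols <- solve(t(X) %*% X, t(X) %*% Y); # (X'X)^(-1) X'Y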

Note that E(βhat(OLS)) = β and its covariance matrix is:

Cov(βhat(OLS)) = (X'X)^(-1) X'ΓnX (X'X)^(-1),

where Γn = E(WW').

To account for the correlation in W we instead minimize the weighted sum of squares

(Y - Xβ)'Γn^(-1)(Y - Xβ).

Partially differentiating with respect to each component of β and setting the derivatives to zero, we find that

βhat(GLS) = (X'Γn^(-1)X)^(-1) X'Γn^(-1)Y,

Cov(βhat(GLS)) = (X'Γn^(-1)X)^(-1).
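In R, regression with ARMA errors can be fitted with arima() and its xreg argument, which estimates the regression coefficients and the ARMA error parameters jointly by maximum likelihood (closely related to the GLS solution above). A sketch for hypothetical vectors y and x of equal length:

fit <- arima(y, order = c(1, 0, 1), xreg = x); # regression of y on x with ARMA(1,1) errors

fit$coef; # ar1, ma1, intercept, and the coefficient on x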

Decomposition of time series data:

In real-life situations most time series are non-stationary. Often we can decompose the data into trend and seasonal components; the general equation is of the form:

Y(t) = m(t) + s(t) + ε(t),

where Y(t) is the time series data,

m(t) is the trend component,

s(t) is the seasonal component,

ε(t) is the noise component.

1-Analyzing the trend:

There are different types of trends.

• No trend: m(t) = β0

• Linear trend: m(t) = β0 + β1*t

• Quadratic trend: m(t) = β0 + β1*t + β2*t^2

• pth-order polynomial trend: m(t) = β0 + Σj=1..p βj*t^j

The coefficients of the trend function are estimated by the least squares method, where we minimize:

Q = Σt=1..n (Y(t) - m(t))^2,

where m(t) = β0 + Σj=1..p βj*t^j.

We can estimate the trend using a linear model in R; the following example shows how.

Example 1:

The data set is "Assets at Banks whose ALLL exceeds their Nonperforming Loans", from St. Louis Fed: Banking (updated on 2005-09-19).

See Appendix 1 for data.

R code:

> y<-scan(file="C:\\bank.txt")

Read 70 items

>ts.plot(y)

> tm<-c(1:70)

> y<-cbind(y,tm)
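# note: cbind() turns y into a two-column matrix, so the lm() calls below fit both columns as responses; only the 'Response y' blocks of the output are of interest.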

> res.p0<-lm(y~1)

> summary(res.p0)

Response y :

Call:

lm(formula = y ~ 1)

Residuals:

Min 1Q Median 3Q Max

-40.34 -25.12 10.10 17.98 20.43

Coefficients:

Estimate Std. Error t value Pr(>|t|)

[1,] 73.560 2.603 28.27 <2e-16 ***

---

Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 21.77 on 69 degrees of freedom

Response tm :

Call:

lm(formula = tm ~ 1)

Residuals:

Min 1Q Median 3Q Max

-3.450e+01 -1.725e+01 5.551e-16 1.725e+01 3.450e+01

Coefficients:

Estimate Std. Error t value Pr(>|t|)

[1,] 35.500 2.432 14.60 <2e-16 ***

---

Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 20.35 on 69 degrees of freedom

Next we analyze the possibility of adding the time variable to our model:

> summary(res.p1<-lm(y~1+tm))

Response y :

Call:

lm(formula = y ~ 1 + tm)

Residuals:

Min 1Q Median 3Q Max

-23.188 -11.200 -2.622 12.994 24.235

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 43.85892 3.30353 13.28 < 2e-16 ***

tm 0.83664 0.08088 10.35 1.34e-15 ***

---

Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 13.67 on 68 degrees of freedom

Multiple R-Squared: 0.6115, Adjusted R-squared: 0.6057

F-statistic: 107 on 1 and 68 DF, p-value: 1.342e-15

Response tm :

Call:

lm(formula = tm ~ 1 + tm)

Residuals:

Min 1Q Median 3Q Max

-2.549e-15 -6.085e-16 -1.042e-16 3.173e-16 8.413e-15

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 7.882e-15 3.386e-16 2.328e+01 <2e-16 ***

tm 1.000e+00 8.289e-18 1.206e+17 <2e-16 ***

---

Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.401e-15 on 68 degrees of freedom

Multiple R-Squared: 1, Adjusted R-squared: 1

F-statistic: 1.455e+34 on 1 and 68 DF, p-value: < 2.2e-16

The p-value for the explanatory variable tm is 1.34e-15, so the linear trend term is significant.

Next we try adding a quadratic term.

> tmsq<-tm^2

> summary(res.p2<-lm(y~tm+tmsq))

Response y :

Call:

lm(formula = y ~ tm + tmsq)

Residuals:

Min 1Q Median 3Q Max

-22.900 -8.520 3.331 7.314 16.955

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 23.361842 3.855486 6.059 6.99e-08 ***

tm 2.544728 0.250596 10.155 3.43e-15 ***

tmsq -0.024058 0.003421 -7.033 1.32e-09 ***

---

Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 10.45 on 67 degrees of freedom

Multiple R-Squared: 0.7765, Adjusted R-squared: 0.7698

F-statistic: 116.4 on 2 and 67 DF, p-value: < 2.2e-16

Response tm :

Call:

lm(formula = tm ~ tm + tmsq)

Residuals:

Min 1Q Median 3Q Max

-2.809e-15 -5.610e-16 -1.169e-17 3.899e-16 8.126e-15

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 7.882e-15 5.180e-16 1.522e+01 <2e-16 ***

tm 1.000e+00 3.367e-17 2.970e+16 <2e-16 ***

tmsq 4.011e-19 4.596e-19 8.730e-01 0.386

---

Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.404e-15 on 67 degrees of freedom

Multiple R-Squared: 1, Adjusted R-squared: 1

F-statistic: 7.252e+33 on 2 and 67 DF, p-value: < 2.2e-16

We see that the p-value associated with the quadratic term is insignificant, which suggests that our trend is linear, of the form

Y(t) = β0 + β1*t + ε(t),

with β0hat = 43.859 and β1hat = 0.83664.

We can also run a Durbin-Watson test in R to test for autocorrelation in the residuals (durbin.watson() is provided by the car package):

> library(car)

> mod.ols<-lm(y~tm)

> durbin.watson(mod.ols, max.lag=5)

lag Autocorrelation D-W Statistic p-value

1 0.9488860 0.09028149 0

2 0.8838850 0.21060892 0

3 0.8337977 0.30180679 0

4 0.7968215 0.37246261 0

5 0.7259982 0.51158701 0

Alternative hypothesis: rho[lag] != 0

All the Durbin-Watson statistics are significant, and we can see that there is strong autocorrelation at all lags.

Note: judging from the time series plot, the cyclical (seasonal) component in this example is negligible.
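Given the strong autocorrelation found above, a natural refinement is to re-estimate the linear trend by generalized least squares with AR(1) errors, as in the section on regression with ARMA errors. A minimal sketch using the nlme package, assuming bank denotes the original 70-observation series as a plain numeric vector:

library(nlme); # provides gls() with ARMA correlation structures

dat <- data.frame(bank = bank, tm = 1:70);

gls.fit <- gls(bank ~ tm, data = dat, correlation = corAR1(form = ~ tm));

summary(gls.fit); # trend estimates whose standard errors account for the AR(1) errors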

2-Analyzing the seasonal component

In weekly or monthly data we will often spot variation that depends on the time of year. This component is known as the seasonal component; it describes any regular fluctuation with a period of less than one year.

Example 2:

Just as linear regression is used to analyze the trend component of a time series, ordinary least squares (OLS) can also be used to estimate the coefficients of the seasonal component. The following example illustrates how this can be done.

(The data here are monthly temperatures in England, recorded from 1723 onward, over 132 months.)

See Appendix 2 for data.

A preliminary plot of the data follows:

> tsdata<-scan(file="C:\\temperature.txt")

Read 132 items

> ts.plot(tsdata);

The graph indicates the presence of cyclical variation, so we can use sine and cosine functions to explain the fluctuations in the data.

By analyzing the plot we can see that the period is 12 months, that is, the length of time for the time series to complete one cycle is 12 months.

To build the sine and cosine regressors, we introduce the variable t, where t takes values from 1 to 132 (1 ≤ t ≤ 132) and t(i) denotes the ith month. The argument 2πt/12 is chosen so that the sine and cosine functions complete one full cycle every 12 months.

Then we use regression analysis to find the coefficients of the sine and cosine terms plus the intercept.

> t<-1:132

> z<-sin(2*pi*t/12)

> z1<-cos(2*pi*t/12)

> tsdata<-cbind(tsdata,t);

> dd<-data.frame(tsdata,z,z1)

> cyclical<-lm(tsdata~z+z1,dd)

> summary(cyclical);

Call:

lm(formula = tsdata ~ z + z1, data = dd)

Residuals:

Min 1Q Median 3Q Max

-2.79583 -0.89692 0.09245 0.77037 3.84387

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 9.6182 0.1154 83.35 <2e-16 ***

z -4.0787 0.1632 -24.99 <2e-16 ***

z1 -5.2223 0.1632 -32.00 <2e-16 ***

---

Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.326 on 129 degrees of freedom

Multiple R-Squared: 0.9274, Adjusted R-squared: 0.9263

F-statistic: 824.4 on 2 and 129 DF, p-value: < 2.2e-16

Conclusion: S(t) = 9.6182 - 4.0787*sin(2*pi*t/12) - 5.2223*cos(2*pi*t/12).
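To see how well the fitted seasonal component tracks the data, we can overlay the fitted values on the original series (assuming tsdata still holds the original temperature vector):

ts.plot(tsdata); # observed monthly temperatures

lines(fitted(cyclical), col = "red"); # fitted sine-cosine seasonal curve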

If we were to rewrite the general equation of a time series from above we could have the following relationship between the time series and its components:

Y(t) - s(t) = m(t) + ε(t)

(Time series data – seasonal component = trend component + error component)

> y<-tsdata;

According to the formula above, we can examine the residuals from the regression of Y(t) on the seasonal component to look for a trend in the data.

> residf<-resid(cyclical);

The residuals Y(t) - S(t) contain the trend component plus the noise. If we regress these residuals on time, we can test for the linear trend m(t) = β0 + β1*t:

> summary(lm(residf~t))

Call:

lm(formula = residf ~ t)

Residuals:

Min 1Q Median 3Q Max

-2.833070 -0.957801 0.002202 0.811573 3.474899

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) -0.450215 0.226688 -1.986 0.0491 *

t 0.006770 0.002958 2.289 0.0237 *

---

Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.295 on 130 degrees of freedom

Multiple R-Squared: 0.03874, Adjusted R-squared: 0.03135

F-statistic: 5.239 on 1 and 130 DF, p-value: 0.02369

Conclusion: the estimated slope β1 ≈ 0.0068 is nearly zero, which is consistent with the time series plot, where no trend can be detected by eye.

Therefore Y(t) - S(t) is essentially white noise e(t).

Now that we have eliminated the seasonal and trend components, all that remains is the noise e(t), which should be stationary.

We can check that e(t) is stationary by looking at the acf and pacf plots of the residuals of the cyclical component:

> acf(residf);

> pacf(residf);

Now that the noise series appears stationary, we fit an appropriate ARMA model. Trying different orders of ARIMA models in R, we find that the order giving the smallest AIC, with a small trade-off in sigma^2, is AR(1).
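A quick way to compare candidate orders is to extract the AIC from each fit (a sketch, using residf as above):

sapply(0:3, function(p) arima(residf, order = c(p, 0, 0))$aic); # AIC for AR(0) through AR(3)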

> arima(residf,order=c(1,0,0))

Call:

arima(x = residf, order = c(1, 0, 0))

Coefficients:

ar1 intercept

0.2980 0.004

s.e. 0.0856 0.155

sigma^2 estimated as 1.572: log likelihood = -217.22, aic = 440.43

Several models were tested in R and the above output was obtained. Looking at the estimate of sigma squared, the log likelihood and the AIC, we conclude that an AR(1) model is the best fit for the noise component, since it gives the smallest AIC.

The relationship between the AIC and the estimate of sigma squared is given (approximately) by:

AIC = n*log(σ^2) + 2M,

where M denotes the number of parameters in the model.
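For the AR(1) fit above, R's arima() computes the reported aic from the exact Gaussian log-likelihood: aic = -2*(log likelihood) + 2*(number of estimated parameters) = -2*(-217.22) + 2*3 ≈ 440.4, which agrees with the output; the formula n*log(σ^2) + 2M is the corresponding approximation up to an additive constant.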

Conclusion:

We have seen how to use regression to estimate the coefficients of the trend and seasonal components. Estimating several components separately might raise doubts about the accuracy of regression-based forecasting; however, numerical techniques allow us to estimate these components with a high degree of accuracy.

Also, regression can handle a large number of predictors in large or moderately large samples. A rough rule of thumb is that the number of predictors should not exceed one-tenth of the sample size.

Reference:

1-

2- Fox, John. "Time-Series Regression and Generalized Least Squares." Appendix to An R and S-PLUS Companion to Applied Regression. January 2002.

3-

4- Brockwell, Peter J., and Davis, Richard A. "Introduction to Time Series and Forecasting." Second Edition.

5- Venables, W.N., and Ripley, B.D. "Modern Applied Statistics with S." Springer, 2002.

Appendix:

1- Data for Example 1: "Assets at Banks whose ALLL exceeds their Nonperforming Loans", from St. Louis Fed: Banking (updated on 2005-09-19).

42.49 / 65.39 / 92.35 / 93.63
45.31 / 70.94 / 93.3 / 90.51
43.33 / 75.17 / 93.01 / 90.5
48.76 / 82.34 / 93.3 / 90.3
47.63 / 84.34 / 93.91
41.75 / 91.52 / 93.99
45.21 / 84.36 / 93.41
45.81 / 90.04 / 89.35
40.19 / 90.15 / 82.92
37.01 / 89.58 / 79.89
36.8 / 88.77 / 77.51
34.43 / 91.55 / 81.64
34 / 91.58 / 87.22
34.39 / 91.5 / 80.18
33.22 / 90.53 / 72.07
38.33 / 93.16 / 73.18
39.97 / 92.97 / 88.64
44.34 / 93.45 / 89.6
48.33 / 92.77 / 82.98
55.58 / 93.17 / 80.17
57.25 / 93.32 / 87.04
62.22 / 92.19 / 93.43

2- Temperature in England starting from 1723, over a period of 132 months.

1.1 4.4 7.5 8.9 11.7 15.0 15.3 15.6 13.3 11.1 7.5 5.8

5.6 4.2 4.7 7.2 11.4 15.3 15.0 16.2 14.4 8.6 5.3 3.3

4.4 3.3 5.0 8.1 10.8 12.2 13.8 13.3 12.8 9.4 6.9 3.9

1.1 4.2 4.2 8.4 13.4 16.4 16.0 15.6 14.7 10.2 6.1 1.8

4.2 5.0 5.1 9.2 13.6 14.9 16.9 16.9 14.4 10.8 4.7 3.6

3.9 2.4 7.1 8.3 12.5 16.4 16.9 16.0 12.8 9.1 7.2 1.6

1.2 2.3 2.8 7.1 10.3 15.1 16.8 15.7 16.6 10.1 8.1 5.0

4.1 4.7 6.2 8.7 12.4 14.0 15.3 16.3 15.3 10.9 9.2 3.4

1.9 2.2 6.0 6.8 12.1 15.6 16.3 16.7 15.3 12.3 7.8 5.2

2.4 6.4 6.1 8.9 11.4 14.6 16.0 16.6 14.5 10.9 6.3 2.2

6.9 6.0 5.9 10.0 11.2 15.2 18.3 16.1 12.8 9.1 6.5 7.6
