Four Criteria for Useful Data

BMGT 366: Chapter 3 - Exploring Data

Four criteria for useful data

Reliable and accurate
Relevant
Consistent
Timely

Types of data:

Cross-sectional data: observations collected at an instant in time
Time series data: observations collected over successive increments of time

Four general types of time series data patterns

Horizontal: data fluctuate around a constant mean (the series is stationary in its mean)
Trend: long-term growth or decline
Cyclical: wavelike rises and falls around the mean over a long time period; unlike seasonal patterns, the cycles have no fixed length
Seasonal: a pattern of change that repeats itself year after year

Exploring data patterns with autocorrelation analysis

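Autocorrelation (rk): the correlation between a variable and itself lagged one or more periods.

r1 = autocorrelation for a lag of 1 period

rk = autocorrelation for a lag of k periods:

$$r_k = \frac{\sum_{t=k+1}^{n} (Y_t - \bar{Y})(Y_{t-k} - \bar{Y})}{\sum_{t=1}^{n} (Y_t - \bar{Y})^2}, \qquad k = 0, 1, 2, \ldots$$

where n is the number of observations and Ȳ is the mean of the series.

SE(rk) = standard error of rk:

$$SE(r_k) = \sqrt{\frac{1 + 2\sum_{i=1}^{k-1} r_i^2}{n}}$$

and, in particular,

$$SE(r_1) = \sqrt{\frac{1}{n}}, \qquad SE(r_2) = \sqrt{\frac{1 + 2r_1^2}{n}}$$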

Hypothesis testing:

Ho: ρk = 0

Ha: ρk ≠ 0

$$t = \frac{r_k}{SE(r_k)}, \qquad k = 1, 2, \ldots$$
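The autocorrelation, its standard error, and the t statistic can be checked with a short Python sketch. It assumes the textbook definitions (deviations taken from the overall mean Ȳ, and SE(rk) = sqrt((1 + 2(r1² + ... + r(k-1)²))/n)); the function names are illustrative, not from any library, and the toy series is hypothetical.

```python
from math import sqrt


def autocorrelation(y, k):
    """Sample autocorrelation r_k of the series y at lag k."""
    n = len(y)
    ybar = sum(y) / n
    den = sum((v - ybar) ** 2 for v in y)  # sum over t = 1..n
    # Python index t runs k..n-1, i.e. t = k+1..n in the 1-based formula
    num = sum((y[t] - ybar) * (y[t - k] - ybar) for t in range(k, n))
    return num / den


def se_autocorrelation(y, k):
    """Standard error of r_k: sqrt((1 + 2 * (r_1^2 + ... + r_{k-1}^2)) / n)."""
    n = len(y)
    s = sum(autocorrelation(y, i) ** 2 for i in range(1, k))
    return sqrt((1 + 2 * s) / n)


def t_statistic(y, k):
    """t = r_k / SE(r_k) for testing Ho: rho_k = 0."""
    return autocorrelation(y, k) / se_autocorrelation(y, k)


y = [2, 4, 6, 8]  # hypothetical toy series
print(autocorrelation(y, 1), se_autocorrelation(y, 1), t_statistic(y, 1))
```

For the toy series this prints 0.25 0.5 0.5 (r1 = 5/20, SE(r1) = sqrt(1/4)).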

Example 3.1 (page 61):

Compute the autocorrelation for time lag 1 and test it for significance. Let α = 0.05.

Time t / Yt / Yt-1 / (Yt - Ȳ) / (Yt-1 - Ȳ) / (Yt - Ȳ)² / (Yt - Ȳ)(Yt-1 - Ȳ)
1 / 123 / --- / -19 / --- / 361 / ---
2 / 130 / 123 / -12 / -19 / 144 / 228
3 / 125 / 130 / -17 / -12 / 289 / 204
4 / 138 / 125 / -4 / -17 / 16 / 68
5 / 145 / 138 / 3 / -4 / 9 / -12
6 / 142 / 145 / 0 / 3 / 0 / 0
7 / 141 / 142 / -1 / 0 / 1 / 0
8 / 146 / 141 / 4 / -1 / 16 / -4
9 / 147 / 146 / 5 / 4 / 25 / 20
10 / 157 / 147 / 15 / 5 / 225 / 75
11 / 150 / 157 / 8 / 15 / 64 / 120
12 / 160 / 150 / 18 / 8 / 324 / 144
ΣYt = 1704 (Ȳ = 1704/12 = 142) / Σ(Yt - Ȳ)² = 1474 / Σ(Yt - Ȳ)(Yt-1 - Ȳ) = 843

$$r_1 = \frac{843}{1474} = .572$$

$$SE(r_1) = \sqrt{\frac{1}{12}} = .289; \qquad df = n - 1 = 12 - 1 = 11$$

Hypothesis testing: Ho: ρk = 0; Ha: ρk ≠ 0; t = rk/SE(rk)

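For k = 1, t = .572/.289 = 1.98 with p-value ≈ .07. Since the p-value exceeds α = .05 (equivalently, 1.98 < t.025,11 = 2.201), do not reject Ho. There is insufficient evidence at the 5% level to conclude that ρ1 ≠ 0.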
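As a check on the arithmetic in Example 3.1, the sketch below recomputes r1, SE(r1) and t in plain Python. The critical value 2.201 (t.025 with 11 df) is hard-coded from a standard t table, an assumption made to avoid depending on a statistics library; a library such as SciPy could supply the exact p-value instead.

```python
from math import sqrt

# Example 3.1 data, Y_1 .. Y_12
y = [123, 130, 125, 138, 145, 142, 141, 146, 147, 157, 150, 160]
n = len(y)
ybar = sum(y) / n                                    # 1704 / 12 = 142
dev = [v - ybar for v in y]                          # (Y_t - Ybar) column
num = sum(dev[t] * dev[t - 1] for t in range(1, n))  # 843
den = sum(d * d for d in dev)                        # 1474
r1 = num / den
se_r1 = sqrt(1 / n)                                  # SE(r_1) = sqrt(1/n)
t = r1 / se_r1

T_CRIT = 2.201  # two-tailed critical value, alpha = .05, df = 11 (t table)
reject = abs(t) > T_CRIT
print(f"r1 = {r1:.3f}, SE(r1) = {se_r1:.3f}, t = {t:.2f}, reject Ho: {reject}")
```

It prints r1 = 0.572, SE(r1) = 0.289, t = 1.98, reject Ho: False, matching the hand calculation.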

Your turn: Repeat the hypothesis test for k = 2 in Excel.

Autocorrelation function (ACF)

The ACF (also called a correlogram) is a graph of the autocorrelations rk for various lags k of a time series, together with the corresponding confidence limits for ρk.

Confidence limits for ρk:

$$0 \pm t_{\alpha/2,\,n-1}\, SE(r_k)$$

Autocorrelation significantly different from zero is indicated whenever a value for rk falls outside the corresponding confidence limits.
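A text-only ACF can be produced by computing rk and its limits lag by lag. This is a minimal sketch, assuming SE(rk) = sqrt((1 + 2(r1² + ... + r(k-1)²))/n) and a critical value t_crit supplied by the caller (2.201 below is t.025,11 for the Example 3.1 data, from a t table); acf_with_limits is a hypothetical helper name.

```python
from math import sqrt


def acf_with_limits(y, max_lag, t_crit):
    """Return (k, r_k, limit, significant) for k = 1..max_lag.

    limit = t_crit * SE(r_k), where t_crit = t_{alpha/2, n-1} is looked up
    separately and SE(r_k) = sqrt((1 + 2 * (r_1^2 + ... + r_{k-1}^2)) / n).
    """
    n = len(y)
    ybar = sum(y) / n
    dev = [v - ybar for v in y]
    den = sum(d * d for d in dev)
    rows = []
    sum_sq = 0.0  # running total of r_1^2 + ... + r_{k-1}^2
    for k in range(1, max_lag + 1):
        r_k = sum(dev[t] * dev[t - k] for t in range(k, n)) / den
        limit = t_crit * sqrt((1 + 2 * sum_sq) / n)
        rows.append((k, r_k, limit, abs(r_k) > limit))
        sum_sq += r_k ** 2  # include r_k in the SE of the next lag
    return rows


# Example 3.1 data; 2.201 = t_.025 with n - 1 = 11 df (from a t table)
y = [123, 130, 125, 138, 145, 142, 141, 146, 147, 157, 150, 160]
for k, r_k, limit, significant in acf_with_limits(y, 3, 2.201):
    flag = "significant" if significant else ""
    print(f"lag {k}: r = {r_k:+.3f}, limits = 0 +/- {limit:.3f} {flag}")
```

Any lag flagged "significant" is one whose rk falls outside its confidence limits.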

Testing for the significance of a set of consecutive ρk

Ho: ρ1 = ρ2 = … = ρm = 0

H1: not all ρk = 0; α = .05

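Test statistic: the Ljung-Box Q (LBQ) statistic,

$$Q = n(n+2)\sum_{k=1}^{m} \frac{r_k^2}{n-k}$$

which approximately follows a χ² distribution with m degrees of freedom, where m = number of consecutive ρk being tested. Reject Ho if Q exceeds the critical χ² value (equivalently, if the p-value < α).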
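The Ljung-Box statistic can be sketched the same way, assuming Q = n(n + 2) Σ rk²/(n - k) summed over k = 1..m. The χ² critical value is passed in rather than computed (5.991 is χ².05 with 2 df, from a χ² table); ljung_box_q is an illustrative name, not a library function. In practice, statsmodels offers a Ljung-Box routine (acorr_ljungbox) for the same test.

```python
def ljung_box_q(y, m):
    """Ljung-Box Q = n(n + 2) * sum over k = 1..m of r_k^2 / (n - k)."""
    n = len(y)
    ybar = sum(y) / n
    dev = [v - ybar for v in y]
    den = sum(d * d for d in dev)
    total = 0.0
    for k in range(1, m + 1):
        r_k = sum(dev[t] * dev[t - k] for t in range(k, n)) / den
        total += r_k ** 2 / (n - k)
    return n * (n + 2) * total


# Example 3.1 data, testing the first m = 2 autocorrelations jointly
y = [123, 130, 125, 138, 145, 142, 141, 146, 147, 157, 150, 160]
m = 2
q = ljung_box_q(y, m)
CHI2_CRIT = 5.991  # chi-square critical value, alpha = .05, df = m = 2 (table)
print(f"Q = {q:.2f}; reject Ho: {q > CHI2_CRIT}")
```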

Testing for time series data patterns

Stationary data (Horizontal, White noise)

Autocorrelation coefficients for a stationary series decline to zero fairly rapidly, generally after the second or third time lag. See example 3.3.

Data with trend

A series that contains a trend is said to be non-stationary. For a non-stationary time series, the autocorrelation coefficients remain fairly large for several lags. A method called differencing can be used to remove the trend from a non-stationary time series: the first difference is Yt - Yt-1 (see example 3.4).
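To see the effect of differencing, the sketch below builds a hypothetical trending series (a random walk with drift, simulated with a fixed seed) and compares the lag-1 autocorrelation before and after taking first differences. The data are simulated for illustration only; the exact numbers depend on the seed, but r1 should be close to 1 for the original series and close to 0 for the differences.

```python
import random


def r1(y):
    """Lag-1 sample autocorrelation (deviations from the series mean)."""
    n = len(y)
    ybar = sum(y) / n
    dev = [v - ybar for v in y]
    num = sum(dev[t] * dev[t - 1] for t in range(1, n))
    return num / sum(d * d for d in dev)


def first_difference(y):
    """First differences Y_t - Y_(t-1); the result is one observation shorter."""
    return [y[t] - y[t - 1] for t in range(1, len(y))]


# Hypothetical trending series: random walk with drift (simulated, fixed seed)
rng = random.Random(42)
y = [100.0]
for _ in range(199):
    y.append(y[-1] + 1.0 + rng.gauss(0.0, 1.0))

diff = first_difference(y)
print(f"r1, original series:   {r1(y):.2f}")
print(f"r1, first differences: {r1(diff):.2f}")
```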

Data with seasonal pattern

A pattern that repeats itself over a particular time interval is called seasonal. The time interval can be months in a year, days in a week, or hours in a day. Use the following two steps to test for seasonality.

Organize the data by the time period of expected seasonality, e.g., quarterly, monthly, or weekly.
Prepare the ACF; if seasonality is present, the autocorrelation at lag 4 (r4) will be significant for quarterly data, the autocorrelation at lag 12 (r12) will be significant for monthly data, and so on. See example 3.5.
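The seasonal signature is easiest to see on an idealized series. This sketch uses a hypothetical quarterly series in which the same four-quarter pattern repeats for six years with no noise; the autocorrelations then spike at lags 4 and 8 (r4 = 5/6 and r8 = 4/6 exactly, because the matching products cover 5 of the 6 years at lag 4 and 4 of the 6 at lag 8). Real data will show weaker spikes.

```python
def autocorrelations(y, max_lag):
    """Sample autocorrelations r_1 .. r_max_lag (deviations from the mean)."""
    n = len(y)
    ybar = sum(y) / n
    dev = [v - ybar for v in y]
    den = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / den
            for k in range(1, max_lag + 1)]


# Hypothetical quarterly data: the same Q1..Q4 pattern repeated for 6 years
y = [10, 20, 30, 15] * 6
acf = autocorrelations(y, 8)
for k, r in enumerate(acf, start=1):
    print(f"lag {k}: {r:+.3f}")
```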

Choosing a Forecasting Technique

Data type / Possible models
Stationary / Naïve, Moving average, Exponential smoothing, Box-Jenkins
Trend / Naïve, Linear and quadratic exponential smoothing, Simple regression, Exponential Trend models, Box-Jenkins
Seasonal / Naïve, Seasonal exponential smoothing, Multiple regression, Classical decomposition, Box-Jenkins

Empirical evaluation for forecasting methods

Statistically sophisticated methods do not necessarily produce more accurate forecasts.
Accuracy measures (MAD, MSE, MAPE) produce consistent results.
The combination of three exponential smoothing methods does well compared with other methods.
The accuracy of the methods depends on the length of the forecasting horizon and on the type of data (yearly, quarterly, monthly) analyzed. Some methods work well for yearly data, while others are more appropriate for quarterly and monthly data.

Measuring Forecasting Error

Basic forecasting notation:

Yt = value of a time series for period t

Ŷt = forecast value for period t

et = Yt - Ŷt = residual, or forecast error, for period t

Residual is the difference between an actual value and its forecast value.

Four measures of forecast error (Example 3.6)

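1. Mean Absolute Deviation

$$MAD = \frac{1}{n}\sum_{t=1}^{n} |Y_t - \hat{Y}_t|$$

MAD is in the same units as the original data.

2. Mean Squared Error

$$MSE = \frac{1}{n}\sum_{t=1}^{n} (Y_t - \hat{Y}_t)^2$$

3. Mean Absolute Percentage Error

$$MAPE = \frac{1}{n}\sum_{t=1}^{n} \frac{|Y_t - \hat{Y}_t|}{|Y_t|}$$

4. Mean Percentage Error

$$MPE = \frac{1}{n}\sum_{t=1}^{n} \frac{Y_t - \hat{Y}_t}{Y_t}$$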
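The four measures translate directly into code. This is a minimal sketch: error_measures is a hypothetical helper, MAPE and MPE are returned as fractions (format with % to read them as percentages), and both assume no actual value Yt is zero. For illustration it scores naive forecasts (Ŷt = Yt-1) on the Example 3.1 data; this choice is an assumption for demonstration and is not the data of Example 3.6.

```python
def error_measures(actual, forecast):
    """Return (MAD, MSE, MAPE, MPE); MAPE and MPE are fractions, not percents.

    Assumes no actual value is zero (MAPE and MPE divide by Y_t).
    """
    errors = [a - f for a, f in zip(actual, forecast)]  # e_t = Y_t - Yhat_t
    n = len(errors)
    mad = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mape = sum(abs(e) / abs(a) for e, a in zip(errors, actual)) / n
    mpe = sum(e / a for e, a in zip(errors, actual)) / n
    return mad, mse, mape, mpe


# Illustration only: naive forecasts Yhat_t = Y_(t-1) for the Example 3.1 data
y = [123, 130, 125, 138, 145, 142, 141, 146, 147, 157, 150, 160]
actual, forecast = y[1:], y[:-1]  # periods 2..12 and their naive forecasts
mad, mse, mape, mpe = error_measures(actual, forecast)
print(f"MAD = {mad:.2f}, MSE = {mse:.2f}, MAPE = {mape:.2%}, MPE = {mpe:.2%}")
```

Here MAD = 69/11 ≈ 6.27 and MSE = 577/11 ≈ 52.45; MPE comes out positive because the naive forecast lags behind the upward trend.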

Adequacy of a Forecasting Technique

For the model to be adequate,

(i) the residuals must be random (stationary, with no significant autocorrelation)

(ii) the residuals must follow an approximately normal distribution

(iii) the model's parameter estimates must be statistically significant

(iv) the model must be easy to understand and use