Assignment #3 – Forecasting Methods (FIN 335) Spring 2018
Due 02/13/18 by midnight (70 points)
Complete the problems below. Be sure to include all graphics created and all R code used to create them, and answer the questions pertaining to your analysis. When including R code, you must use Courier New 9-point font. Your discussion of results should NOT be in this font! I am fond of 10- or 11-point Palatino Linotype; however, you may choose to use something else.
Chapter 3 Exercises (see the R Markdown file on my website next to the assignment link.)
- For the following series, find an appropriate Box-Cox transformation in order to “stabilize the variance”.
- usnetelec
- usgdp
- mcopper
- enplanements
The actual effects of the Box-Cox transformation on these series do not necessarily look like variance stabilization in some cases. For each of these time series, it is best to view the changes relative to what the original time series looked like. To do this in R, follow the command steps below, using the first of these time series.
library(fpp2)
?usnetelec # View the help file for the time series so you know what it represents.
lambda = BoxCox.lambda(usnetelec)
lambda # See the optimal transformation chosen
usnetelec.tran = BoxCox(usnetelec, lambda)
temp = cbind(usnetelec, usnetelec.tran)
autoplot(temp, facets = TRUE) +
  xlab("Year") +
  ggtitle("US Annual Electricity Production with Transformation")
For each of these four time series repeat the above process. Discuss the differences between the untransformed and Box-Cox transformed time series. (10 pts.)
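As background for the discussion, the Box-Cox transformation with parameter lambda is log(y) when lambda = 0 and (y^lambda - 1)/lambda otherwise. The following is only a minimal base-R sketch of that formula for positive data. The helpers box_cox_manual and range_ratio are hypothetical names made up for illustration (they are not part of fpp2), and the built-in AirPassengers series is used as a stand-in rather than any of the four assigned series.

```r
# Hypothetical helper (not from fpp2): Box-Cox transformation for positive data
box_cox_manual <- function(y, lambda) {
  if (lambda == 0) log(y) else (y^lambda - 1) / lambda
}

y <- AirPassengers  # built-in monthly series, used here only as a stand-in

# Hypothetical helper: ratio of the within-year range in the last year (1960)
# to the within-year range in the first year (1949)
range_ratio <- function(x) {
  first <- window(x, end = c(1949, 12))
  last  <- window(x, start = c(1960, 1))
  diff(range(last)) / diff(range(first))
}

range_ratio(y)                     # well above 1: the swings grow with the level
range_ratio(box_cox_manual(y, 0))  # much closer to 1 after a log transformation
```

Comparing a simple spread measure before and after the transformation is one rough way to put a number on what the faceted plots show visually.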
- Why is the Box-Cox transformation unhelpful for the time series cangas? This time series is monthly Canadian gas production, in billions of cubic metres, January 1960 - February 2005. To answer this question, it is critical that you first plot the time series using autoplot(cangas). If you want, you can use the steps in Problem 1 to see what the “optimal” Box-Cox transformation does for this time series. (3 pts.)
- Read the monthly retail sales (n.e.c., i.e. not elsewhere classified) time series in from the file RetailNEC.csv on my website. The types of retail stores in the N.E.C. category are listed below.
5259 Retailing n.e.c. This class consists of units mainly engaged in retailing goods n.e.c.
Primary Activities
Animals, live, retailing; Art gallery operation (retail); Brief cases retailing; Briquettes retailing; Coal retailing; Coke retailing; Firewood cutting and retailing; Fireworks retailing; Handbag retailing; Ice retailing; Leather goods retailing (except apparel); Musical instruments retailing; Prams retailing; Retailing n.e.c.; Souvenirs retailing; Specialty stores n.e.c.; Swimming pool retailing; Travel goods retailing; Umbrellas retailing; Wigs retailing
These monthly sales are from New South Wales in Australia (the state that Sydney is in).
Plot this time series using the following functions: autoplot, ggseasonplot, gglagplot, and ggAcf.
a) Can you spot any seasonality, cyclicity and trend? What do you learn about the series? (5 pts.)
b) What Box-Cox transformation would you select for these retail sales? Construct a plot of the transformed series. Discuss. (3 pts.)
- Calculate the residuals from a seasonal naïve forecast applied to the quarterly Australian beer production data from 1992 onward. The following code will help.
beer = window(ausbeer, start=1992)
fc = snaive(beer)
autoplot(fc)
res = residuals(fc)
autoplot(res)
Test if the residuals are white noise and normally distributed.
checkresiduals(fc)
What do you conclude? (3 pts.)
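To make the white-noise and normality checks more concrete, here is a rough base-R sketch of the same ideas. It relies on the fact that the residuals of a seasonal naïve forecast are the seasonal differences y[t] - y[t - m], where m is the number of periods per year. The built-in UKgas series is used only as a stand-in for the beer data, and the lag of 2 * m in the Ljung-Box test and the use of the Shapiro-Wilk test are assumptions made for illustration, not necessarily what checkresiduals does internally.

```r
y <- UKgas               # built-in quarterly series, stand-in for the beer data
m <- frequency(y)        # 4 periods per year for quarterly data
res <- diff(y, lag = m)  # seasonal naive residuals: y[t] - y[t - m]

# Ljung-Box test: a small p-value suggests the residuals are autocorrelated,
# i.e. they do not look like white noise
lb <- Box.test(res, lag = 2 * m, type = "Ljung-Box")
lb$p.value

# Shapiro-Wilk test as one rough numerical check of normality
sw <- shapiro.test(as.numeric(res))
sw$p.value

# Visual checks, similar in spirit to the panels that checkresiduals draws
acf(res)
hist(res)
```

A formal test and a plot can disagree in small samples, so it is worth looking at both before drawing a conclusion.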
- Repeat Problem 4 for the time series WWWusage and bricksq in the fpp2 library. (6 pts.)
- Are the following statements true or false? Explain your answer. (2 pts. each)
a) Good forecast methods should have normally distributed residuals.
b) A model with small residuals will give good forecasts.
c) The best measure of forecast accuracy is MAPE.
d) If your model doesn’t forecast well, you should make it more complicated.
e) Always choose the model with the best forecast accuracy as measured on the test set.
- For the retail NEC time series in Problem 3, do the following: (10 pts. total)
a) Split the data into training and test sets using: (1)
myts.train = window(retNEC, end=c(2010,12))
myts.test = window(retNEC, start=2011)
b) Check that the time series has been split by producing the following plot: (1)
autoplot(retNEC) +
autolayer(myts.train, series="Training") +
autolayer(myts.test, series="Test")
c) Calculate forecasts using snaive applied to myts.train. (1)
fc = snaive(myts.train)
d) Compare the accuracy of your forecasts against the actual values stored in myts.test and discuss. (4)
accuracy(fc,myts.test)
e) Check the residuals for the forecast:
checkresiduals(fc)
Do the residuals appear to be uncorrelated and normally distributed? (3)
- Consider the daily closing IBM stock prices (data set ibmclose). (10 pts.)
- Produce some plots of the data in order to become familiar with it.
- Split the data into a training set of 300 observations and a test set of 69 observations.
- Try using various benchmark methods to forecast the training set and compare the results on the test set. Which method did best?
- Check the residuals of your preferred method. Do they resemble white noise?
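The split-and-compare workflow in the steps above can be sketched in base R as follows. This is only an illustration under stated assumptions: it uses the built-in LakeHuron series as a stand-in (not ibmclose), an arbitrary 80/18 split, hand-written versions of the mean, naïve, and drift benchmarks, and a hypothetical rmse helper. In practice the corresponding fpp2 functions would normally be used instead.

```r
y <- as.numeric(LakeHuron)  # stand-in series; not the assignment data
n_train <- 80               # arbitrary split point chosen for this sketch
train <- y[1:n_train]
test  <- y[(n_train + 1):length(y)]
h <- length(test)           # forecast horizon = size of the test set

# Three simple benchmark forecasts, written out by hand
fc_mean  <- rep(mean(train), h)                          # mean of the training data
fc_naive <- rep(train[n_train], h)                       # last observed value
slope    <- (train[n_train] - train[1]) / (n_train - 1)  # average change per step
fc_drift <- train[n_train] + slope * seq_len(h)          # last value plus drift

# Hypothetical helper: root mean squared error on the test set
rmse <- function(actual, forecast) sqrt(mean((actual - forecast)^2))

sapply(list(mean = fc_mean, naive = fc_naive, drift = fc_drift),
       function(f) rmse(test, f))
```

The method with the smallest test-set error is a natural candidate, although the ranking can change with a different accuracy measure or a different split.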
- Consider the sales of new one-family houses in the USA, Jan 1973 – Nov 1995 (data set hsales). (10 pts.)
- Produce some plots of the data in order to become familiar with it.
- Split the hsales data set into a training set and a test set, where the test set is the last two years of data.
- Try using various benchmark methods to forecast the training set and compare the results on the test set. Which method did best?
- Check the residuals of your preferred method. Do they resemble white noise?