STAT 520 – Fall 2017 – Test 1

Note: For this midterm exam, you may not receive help from anyone except me. For example, you may not talk to other students about the exam problems, and you may not look at other students’ exams. Violations of this policy may result in a 0 on the exam, an F for the course, and/or punishment by the USC Office of Academic Integrity.

Problems 1-3 below involve using R to analyze some real time series. You will need to install the fma package to use these data sets. Under the Packages menu in R, choose Install packages, and find a USA mirror site and choose the fma package. Your reports should be typed in paragraph form and should include relevant graphs where necessary. For each report, the amount of actual text (not counting plots and graphs) does not need to be more than about one page in length.
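As an alternative to the Packages menu, the package can also be installed from the R console. The lines below are a minimal sketch, not a required procedure; the mirror URL is an assumption, and any CRAN mirror should work equally well.

```r
# Install fma only if it is not already available on this machine.
# The repos URL is an assumed mirror; substitute any CRAN mirror you prefer.
if (!requireNamespace("fma", quietly = TRUE)) {
  install.packages("fma", repos = "https://cloud.r-project.org")
}

# Load the package so that its data sets can be accessed with data().
library(fma)
```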

Note that there may be several ways to satisfactorily answer these questions. Your reports will be graded partly on the quality of the statistical analysis that you do, and partly on your ability to communicate your conclusions clearly and concisely. Specifically, each problem will be worth 20 points, for a total of 60 points:

Writing (out of 10 points): How organized, clearly written, comprehensible, and grammatically correct is the report? Would the client reading this report be confident that it was written by an educated, well-trained statistical scientist?

Analysis (out of 10 points): Were the graphs and data analyses appropriate for the problem? Were the analyses carried out correctly? Were your statistical conclusions about the data set sensible and clearly justified by numerical or graphical evidence?

1. It is the mid-1990s. All you want to do is lie on your couch and watch the latest episodes of “Seinfeld” and “Friends”, but the Australian beer industry needs your help. The time series object beer in the fma package contains monthly Australian beer production from January 1991 through August 1995. Type library(fma); data(beer); print(beer) to see the data.

You should analyze these data as completely as possible and write a report to address such questions as: What trend model(s) best capture the trends in beer production over time? Once the trend has been accounted for, what can you say about the behavior of the detrended data? Can either the original time series or the detrended series be described using any common models? For the various models you try, assess the fit of the models. Are any transformations of the data necessary? Make conclusions that relate to how the beer production changes, both long-term over the observed period of years, and in terms of patterns of month-to-month variation. Augment your report with relevant graphics or plots, and be sure to comment clearly about what the graphs tell us.

2. The 1970s and early 1980s were a dangerous time period in some respects. The time series objects ukdeaths and usdeaths in the fma package contain, respectively:

ukdeaths: Monthly total deaths and serious injuries on UK roads from Jan 1975 – Dec 1984. (In February 1983, new legislation in the UK came into force requiring seat belts to be worn).

usdeaths: Monthly accidental deaths in the USA during a period of time in the 1970s.

Type library(fma); data(ukdeaths); print(ukdeaths); data(usdeaths); print(usdeaths) to see the data.

You should analyze the data in both time series and write a report to address such questions as: What trend model(s) best capture the trends in UK road casualties over time? (Can you assess the effect of the seat-belt legislation on the UK data?) What trend model(s) best capture the trends in US accident deaths over time? Once the trend has been accounted for, what can you say about the behavior of the detrended data, for both models? Can either the original time series or the detrended series be described using any common models? For the various models you try, assess the fit of the models. Are any transformations of the data necessary? Although the variables measured in the two countries are not exactly equivalent, does the behavior of the UK time series seem to be associated with the behavior of the US time series? If so, describe the association. Make conclusions that relate to how the accident counts change, in both countries, both long-term over the observed period of years, and in terms of patterns of month-to-month variation. Augment your report with relevant graphics or plots, and be sure to comment clearly about what the graphs tell us.

Note: The R code

window(tsobject, start=startingyear, end=endingyear)

could be useful for extracting part of a time series, where tsobject is the name of a time series object in R, startingyear is the numeric starting year (note this need not be a whole number, e.g., start=1970.5 would start the extraction halfway through 1970), and endingyear is the numeric ending year (again, e.g., end=1995.25 would end the extraction ¼ of the way through 1995).
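To make the fractional-year convention concrete, here is a small sketch using a made-up monthly series (demo.ts and its values are hypothetical and are not one of the exam data sets). For monthly data, January of a year corresponds to the whole number, so 1970.5 is July 1970 and 1995.25 is April 1995. The c(year, month) form shown second is an equivalent way to give the same endpoints.

```r
# Hypothetical monthly series: the values 1, 2, ..., 312 stand in for
# real data covering Jan 1970 through Dec 1995 (26 years x 12 months).
demo.ts <- ts(1:312, start = c(1970, 1), frequency = 12)

# Fractional-year endpoints, as in the note above:
# 1970.5 is July 1970 and 1995.25 is April 1995.
sub.frac <- window(demo.ts, start = 1970.5, end = 1995.25)

# The same extraction with c(year, month) endpoints,
# which avoids converting months to fractions by hand.
sub.ym <- window(demo.ts, start = c(1970, 7), end = c(1995, 4))

# Inspect the first and last time points of the extracted series.
print(start(sub.frac))
print(end(sub.frac))
```

Either form can be used; the c(year, month) version is often easier to read when the cut point is a specific calendar month.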

3. The multivariate time series object petrol in the fma package contains four time series which are respectively: Sales of chemicals and allied products; Sales of Bituminous coal products; Sales of petroleum and coal products; Sales of motor vehicles and parts, in the U.S. from Jan. 1971 to Dec. 1991.

Type library(fma); data(petrol); print(petrol); plot(petrol) to see the data.

You should analyze the data in the “chemicals” time series and the “vehicles” time series and write a report to address such questions as: What trend model(s) best capture the trends in sales of chemicals over time? What trend model(s) best capture the trends in sales of vehicles over time? Once the trend has been accounted for, what can you say about the behavior of the detrended data, for both models? Can either the original time series or the detrended series be described using any common models? For the various models you try, assess the fit of the models. Can any transformations improve the fit? Is there any apparent association between chemical sales and vehicle sales over time? If so, describe the association. Make conclusions that relate to how the sales amounts (for both chemicals and vehicles) change, both long-term over the observed period of years, and in terms of patterns of month-to-month variation. Augment your report with relevant graphics or plots, and be sure to comment clearly about what the graphs tell us.

Some useful R code to view all 4 time series and to pick out the individual time series is:

print(petrol); plot(petrol)

petrol.chemicals = petrol[,'Chemicals']; print(petrol.chemicals)

petrol.vehicles = petrol[,'Vehicles']; print(petrol.vehicles)