MAR 5621 – In-Class Project 1
Simple Linear Regression
1) Download the U.S. County Retail Data. The columns represent: county name, per capita retail sales, per capita retail establishments, per capita income, per capita federal expenditures, and males per 100 females.
a) Fit the model based on the population of 845 counties, relating per capita retail sales to per capita income. Give the parameters β0, β1, and σ, and plot the data.
b) Give a histogram of the errors, ε = Y - (β0 + β1X). Are the errors approximately normally distributed?
c) Plot the residuals versus the mean (fitted) values. Does the error variance appear to be constant?
d) What proportion of the total variation in per capita retail sales is “explained” by per capita income?
e) Take a random sample of n = 40 counties by generating a column of random numbers, then sorting the data set based on the random numbers, then taking the top 40 rows only (deleting the last 805).
f) Fit the model based on the sample of 40 counties, relating per capita retail sales to per capita income. Give the estimates b0, b1, and Se, and plot the data.
g) Obtain 95% confidence limits for β0 and β1. Do the intervals contain the true parameters from part a)?
h) Give a histogram of the errors, ε = Y - (β0 + β1X). Are the errors approximately normally distributed?
i) Plot the residuals versus the mean (fitted) values. Does the error variance appear to be constant?
j) What proportion of the total variation in per capita retail sales is “explained” by per capita income?
k) Obtain a 95% prediction interval for a county with a per capita income of 20.0.
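The sample-based steps in parts e) through k) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the required method for the project: the 845 data rows are synthetic placeholders (the real county file is not reproduced here), the helper names `fit_simple_ols`, `conf_limits`, and `prediction_interval` are hypothetical, and the critical value 2.024 is an approximate t-table entry for 38 degrees of freedom.

```python
import math
import random


def fit_simple_ols(x, y):
    """Least squares fit of y = b0 + b1*x, returning summary quantities."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    syy = sum((yi - y_bar) ** 2 for yi in y)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    sse = sum(e ** 2 for e in residuals)
    return {
        "n": n, "x_bar": x_bar, "sxx": sxx, "b0": b0, "b1": b1,
        "se": math.sqrt(sse / (n - 2)),  # residual standard error Se
        "r2": 1 - sse / syy,             # proportion of variation explained
        "residuals": residuals,
    }


def conf_limits(fit, t_crit):
    """Confidence limits for intercept and slope, given a t critical value."""
    se_b0 = fit["se"] * math.sqrt(1 / fit["n"] + fit["x_bar"] ** 2 / fit["sxx"])
    se_b1 = fit["se"] / math.sqrt(fit["sxx"])
    return (
        (fit["b0"] - t_crit * se_b0, fit["b0"] + t_crit * se_b0),
        (fit["b1"] - t_crit * se_b1, fit["b1"] + t_crit * se_b1),
    )


def prediction_interval(fit, x_new, t_crit):
    """Prediction interval for a single new observation at x_new."""
    y_hat = fit["b0"] + fit["b1"] * x_new
    half = t_crit * fit["se"] * math.sqrt(
        1 + 1 / fit["n"] + (x_new - fit["x_bar"]) ** 2 / fit["sxx"]
    )
    return y_hat - half, y_hat + half


# Synthetic placeholder data (NOT the county data): 845 made-up (income, sales) rows.
rng = random.Random(0)
population = []
for _ in range(845):
    income = rng.uniform(10, 30)
    population.append((income, 1.0 + 0.4 * income + rng.gauss(0, 1.5)))

# Part e): attach a random number to each row, sort on it, keep the top 40 rows.
keyed = [(rng.random(), row) for row in population]
keyed.sort(key=lambda pair: pair[0])
sample = [row for _, row in keyed[:40]]

# Parts f), g), j), k) on the sample.
fit = fit_simple_ols([r[0] for r in sample], [r[1] for r in sample])
T_CRIT = 2.024  # assumed: approximate t(0.975, df = 38) from a t table
b0_limits, b1_limits = conf_limits(fit, T_CRIT)
pi_low, pi_high = prediction_interval(fit, 20.0, T_CRIT)
print(fit["b0"], fit["b1"], fit["se"], fit["r2"])
print(b0_limits, b1_limits, (pi_low, pi_high))
```

The `residuals` entry of the returned dictionary is what parts h) and i) would plot. For the population fit in part a), the same function can be applied to all 845 rows, though whether σ should divide by N or by N - 2 depends on the convention used in class.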
2) Download the U.S. auto sales data. This represents monthly U.S. sales in millions.
a) Plot the new car sales versus the old car sales, together with the least squares regression line.
b) Fit a regression model relating new car sales to old car sales.
c) Obtain the coefficient of correlation.
d) Plot the residuals versus the fitted values and versus the series month (SMONTH). Comment on any patterns.
e) Test to determine whether the residuals are positively correlated.
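For part e), the assignment does not name a specific test. One common choice for time-ordered residuals is the Durbin-Watson statistic, sketched below under that assumption. The function name `durbin_watson` and both residual sequences are made up for illustration; they are not the auto sales residuals.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals listed in time order."""
    num = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    den = sum(e ** 2 for e in residuals)
    return num / den


# Made-up residual sequences, for illustration only.
drifting = [1.0, 0.8, 0.9, 0.5, -0.2, -0.6, -0.9, -1.0]  # neighbours look alike
alternating = [1.0, -1.0, 1.0, -1.0]                      # neighbours flip sign

print(durbin_watson(drifting))
print(durbin_watson(alternating))
```

Values near 2 are consistent with no first-order autocorrelation, while values well below 2 point toward positive autocorrelation. A formal conclusion compares the statistic with the lower and upper bounds from a Durbin-Watson table for the given sample size and number of predictors.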