Questions on Simple Linear Regression

EXAMPLE EXAM QUESTIONS ON SIMPLE LINEAR REGRESSION

Questions 1-7 refer to the following situation: Stock prices, Y, are assumed to be affected by the annual rate of dividend of the stock, X. A simple linear regression analysis was performed on 20 observations, and the results were:

Regression Equation Section

Independent   Regression    Standard      T-Value      Prob
Variable      Coefficient   Error         (Ho: B=0)    Level
INTERCEPT     -7.964633     3.11101359    -2.560       0.0166
X1            12.548580     1.27081204     9.874       0.0001

1. What statistical conclusion should you make about the effect of the dividend on average stock price?

A. Since 11.30869 > table value, reject the null hypothesis.

B. Since 12.54858 > table value, reject the null hypothesis.

C. Since 9.874 < table value, reject the null hypothesis.

D. Since 9.874 > table value, reject the null hypothesis.

E. Since 0.7895 < table value, fail to reject the null hypothesis.

2. What is the 95% confidence interval for a value of Y given an X value of 2.36? You are given that the standard error of this estimate is 3.351. The resulting interval (14.61 to 28.69) is interpreted as: I am 95% confident that

A. the stock price for a stock with a dividend rate of 2.36% falls between $14.61 and $28.69.

B. the mean stock price for all stocks with a dividend rate of 2.36% falls between $14.61 and $28.69.

C. the variance in stock price for all stocks falls between $14.61 and $28.69.

D. the dividend rate for all stocks falls between $14.61 and $28.69.

E. for each one point increase in dividend rate, the stock price will increase by between $14.61 and $28.69.

3. Which one of the following assumptions is incorrectly stated?

A. The stock price is normally distributed for any dividend rate.

B. The stock price has the same variability for any dividend rate.

C. The stock price for any dividend rate is a linear function of dividend rate.

D. The difference between the stock price and the expected stock price given the dividend rate is independent from company to company.

4. The interpretation of 0.7895, the value of R-square (the coefficient of determination), is:

A. 78.95% of the sample stock prices (around the mean stock price) can be attributed to a linear relationship with the dividend rate in the population.

B. the mean stock price will be estimated to increase $97.50 for each point increase in the rate.

C. the mean stock price will increase $78.95 for each point increase in the rate.

D. the stock price will increase $78.95 for each point increase in the rate.

E. 78.95% of the sample variability in stock price (around the mean stock price) can be attributed to a linear relationship with the dividend rate.

5. What is the estimate of the change in expected stock prices when the dividend rate increases by one point?

A. 97.50

B. -7.964633

C. This is a parameter not a statistic.

D. 12.54858

E. 5.36546

6. The estimate of the slope will vary from sample to sample. The estimate of the standard deviation of beta-hat is:

A. 3.36284

B. 3.14983

C. 0.39274

D. 12.54858

E. 1.27081

7. A 95% confidence interval for the average stock price given the rate of return will use the following t value:

A. 9.874

B. -2.560

C. 2.101

D. 2.045

E. 2.153

Answers to 1-7

1. D from computer printout use the t-test value across from X1

2. A this is an interval for an individual value of Y at the given X, not for the conditional mean

3. C the mean stock price falls on the line

4. E r-square is % of sample variation of y explained by x

5. D This is beta-hat – see computer printout to the right of X1

6. E This is the standard error of beta-hat, to the right of X1

7. C All t-values in simple linear regression have n-2 d. f.
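
The arithmetic behind answers 1, 2 and 7 can be checked with a short script. This is only a sketch: the coefficients and standard errors are copied from the printout above, and the table value 2.101 (t with 18 degrees of freedom, two-sided 5%) is assumed from a standard t table rather than computed.

```python
# Values copied from the Regression Equation Section of the printout.
intercept, se_intercept = -7.964633, 3.11101359
slope, se_slope = 12.548580, 1.27081204

# Assumption: 2.101 is the two-sided 5% table value of t with n - 2 = 18 d.f.
T_TABLE = 2.101

# Each T-Value on the printout is the coefficient divided by its standard error.
t_intercept = intercept / se_intercept
t_slope = slope / se_slope

# Question 2: point estimate at X = 2.36, plus or minus t times the given
# standard error of the estimate (3.351).
x0, se_estimate = 2.36, 3.351
y_hat = intercept + slope * x0
lower = y_hat - T_TABLE * se_estimate
upper = y_hat + T_TABLE * se_estimate

print(f"t for intercept:  {t_intercept:.3f}")
print(f"t for slope:      {t_slope:.3f}")
print(f"estimate at {x0}: {y_hat:.2f}")
print(f"95% interval:     {lower:.2f} to {upper:.2f}")
```

The slope's t statistic is far larger than the table value, which is the comparison asked for in question 1, and the interval endpoints match the $14.61 and $28.69 quoted in question 2.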

Questions 8-17 are concerned with the following situation: A fire insurance company wants to relate the amount of fire damage (y) in major residential fires to the distance between the residence and the nearest fire station (x). The study is to be conducted in a large suburb of a major city; a sample of 15 recent fires in this suburb is selected. The 15 values and the printout follow:

OBS     X       Y
  1    3.4    26.2
  2    1.8    17.8
  3    4.6    31.3
  4    2.3    23.1
  5    3.1    27.5
  6    5.5    36.0
  7    0.7    14.1
  8    3.0    22.3
  9    2.6    19.6
 10    4.3    31.3
 11    2.1    24.0
 12    1.1    17.3
 13    6.1    43.2
 14    4.8    36.4
 15    3.8    26.1
 16    3.5      .

Dependent Variable: Y $1000 fire damage

Analysis of Variance

                          Sum of        Mean
Source            DF      Squares       Square      F Value    Prob
Model              1    841.76636    841.76636      156.886    0.0001
Error             13     69.75098      5.36546
Total(Adjusted)   14    911.51733

Root MSE     2.31635    R-square    0.9235
Dep Mean    26.41333    Adj R-sq    0.9176
C.V.         8.76961

Parameter Estimates

              Parameter     Standard      T for H0:
Variable      Estimate      Error         Parameter=0    Prob > |T|
INTERCEPT     10.277929     1.42027781        7.237        0.0001
X              4.919331     0.39274775       12.525        0.0001

       Dep Actual    Predicted    95% LCL    95% UCL    95% LCL       95% UCL
Obs    Y             Value        Mean       Mean       Individual    Individual
16     .             27.4956      26.1901    28.8011    22.3239       32.66
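
The Mean and Individual limits in the last row of the printout can be reproduced from the estimates and the data. The sketch below assumes the usual simple-regression formulas (standard error of the mean response and of an individual prediction at x0) and a table value of 2.160 for t with 13 degrees of freedom; because that table value is rounded, the results agree with the printout only to about three decimals.

```python
import math

# Distances (X) for the 15 fires, copied from the data listing.
x = [3.4, 1.8, 4.6, 2.3, 3.1, 5.5, 0.7, 3.0, 2.6, 4.3, 2.1, 1.1, 6.1, 4.8, 3.8]

# Estimates copied from the printout.
b0, b1 = 10.277929, 4.919331
root_mse = 2.31635

# Assumption: 2.160 is the two-sided 5% table value of t with n - 2 = 13 d.f.
T_TABLE = 2.160

n = len(x)
x_bar = sum(x) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)

# Observation 16: a house 3.5 miles from the fire station.
x0 = 3.5
y_hat = b0 + b1 * x0

# h grows as x0 moves away from the mean of x, widening both intervals.
h = 1 / n + (x0 - x_bar) ** 2 / sxx
se_mean = root_mse * math.sqrt(h)        # for the mean damage at x0
se_indiv = root_mse * math.sqrt(1 + h)   # for one house's damage at x0

mean_ci = (y_hat - T_TABLE * se_mean, y_hat + T_TABLE * se_mean)
indiv_pi = (y_hat - T_TABLE * se_indiv, y_hat + T_TABLE * se_indiv)

print(f"predicted value:     {y_hat:.4f}")
print(f"95% limits (mean):   {mean_ci[0]:.4f} to {mean_ci[1]:.4f}")
print(f"95% limits (indiv.): {indiv_pi[0]:.4f} to {indiv_pi[1]:.4f}")
```

The individual limits are wider than the mean limits because they include the variability of a single house around the line in addition to the uncertainty in the line itself.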

8. Which one of the following assumptions is incorrect?

(A) The difference between the fire damage and the expected fire damage given the distance is independent from house to house.

(B) The fire damage is normally distributed for any distance.

(D) The mean fire damage for any distance is a linear function of distance.

9. You will find the value 4.919331 in the printout under Parameter Estimates. This is interpreted as:

(A) The mean fire damage will increase $4,919.33 for each mile from the fire station.

(B) The mean fire damage will be estimated to increase $4,919.33 for each mile from the fire station.

(D) The mean fire damage will be $4,919.33 given the distance.

(E) The estimated mean fire damage will be $4,919.33 given the distance.

10. The estimate of the standard deviation of fire damage for all homes the same distance from the fire station is (in thousands of dollars)

(A) 0.39274775

(B) 2.31635

(D) 69.75098

(E) 5.36546

11. The interpretation of 0.9235, the value of R-square (the coefficient of determination), is:

(A) 92.35% of the variability in fire damage (around the mean fire damage) can be attributed to a linear relationship with the distance to the fire station in the population.

(B) the mean fire damage will be estimated to increase $923.50 for each mile from the fire station.

(D) the fire damage will increase $923.50 for each mile from the fire station.

(E) 92.35% of the sample variability in fire damage (around the mean fire damage) can be attributed to a linear relationship with the distance to the fire station.

12. To test the null hypothesis that the parameter of the slope is zero, the test statistic value is:

(A) 0.9235

(B) 0.9176

(D) 12.525

(E) 7.237

13. For testing that the slope is zero versus the alternative that the slope is not zero (use alpha of 0.05), the rejection region is: Reject the null hypothesis if

(A) t > 2.160 or t < -2.160

(B) | t | < 12.525

(D) t > 12.525

(E) t > 2.160

14. The 95% confidence interval for the mean fire damage for all houses 3.5 miles from the fire station is: (in thousands of dollars)

(A) 15.3442 to 25.8279

(B) 4.070 to 5.768

(D) 13.4329 to 17.9455

(E) 26.1901 to 28.8011

15. The 95% confidence interval for the mean (25.7076 to 28.2997) for the first house (OBS 1) in the sample is interpreted as: I am 95% confident that

(A) the fire damage for a house 3.4 miles from the fire station falls between $25,707.60 and $28,299.70.

(B) the fire damage for all houses 3.4 miles from the fire station falls between $25,707.60 and $28,299.70.

(D) the average fire damage for all houses 3.4 miles from the fire station falls between $25,707.60 and $28,299.70.

(E) for each one mile from the fire station, the mean fire damage will increase by between $25,707.60 and $28,299.70.

16. In this sample, for each one standard deviation that a house is from the fire station, the mean fire damage will be estimated to increase 0.96 standard deviations. This is

(A) the coefficient of correlation, r

(B) the sample standard deviation, s

(D) coefficient of determination, r-square

(E) the least squares coefficient, beta hat

17. The difference between the actual value of y and the predicted value of y (y-yhat) is called

(A) a standard deviation

(B) a slope

(D) a sample standard deviation

(E) an error

ANSWERS for 8-17

8. C fire damage has the same variance given distance for any distance

9. B this is the beta hat

10. B this is an estimate of sigma of y given x, the square root of MSE

11. E r-square is % of sample variation of y explained by x

12. D use t value from computer printout across from X.

13. A use a t with n-2 degrees of freedom

14. E see the 16th observation under the “mean” columns

15. D this is a confidence interval for a conditional mean

16. A this is the definition of Pearson’s r from class notes

17. C this is the definition of the residual
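
Most of the printout values used in answers 9 through 16 can be recomputed from the 15 data points. The following is a minimal sketch using the textbook least squares formulas with only the Python standard library; the variable names are illustrative and it is not the software that produced the printout.

```python
import math

# Data listing for the 15 fires: distance (miles) and damage ($1000).
x = [3.4, 1.8, 4.6, 2.3, 3.1, 5.5, 0.7, 3.0, 2.6, 4.3, 2.1, 1.1, 6.1, 4.8, 3.8]
y = [26.2, 17.8, 31.3, 23.1, 27.5, 36.0, 14.1, 22.3, 19.6, 31.3,
     24.0, 17.3, 43.2, 36.4, 26.1]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Corrected sums of squares and cross products.
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

slope = sxy / sxx                      # beta-hat (answer 9)
intercept = y_bar - slope * x_bar
sse = syy - slope * sxy                # error sum of squares
root_mse = math.sqrt(sse / (n - 2))    # estimate of sigma (answer 10)
se_slope = root_mse / math.sqrt(sxx)
t_slope = slope / se_slope             # test statistic (answer 12)
r_square = 1 - sse / syy               # coefficient of determination (answer 11)
r = math.copysign(math.sqrt(r_square), slope)  # Pearson's r (answer 16)

print(f"slope     {slope:.6f}   intercept {intercept:.6f}")
print(f"Root MSE  {root_mse:.5f}   R-square  {r_square:.4f}   r {r:.2f}")
print(f"SE(slope) {se_slope:.6f}   t         {t_slope:.3f}")
```

Note that r carries the sign of the slope, and that the 0.96 in question 16 is the square root of the R-square value 0.9235.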

QUESTIONS 18-27 DEAL WITH THE FOLLOWING SITUATION: The expected sales of a product in a city are assumed to be affected by the per capita discretionary income and the population of the city. Per capita discretionary income will be referred to as PCDI in all the questions. In Questions 18-27, examine only the effect of per capita discretionary income on the mean sales. Thus the following model is hypothesized:

E(Y) = B0 + B1 X1 where

Y = Sales (in thousands of dollars)

X1 = Per Capita Discretionary Income (in dollars)

A sample of 15 cities, along with their sales, per capita discretionary income, and the population of the city (in thousands) is given in the attached printout. The 15 values and a printout follow:

OBS   INCOME   SALES
  1     2450     162
  2     3254     120
  3     3802     223
  4     2838     131
  5     2347      67
  6     3782     169
  7     3008      81
  8     2450     192
  9     2137     116
 10     2560      55
 11     4020     252
 12     4427     232
 13     2660     144
 14     2088     103
 15     2605     212
 16     2500       .
 17     3500       .

Root MSE    49.51434    R-square    0.4087
Dep Mean   150.60000    Adj R-sq    0.3632

Parameter Estimates

             Coefficient    Standard    T for H0:
Variable     Estimate       Error       B=0          Prob
INTERCEP     -10.207        55.147      -0.185       0.8560
INCOME         0.054         0.018       2.998       0.0103

       Dep                    95% LCL    95% UCL    95% LCL       95% UCL
Obs    Actual    Predicted    Mean       Mean       Individual    Individual
16     .         125.5         92.5      158.5       13.5         237.5
17     .         179.8        145.1      214.5       67.3         292.3

18. The 95% confidence interval for the mean sales of all cities with PCDI = 2500 is

A. 92.5 to 158.5

B. can not be calculated because of missing values

C. 3500

D. 88.6 to 156.9

E. 13.5 to 237.5

19. When testing the null hypothesis that the slope equals zero versus the alternative hypothesis that the slope does not equal zero, the rejection region would be: reject the null if

A. t > t(14, 0.025) or t < -t(14, 0.025)

B. t > t(13, 0.05)

C. F < F(1, 13, 0.05)

D. |t| > t(13, 0.025)

E. p-value > alpha

20. What distribution would you use to make inferences about the variation of sales among all cities with the same PCDI?

A. the Chi-square distribution

B. the t distribution

C. the F distribution

D. a t with no interaction and an F with interaction

21. Given the p-value of the F-test is 0.0103, we can interpret this as

A. Given the null is true, there is a 1.03% chance of finding this value of the test statistic or something more extreme.

B. The percent of sample variability of Y explained by the independent variable is 1.03%

C. There is a 98.97% probability that the null hypothesis is right.

D. There is a 98.97% probability that the null hypothesis is wrong.

E. The probability of a type I error is 0.0103.

22. Does the PCDI help predict the sales of the product?

A. Yes, because 2.998 > the table value

B. No, because .8560 is greater than alpha

C. Yes, because 8.986 < the table value

D. Yes, because of MSE = 2451.66959

E. No, because 0.018 is less than the table value

23. What is the interpretation of the coefficient of determination?

A. Don't know and don't care (Hint, this is a wrong answer and best left unspoken within hearing of instructor).

B. 40.87 probability that sales is linearly related to PCDI.

C. 40.87 percent of the sample variability of sales can be attributed to changes in PCDI.

D. 40.87 percent of the variability of PCDI can be attributed to a linear relationship between mean PCDI and sales.

E. 40.87 percent of the sample variability of PCDI can be attributed to a linear relationship between mean PCDI and sales.

24. What table value would you use in the calculation of a 90% confidence interval for a value of Y given a value of X?

A. 1.645

B. 3.140

C. 1.771

D. 2.650

E. 2.998

25. How many estimated standard errors is the point estimate of the slope away from zero? Slope is the change in the mean sales for each dollar increase in PCDI.

A. 0.054

B. 0.4087

C. -10.207

D. 2.998

E. 0.018

26. You know that most cities have small PCDI and only a few have large PCDI. Is this a violation of any assumption?

A. Yes, because the variation of PCDI would then be unequal.

B. No, because sales has to be normally distributed but PCDI does not have to be.

C. Yes, this would violate the linear relationship between the mean sales and PCDI.

D. No, because the variance of sales has nothing to do with the problem.

E. Yes, a violation of normality.

27. What would be the change in the estimated mean sales for each one standard deviation increase in PCDI?

A. 0.3632 standard deviations

B. can not be calculated.

C. 0.4087 squared dollars

D. 0.6393 (square root of 0.4087) standard deviations

E. 0.0540 dollars

Answers to 18-27

------

18. A see observation number 16

19. D use a t with n-2 degrees of freedom

20. A variance is related to chi-squared, see Table 3 in class notes

21. A see definition of p-value in text book

22. A use the t test on the slope here (equivalent to the F test in simple linear regression)

23. C see the definition of r-squared

24. C use t with n-2 d.f.

25. D definition of t-test value

26. B assumptions apply to y|x or to e but not to x

27. D this is the definition of r in class notes
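
Several of these answers (22, 25 and 27 in particular) rest on identities that are easy to confirm numerically from the INCOME and SALES listing. The sketch below is illustrative only: `simple_ols` is a hypothetical helper written for this check, not a library function, and it applies the standard least squares formulas.

```python
import math

def simple_ols(x, y):
    """Least squares fit of y on x; returns the usual printout quantities."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    sse = syy - slope * sxy
    root_mse = math.sqrt(sse / (n - 2))
    t_slope = slope / (root_mse / math.sqrt(sxx))
    r = sxy / math.sqrt(sxx * syy)
    # Standardized slope: change in y, in standard deviations of y,
    # per one standard deviation of x. The (n - 1) divisors cancel.
    std_slope = slope * math.sqrt(sxx / syy)
    return {"slope": slope, "intercept": intercept, "root_mse": root_mse,
            "t_slope": t_slope, "r": r, "r_square": r * r,
            "std_slope": std_slope}

income = [2450, 3254, 3802, 2838, 2347, 3782, 3008, 2450,
          2137, 2560, 4020, 4427, 2660, 2088, 2605]
sales = [162, 120, 223, 131, 67, 169, 81, 192,
         116, 55, 252, 232, 144, 103, 212]

fit = simple_ols(income, sales)
pred_2500 = fit["intercept"] + fit["slope"] * 2500

print(f"slope {fit['slope']:.3f}  t {fit['t_slope']:.3f}  "
      f"F = t^2 {fit['t_slope'] ** 2:.3f}")
print(f"R-square {fit['r_square']:.4f}  r {fit['r']:.4f}  "
      f"standardized slope {fit['std_slope']:.4f}")
print(f"predicted sales at PCDI = 2500: {pred_2500:.1f}")
```

The standardized slope equals r, which is why answer 27 is the square root of R-square, and the squared t statistic for the slope equals the F statistic, which is why the two tests share the p-value 0.0103.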