Chapter 10
Simple Linear Regression and Correlation
(The template for this chapter is: Simple Regression.xls.)
10-1.A statistical model is a set of mathematical formulas and assumptions that describe some real-world situation.
10-2.Steps in statistical model building: 1) Hypothesize a statistical model; 2) Estimate the model parameters; 3) Test the validity of the model; and 4) Use the model.
10-3.Assumptions of the simple linear regression model: 1) A straight-line relationship between X and Y; 2) The values of X are fixed; 3) The regression errors, ε, are identically normally distributed random variables, uncorrelated with each other through time.
10-4.β0 is the Y-intercept of the regression line, and β1 is the slope of the line.
10-5.The conditional mean of Y, E(Y | X), is the population regression line.
10-6.The regression model is used for understanding the relationship between the two variables, X and Y; for prediction of Y for given values of X; and for possible control of the variable Y, using the variable X.
10-7.The error term ε captures the randomness in the process. Since X is assumed nonrandom, the addition of ε makes the result (Y) a random variable. The error term captures the effects on Y of a host of unknown random components not accounted for by the simple linear regression model.
10-8.Advertising versus sales (over a limited range of values; use: control and prediction); accounting ratio versus firm profitability (use: understanding); return on a stock versus return on the market as a whole (use: understanding).
10-9.The least-squares procedure produces the best estimated regression line in the sense that the line lies “inside” the data set. The line is the best unbiased linear estimator of the true regression line, as the estimators b0 and b1 have the smallest variance of all linear unbiased estimators of the line parameters. The least-squares line is obtained by minimizing the sum of the squared deviations of the data points from the line.
10-10.Least squares is less useful when outliers exist. Outliers tend to have a greater influence on the determination of the estimators of the line parameters because the procedure is based on minimizing the squared distances from the line. Since outliers have large squared distances they exert undue influence on the line. A more robust procedure may be appropriate when outliers exist.
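The least-squares formulas used throughout this chapter, b1 = SSXY/SSX and b0 = ȳ − b1x̄, can be sketched directly (a Python illustration, not part of the chapter template; the data points are made up):

```python
# Least-squares estimates from the textbook formulas
# b1 = SS_XY / SS_X and b0 = ybar - b1 * xbar.
# The data below are illustrative only.

def least_squares(x, y):
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    ss_xy = sum(xi * yi for xi, yi in zip(x, y)) - n * xbar * ybar
    ss_x = sum(xi ** 2 for xi in x) - n * xbar ** 2
    b1 = ss_xy / ss_x          # slope estimate
    b0 = ybar - b1 * xbar      # intercept estimate
    return b0, b1

b0, b1 = least_squares([1, 2, 3], [2, 4, 6])
print(b0, b1)  # the exact line y = 2x gives b0 = 0, b1 = 2
```

Because the squared distances drive the fit, a single outlier with a large deviation can pull both estimates substantially, which is the sensitivity noted in 10-10.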
10-11.
Simple Regression: Energy Density (X) vs. Calories (Y)
Obs. / X / Y / Error
1 / 3.4 / 289 / 52.6327
2 / 2.5 / 191 / -26.6817
3 / 3.3 / 114 / -120.291
4 / 2.2 / 112 / -99.4532
5 / 4.6 / 220 / -41.2814
6 / 0.6 / 145 / -33.2344
7 / 3 / 109 / -119.063
8 / 0.7 / 92 / -88.3106
9 / 4.9 / 236 / -31.5099
10 / 3.6 / 230 / -10.5196
11 / 2.2 / 271 / 59.5468
12 / 0.7 / 80 / -100.311
13 / 3.2 / 280 / 47.7851
14 / 2.9 / 202 / -23.9864
15 / 5.4 / 271 / -6.89077
16 / 3.7 / 235 / -7.59581
17 / 0.8 / 190 / 7.61323
18 / 0.3 / 75 / -97.0059
19 / 2.6 / 540 / 320.242
20 / 1 / 350 / 163.461
21 / 1.4 / 342 / 147.156
22 / 0.5 / 124 / -52.1582
23 / 2.5 / 242 / 24.3183
24 / 0.9 / 220 / 35.5371
Regression Equation: Calories = 165.777 + 20.7617 Energy Density
Energy density and calories
r2 / 0.0858 / Coefficient of Determination
Confidence Interval for Slope / r / 0.2929 / Coefficient of Correlation
(1-α) C.I. for β1
95% / 20.7617 / + or - / 29.9697 / s(b1) / 14.4511 / Standard Error of Slope
Confidence Interval for Intercept
(1-α) C.I. for β0
95% / 165.777 / + or - / 83.4568 / s(b0) / 40.242 / Standard Error of Intercept
s / 103.413 / Standard Error of prediction
ANOVA Table
Source / SS / df / MS / F / Fcritical / p-value
Regn. / 22073.9 / 1 / 22073.9 / 2.06408 / 4.30094 / 0.1649
Error / 235274 / 22 / 10694.3
Total / 257348 / 23
There is a weak linear relationship (r) and the regression is not significant (r2, F, p-value)
10-12.b1 = SSXY /SSX = 934.49/765.98 = 1.22
10-13.Using SYSTAT:
DEP VAR: Y / N: 13 / MULTIPLE R: 0.960 / SQUARED MULTIPLE R: 0.922
ADJUSTED SQUARED MULTIPLE R: 0.915
STANDARD ERROR OF ESTIMATE: 0.995
VARIABLE / COEFFICIENT / STD ERROR / STD COEF / TOLERANCE / T / P(2 TAIL)
CONSTANT / 3.057 / 0.971 / 0.000 / . / 3.148 / 0.009
X / 0.187 / 0.016 / 0.960 / 0.100E+01 / 11.381 / 0.000
ANALYSIS OF VARIANCE
SOURCE / SUM-OF-SQUARES / DF / MEAN-SQUARE / F-RATIO / P
REGRESSION / 128.332 / 1 / 128.332 / 129.525 / 0.000
RESIDUAL / 10.889 / 11 / 0.991
Thus, b0 = 3.057 b1 = 0.187
r2 / 0.9217 / Coefficient of Determination
Confidence Interval for Slope / r / 0.9601 / Coefficient of Correlation
(1-α) C.I. for β1
95% / 0.18663 / + or - / 0.03609 / s(b1) / 0.0164 / Standard Error of Slope
Confidence Interval for Intercept
(1-α) C.I. for β0
95% / -3.05658 / + or - / 2.1372 / s(b0) / 0.97102 / Standard Error of Intercept
Prediction Interval for Y
X / (1-α) P.I. for Y given X
95% / 10 / -1.19025 / + or - / 2.8317 / s / 0.99538 / Standard Error of prediction
ANOVA Table
Source / SS / df / MS / F / Fcritical / p-value
Regn. / 128.332 / 1 / 128.332 / 129.525 / 4.84434 / 0.0000
Error / 10.8987 / 11 / 0.99079
Total / 139.231 / 12
10-14.b1 = SSXY /SSX = 2.11
b0 = ȳ − b1x̄ = 165.3 − (2.11)(88.9) = −22.279
10-15.
Simple Regression: Inflation (X) vs. Return (Y)
Obs. / X / Y / Error
1 / 1 / -3 / -20.0642
2 / 2 / 36 / 17.9677
3 / 12.6 / 12 / -16.294
4 / -10.3 / -8 / -14.1247
5 / 0.51 / 53 / 36.4102
6 / 2.03 / -2 / -20.0613
7 / -1.8 / 18 / 3.64648
8 / 5.79 / 32 / 10.2987
9 / 5.87 / 24 / 2.22121
Inflation & return on stocks
r2 / 0.0873 / Coefficient of Determination
Confidence Interval for Slope / r / 0.2955 / Coefficient of Correlation
(1-α) C.I. for β1
95% / 0.96809 / + or - / 2.7972 / s(b1) / 1.18294 / Standard Error of Slope
Confidence Interval for Intercept
(1-α) C.I. for β0
95% / 16.0961 / + or - / 17.3299 / s(b0) / 7.32883 / Standard Error of Intercept
s / 20.8493 / Standard Error of prediction
ANOVA Table
Source / SS / df / MS / F / Fcritical / p-value
Regn. / 291.134 / 1 / 291.134 / 0.66974 / 5.59146 / 0.4401
Error / 3042.87 / 7 / 434.695
Total / 3334 / 8
There is a weak linear relationship (r) and the regression is not significant (r2, F, p-value)
10-16.
Simple Regression: Year (X) vs. Value (Y)
Obs. / X / Y / Error
1 / 1960 / 180000 / 84000
2 / 1970 / 40000 / -72000
3 / 1980 / 60000 / -68000
4 / 1990 / 160000 / 16000
5 / 2000 / 200000 / 40000
Average value of Aston Martin
r2 / 0.1203 / Coefficient of Determination
Confidence Interval for Slope / r / 0.3468 / Coefficient of Correlation
(1-α) C.I. for β1
95% / 1600 / + or - / 7949.76 / s(b1) / 2498 / Standard Error of Slope
Confidence Interval for Intercept
(1-α) C.I. for β0
95% / -3040000 / + or - / 1.6E+07 / s(b0) / 4946165 / Standard Error of Intercept
s / 78993.7 / Standard Error of prediction
ANOVA Table
Source / SS / df / MS / F / Fcritical / p-value
Regn. / 2.6E+09 / 1 / 2.6E+09 / 0.41026 / 10.128 / 0.5674
Error / 1.9E+10 / 3 / 6.2E+09
Total / 2.1E+10 / 4
There is a weak linear relationship (r) and the regression is not significant (r2, F, p-value).
Limitations: sample size is very small.
Hidden variables: the 70s and 80s models have a different valuation than other decades possibly due to a different model or style.
10-17.Regression equation is:
Credit Card Transactions = 177.641 + 0.6202 Debit Card Transactions
r2 / 0.9624 / Coefficient of Determination
Confidence Interval for Slope / r / 0.9810 / Coefficient of Correlation
(1-α) C.I. for β1
95% / 0.6202 / + or - / 0.17018 / s(b1) / 0.06129 / Standard Error of Slope
Confidence Interval for Intercept
(1-α) C.I. for β0
95% / 177.641 / + or - / 110.147 / s(b0) / 39.6717 / Standard Error of Intercept
s / 56.9747 / Standard Error of prediction
ANOVA Table
Source / SS / df / MS / F / Fcritical / p-value
Regn. / 332366 / 1 / 332366 / 102.389 / 7.70865 / 0.0005
Error / 12984.5 / 4 / 3246.12
Total / 345351 / 5
There is no implication of causality. A third variable influencing both could be increases in per capita income or GDP growth.
10-18.SSE = Σ(yᵢ − b0 − b1xᵢ)². Take partial derivatives with respect to b0 and b1:
∂SSE/∂b0 = −2Σ(yᵢ − b0 − b1xᵢ)
∂SSE/∂b1 = −2Σxᵢ(yᵢ − b0 − b1xᵢ)
Setting the two partial derivatives to zero and simplifying, we get:
Σ(yᵢ − b0 − b1xᵢ) = 0 and Σxᵢ(yᵢ − b0 − b1xᵢ) = 0. Expanding, we get:
Σyᵢ − nb0 − b1Σxᵢ = 0 and Σxᵢyᵢ − b0Σxᵢ − b1Σxᵢ² = 0
Solving the above two equations simultaneously for b0 and b1 gives the required results.
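At the least-squares solution the two normal equations say that the residuals sum to zero and are orthogonal to the x values. A quick numerical check of both facts (Python sketch with illustrative data, not part of the original solution):

```python
# Numerical check of the normal equations: at the least-squares solution,
# sum(e_i) = 0 and sum(x_i * e_i) = 0. Data are illustrative only.

def fit(x, y):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
         sum((xi - xbar) ** 2 for xi in x)
    b0 = ybar - b1 * xbar
    return b0, b1

x = [1.0, 2.0, 3.0, 4.0]
y = [1.1, 1.9, 3.2, 3.8]
b0, b1 = fit(x, y)
resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
print(sum(resid))                                   # ~0
print(sum(xi * ei for xi, ei in zip(x, resid)))     # ~0
```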
10-19.99% C.I. for β1: 1.25533 ± 2.807(0.04972) = [1.1158, 1.3949].
The confidence interval does not contain zero.
10-20.MSE = 10694.3
From the ANOVA table for Problem 10-11:
Source / SS / df / MS
Regn. / 22073.9 / 1 / 22073.9
Error / 235274 / 22 / 10694.3
Total / 257348 / 23
10-21.s(b0) = 40.242, s(b1) = 14.4511
s(b1) / 14.4511 / Standard Error of Slope
s(b0) / 40.242 / Standard Error of Intercept
10-22.Confidence Interval for Slope
(1-α) C.I. for β1
95% / 20.7617 / + or - / 29.9697
Confidence Interval for Intercept
(1-α) C.I. for β0
95% / 165.777 / + or - / 83.4568
95% C.I. for the slope: 20.7617 ± 29.9697 = [-9.208, 50.7314]
95% C.I. for the intercept: 165.777 ± 83.4568 = [82.3202, 249.2338]
10-23.s(b0) = 0.971, s(b1) = 0.016; the estimate of the error variance is MSE = 0.991. 95% C.I. for β1: 0.187 ± 2.201(0.016) = [0.1518, 0.2222]. Zero is not a plausible value at α = 0.05.
Confidence Interval for Slope / (1-α) C.I. for β1
95% / 0.18663 / + or - / 0.03609 / s(b1) / 0.0164 / Standard Error of Slope
Confidence Interval for Intercept
(1-α) C.I. for β0
95% / -3.05658 / + or - / 2.1372 / s(b0) / 0.97102 / Standard Error of Intercept
10-24.s(b0) = 85.44, s(b1) = 0.1534
Estimate of the regression variance is MSE = 8122
95% C.I. for β1: 1.5518 ± 2.776(0.1534) = [1.126, 1.978]
Zero is not in the range.
Confidence Interval for Slope / (1-α) C.I. for β1
95% / 1.55176 / + or - / 0.42578 / s(b1) / 0.15336 / Standard Error of Slope
Confidence Interval for Intercept
(1-α) C.I. for β0
95% / -255.943 / + or - / 237.219 / s(b0) / 85.4395 / Standard Error of Intercept
10-25.s2 gives us information about the variation of the data points about the computed regression line.
10-26.In correlation analysis, the two variables, X and Y, are viewed in a symmetric way, where neither of them is “dependent” and the other “independent,” as is the case in regression analysis. In correlation analysis we are interested in the relationship between two random variables, both assumed normally distributed.
10-27.r = 0.2929
r / 0.2929 / Coefficient of Correlation
10-28.r = 0.960
r / 0.9601 / Coefficient of Correlation
10-29.t(5) = r√(n − 2)/√(1 − r²) = 0.640
Accept H0. The two variables are not linearly correlated.
10-30.Yes. For example suppose n = 5 and r = .51; then:
t = r√(n − 2)/√(1 − r²) = 1.02 and we do not reject H0. But if we take n = 10,000 and
r = 0.04, giving t = 4.00, this still leads to strong rejection of H0.
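The statistic used in 10-29 through 10-34 is t = r√(n − 2)/√(1 − r²) with n − 2 degrees of freedom; a short Python sketch (not part of the original solutions) reproduces the sample-size effect discussed above:

```python
import math

# t statistic for testing H0: rho = 0:
# t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom.

def t_for_correlation(r, n):
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

print(round(t_for_correlation(0.51, 5), 2))   # small sample: ~1.03, do not reject
print(round(t_for_correlation(0.37, 65), 2))  # Problem 10-34: ~3.16, reject
```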
10-31. We have: r = 0.875 and n = 10. Conducting the test:
t(8) = r√(n − 2)/√(1 − r²) = 0.875√8/√(1 − 0.875²) = 5.11
There is statistical evidence of a correlation between the prices of gold and of copper. Limitations: data are time-series data, hence not independent random samples. Also, the data set contains only 10 points.
10-34.n = 65, r = 0.37, t(63) = r√(n − 2)/√(1 − r²) = 3.16
Yes. Significant. There is a correlation between the two variables.
10-35.z′ = ½ ln[(1 + r)/(1 − r)] = ½ ln(1.37/0.63) = 0.3884
ζ0 = ½ ln[(1 + ρ0)/(1 − ρ0)] = ½ ln(1.22/0.78) = 0.2237
σz′ = 1/√(n − 3) = 1/√62 = 0.127
z = (z′ − ζ0)/σz′ = (0.3884 − 0.2237)/0.127 = 1.297. Cannot reject H0.
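The Fisher transform computation in 10-35 can be packaged as a small function (a Python sketch, not part of the original template; the inputs are the problem's r = 0.37, ρ0 = 0.22, n = 65):

```python
import math

# Fisher z-transform test of H0: rho = rho0:
# z' = 0.5*ln((1+r)/(1-r)), zeta0 = 0.5*ln((1+rho0)/(1-rho0)),
# sigma = 1/sqrt(n-3), z = (z' - zeta0)/sigma.

def fisher_z_test(r, rho0, n):
    z_prime = 0.5 * math.log((1 + r) / (1 - r))
    zeta0 = 0.5 * math.log((1 + rho0) / (1 - rho0))
    sigma = 1.0 / math.sqrt(n - 3)
    return (z_prime - zeta0) / sigma

print(round(fisher_z_test(0.37, 0.22, 65), 2))  # ~1.30: cannot reject H0
```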
10-36.t (10) = b1/s(b1) = 2.435/1.567 = 1.554.
Do not reject H0. There is no evidence of a linear relationship even at α = 0.10.
10-37.t (16) = b1/s(b1) = 3.1/2.89 = 1.0727.
Do not reject H0. There is no evidence of a linear relationship using any α.
10-38.t(22, 0.05) = 1.717
b1/s(b1) = 20.7617 / 14.4511 = 1.437
Do not reject H0. There is no evidence of a linear relationship.
10-39.t (11) = b1/s(b1) = 0.187/0.016 = 11.69
Reject H0. There is strong evidence of a linear relationship between the two variables.
10-40.b1/ s(b1) = 1600/2498 = 0.641
Do not reject H0. There is no evidence of a linear relationship.
10-41.t (58) = b1/s(b1) = 1.24/0.21 = 5.90
Yes, there is evidence of a linear relationship.
10-42.t (21) = b1/s(b1) = 3.467/0.775 = 4.474
Yes, there is evidence of a linear relationship.
10-43.t (211) = z = b1/s(b1) = 0.68/12.03 = 0.0565
Do not reject H0. There is no evidence of a linear relationship using any α. (Why report such results?)
10-44.b1 = 5.49 s(b1) = 1.21 t(26) = 4.537
Yes, there is evidence of a linear relationship.
10-45.No surprise, since there is no evidence of a linear relationship between the two variables in Problem 10-37.
10-46a.The model should not be used for prediction purposes because only 2.0% of the
variation in pension funding is explained by its relationship with firm profitability.
b.The model explains virtually nothing.
c.Probably not. The model explains too little.
10-47. In Problem 10-11, r 2 = 0.0858. Thus, 8.6% of the variation in the dependent variable is explained by the regression relationship.
r2 / 0.0858 / Coefficient of Determination
10-48.In Problem 10-13, r 2 = 0.922. Thus, 92.2% of the variation in the dependent variable is explained by the regression relationship.
10-49.r 2 in Problem 10-16:r 2 = 0.1203
10-50.Reading directly from the MINITAB output: r 2 = 0.962
r2 / 0.9624 / Coefficient of Determination
10-51.r 2 = SSR/SST = 0.873
Thus, 87.3% of the variation in the dependent variable is explained by the regression relationship. Yes, the regression model should therefore be useful in predicting sales based on advertising expenditure.
10-52.No linear relations in evidence for any of the firms.
10-53.r 2 in Problem 10-15: r 2 = 0.0873
r2 / 0.8348 / Coefficient of Determination
10-54.Σ(yᵢ − ȳ)² = Σ[(yᵢ − ŷᵢ) + (ŷᵢ − ȳ)]²
= Σ(yᵢ − ŷᵢ)² + Σ(ŷᵢ − ȳ)² + 2Σ(yᵢ − ŷᵢ)(ŷᵢ − ȳ)
But: Σ(yᵢ − ŷᵢ)(ŷᵢ − ȳ) = Σeᵢŷᵢ − ȳΣeᵢ = 0
because the first term on the right is the sum of the weighted regression residuals, which sum to zero. The second term is the sum of the residuals, which is also zero. This establishes the result: SST = SSE + SSR.
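The decomposition SST = SSR + SSE holds for any least-squares fit and is easy to confirm numerically (a Python sketch on made-up data, not part of the original solution):

```python
# Numerical check of SST = SSR + SSE for a least-squares fit.
# The four data points are illustrative only.

x = [1.0, 2.0, 3.0, 4.0]
y = [1.0, 2.0, 2.0, 3.0]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
     sum((xi - xbar) ** 2 for xi in x)
b0 = ybar - b1 * xbar
yhat = [b0 + b1 * xi for xi in x]

sst = sum((yi - ybar) ** 2 for yi in y)
ssr = sum((yh - ybar) ** 2 for yh in yhat)
ssw = sse = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))
print(sst, ssr + sse)  # SST = 2.0 here; SSR + SSE matches it up to rounding
```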
10-55.From Equation (10-10): b1 = SSXY/SSX.From Equation (10-31):
SSR = b1SSXY. Hence, SSR = (SSXY /SSX)SSXY = (SSXY) 2/SSX
10-56.F(1,22) = 2.064. Do not reject H0.
F / Fcritical / p-value
2.06408 / 4.30094 / 0.1649
10-57.F(1,11) = 129.525, t(11) = 11.381, t² = 11.381² = 129.53 = the F-statistic value already calculated.
F / Fcritical / p-value
129.525 / 4.84434 / 0.0000
10-58.F(1,4) = 102.39, t(4) = 10.119, t² = (10.119)² = 102.39 = F
F / Fcritical / p-value
102.389 / 7.70865 / 0.0005
10-59.F(1,7) = 0.66974. Do not reject H0.
10-60.F(1,102) = MSR/MSE = 701.8
There is extremely strong evidence of a linear relationship between the two variables.
10-61.t²(k) = F(1,k). Thus, F(1,20) = [b1/s(b1)]² = (2.556/4.122)² = 0.3845
Do not reject H0. There is no evidence of a linear relationship.
10-62.t²(k) = [b1/s(b1)]²
[using Equations (10-10) and (10-15) for b1 and s(b1), respectively]
= b1²/(s²/SSX) = b1²SSX/MSE = (SSXY/SSX)²SSX/MSE
= (SSXY²/SSX)/MSE = SSR/MSE = MSR/MSE = F(1,k)
[because SSXY²/SSX = SSR by Equations (10-31) and (10-10)]
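This t² = F identity can be confirmed numerically on a small data set (a Python sketch with illustrative data, not part of the chapter template):

```python
import math

# Check that t^2 = F(1, k) for a simple regression. Data are illustrative.

x = [1.0, 2.0, 3.0, 4.0]
y = [1.0, 2.0, 2.0, 3.0]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
ss_x = sum((xi - xbar) ** 2 for xi in x)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / ss_x
b0 = ybar - b1 * xbar

sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
mse = sse / (n - 2)                 # error variance estimate
s_b1 = math.sqrt(mse / ss_x)        # standard error of the slope
t = b1 / s_b1

ssr = b1 ** 2 * ss_x                # SSR = b1^2 * SS_X
f = ssr / mse                       # MSR/MSE with 1 numerator df
print(t ** 2, f)  # t^2 and F agree (18 for this data, up to rounding)
```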
10-63.a.Heteroscedasticity.
b.No apparent inadequacy.
c.Data display curvature, not a straight-line relationship.
10-64.a.No apparent inadequacy.
b.A pattern of increase with time.
10-65.a.No serious inadequacy.
b.Yes. A deviation from the normal-distribution assumption is apparent.
10-66.
Residual Analysis / Durbin-Watson statisticd / 1.46457
Residual variance is decreasing; residuals are not normally distributed.
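The Durbin-Watson statistic reported by the Residual Analysis template is d = Σ(eₜ − eₜ₋₁)²/Σeₜ². A minimal Python sketch (the residuals below are made up to show the extreme alternating case):

```python
# Durbin-Watson statistic: d = sum((e_t - e_{t-1})^2) / sum(e_t^2).
# Values near 2 suggest no first-order autocorrelation; near 0, positive
# autocorrelation; near 4, negative autocorrelation.

def durbin_watson(residuals):
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

# Perfectly alternating residuals push d toward 4 (negative autocorrelation).
print(durbin_watson([1.0, -1.0, 1.0, -1.0]))  # 3.0
```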
10-67.Residuals plotted against the independent variable of Problem 10-14:
No apparent inadequacy.
Residual Analysis / Durbin-Watson statisticd / 2.0846
10-68.
Residual Analysis / Durbin-Watson statisticd / 1.70855
Plot shows some curvature.
10-69.In the American Express example, give a 95% prediction interval for x = 5,000:
ŷ = 274.85 + 1.2553(5,000) = 6,551.35
P.I. = 6,551.35 ± (2.069)(318.16)√(1 + 1/n + (5,000 − x̄)²/SSX)
= [5,854.4, 7,248.3]
10-70.95% C.I. for E(Y | x = 5,000):
C.I. = 6,551.35 ± (2.069)(318.16)√(1/n + (5,000 − x̄)²/SSX)
= [6,322.3, 6,780.4]
10-71.For a 99% P.I.: t.005(23) = 2.807
6,551.35 ± (2.807)(318.16)√(1 + 1/n + (5,000 − x̄)²/SSX)
= [5,605.75, 7,496.95]
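The half-widths in 10-69 through 10-71 come from the standard formulas t·s·√(1 + 1/n + (x0 − x̄)²/SSX) for a prediction interval, and the same expression without the leading 1 for the confidence interval of the conditional mean. A Python sketch, where x̄ and SSX are assumed values for illustration (they are not listed in these solutions), while t, s, and n match the example:

```python
import math

# Half-widths at a new point x0:
#   P.I.: t * s * sqrt(1 + 1/n + (x0 - xbar)^2 / SS_X)
#   C.I.: t * s * sqrt(    1/n + (x0 - xbar)^2 / SS_X)
# xbar and ss_x below are assumed purely for illustration.

def pi_half_width(t, s, n, x0, xbar, ss_x):
    return t * s * math.sqrt(1 + 1 / n + (x0 - xbar) ** 2 / ss_x)

def ci_half_width(t, s, n, x0, xbar, ss_x):
    return t * s * math.sqrt(1 / n + (x0 - xbar) ** 2 / ss_x)

t, s, n = 2.069, 318.16, 25
x0, xbar, ss_x = 5000.0, 4500.0, 3.0e6   # assumed xbar and SS_X
pi = pi_half_width(t, s, n, x0, xbar, ss_x)
ci = ci_half_width(t, s, n, x0, xbar, ss_x)
print(pi > ci)  # the P.I. is always wider than the C.I. at the same x0
```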
10-72.Point prediction: ŷ = 248.824
The 99% P.I.: [-55.995, 553.643]
Prediction Interval for Y / X / (1-α) P.I. for Y given X
99% / 4 / 248.824 / + or - / 304.819
10-73.The 99% P.I.: [-46.611, 585.783]
Prediction Interval for Y / X / (1-α) P.I. for Y given X
99% / 5 / 269.586 / + or - / 316.197
10-74.The 95% P.I.: [-142633, 430633]
Prediction Interval for Y / X / (1-α) P.I. for Y given X
95% / 1990 / 144000 / + or - / 286633
10-75.The 95% P.I.: [-157990, 477990]
Prediction Interval for Y / X / (1-α) P.I. for Y given X
95% / 2000 / 160000 / + or - / 317990
10-76.Point prediction:
10-77.
a)simple regression equation: Y = 2.779337 X – 0.284157
when X = 10, Y = 27.5092
Intercept / Slope
b0 / b1
-0.284157 / 2.779337
b)forcing through the origin: regression equation: Y = 2.741537 X.
Intercept / Slope
b0 / b1
0 / 2.741537
When X = 10, Y = 27.41537
Prediction
X / Y
10 / 27.41537
c)forcing through (5, 13): regression equation: Y = 2.825566 X – 1.12783
Intercept / Slope / Prediction
b0 / b1 / X / Y
-1.12783 / 2.825566 / 5 / 13
When X = 10, Y = 27.12783
Prediction
X / Y
10 / 27.12783
d)slope 2: regression equation: Y = 2 X + 4.236
Intercept / Slope
b0 / b1
4.236 / 2
When X = 10, Y = 24.236
10-78.Those two points would define the linear regression, and the data would be measured as deviations about that line. SSE would not be minimized.
10-79.Portfolio mean:
E(α1X1 + α2X2 + α3X3) = α1E(X1) + α2E(X2) + α3E(X3) = (1/3)6 + (1/3)9 + (1/3)11 = 8.67%
Standard deviation:
SD = √[(1/3)²(16) + (1/3)²(4) + (1/3)²(36)] = √6.2222 = 2.4944%
(template: Linear Composite of RVs.xls)
Mean and Variance of a Linear Composite of Independent Random Variables
X1 / X2 / X3 / Stats for the Linear Combination
Mean / 6 / 9 / 11 / 8.66667 / Mean
Proportion / 0.3333 / 0.3333 / 0.3333 / 1 / Total Proportion
Std Devn. / 4 / 2 / 6 / 2.49444 / Std Devn.
Variance / 16 / 4 / 36 / 6.22222 / Variance
10-80Portfolio mean:
E(α1X1 + α2X2 + α3X3) = α1E(X1) + α2E(X2) + α3E(X3) = 0.2(3) + 0.5(5) + 0.3(16) = 7.9%
Portfolio Variance:
V(α1X1 + α2X2 + α3X3) = α1²V(X1) + α2²V(X2) + α3²V(X3) + 2α1α2Cov(X1,X2) + 2α1α3Cov(X1,X3) + 2α2α3Cov(X2,X3),
where Cov(Xi, Xj) = ρij·SD(Xi)·SD(Xj)
Cov(X1, X2) = 0.4(2)(4) = 3.2
Cov(X1, X3) = 0.8(2)(8) = 12.8
Cov(X2, X3) = -0.3(4)(8) = -9.6
Var = (0.2)2(2)2 + (0.5)2(4)2 +(0.3)2(8)2 + 2(0.2)(0.5)(3.2) + 2(0.2)(0.3)(12.8) + 2(0.5)(0.3)(-9.6) = 9.216
SD = 3.036
(template: Linear Composites of RVs.xls, sheet:Dependent RV’s)
Mean and Variance of a Linear Combination of Dependent Random Variables
X1 / X2 / X3
Mean / 3 / 5 / 16
Proportion / 0.2 / 0.5 / 0.3
Std Devn. / 2 / 4 / 8
Variance / 4 / 16 / 64
Correlation Matrix
X1 / X2 / X3
X1 / 1 / 0.4 / 0.8
X2 / 0.4 / 1 / -0.3
X3 / 0.8 / -0.3 / 1
Stats for the Linear Combination
Mean / 7.9
Variance / 9.216
Std. Devn. / 3.03579
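The covariance arithmetic in 10-80 is the double sum Var = ΣᵢΣⱼ αᵢαⱼCov(Xᵢ,Xⱼ) with Cov(Xᵢ,Xⱼ) = ρᵢⱼSD(Xᵢ)SD(Xⱼ); a Python sketch (not part of the Excel template) reproduces the 9.216 figure:

```python
# Portfolio variance via the full covariance matrix:
# Var = sum_i sum_j a_i a_j * rho_ij * SD(X_i) * SD(X_j).
# Inputs are the Problem 10-80 weights, SDs, and correlation matrix.

def portfolio_variance(weights, sds, corr):
    k = len(weights)
    return sum(weights[i] * weights[j] * corr[i][j] * sds[i] * sds[j]
               for i in range(k) for j in range(k))

weights = [0.2, 0.5, 0.3]
sds = [2.0, 4.0, 8.0]
corr = [[1.0, 0.4, 0.8],
        [0.4, 1.0, -0.3],
        [0.8, -0.3, 1.0]]
print(portfolio_variance(weights, sds, corr))  # ~9.216, matching the template
```

The same function handles the four-asset case in 10-81 and the independent case in 10-82 (where the correlation matrix is the identity).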
10-81Portfolio mean:
α1E(X1) + α2E(X2) + α3E(X3) + α4E(X4) = 0.2(12) + 0.3(15) + 0.3(11) + 0.2(16) = 13.4%
Portfolio Variance:
Cov(X1, X2) = 0.5(5)(6) = 15.0
Cov(X1, X3) = 0.1(5)(9) = 4.5
Cov(X1, X4) = -0.5(5)(8) = -20.0
Cov(X2, X3) = 0.3(6)(9) = 16.2
Cov(X2, X4) = 0.7(6)(8) = 33.6
Cov(X3, X4) = 0.8(9)(8) = 57.6
Var = (0.2)2(5)2 + (0.3)2(6)2 +(0.3)2(9)2 +(0.2)2(8)2 + 2(0.2)(0.3)(15) + 2(0.2)(0.3)(4.5) + 2(0.2)(0.2)(-20)
+2(0.3)(0.3)(16.2) + 2(0.3)(0.2)(33.6) + 2(0.3)(0.2)(57.6) = 28.69
SD = 5.3563
(template: Linear Composites of RVs.xls, sheet:Dependent RV’s)
Mean and Variance of a Linear Combination of Dependent Random Variables
X1 / X2 / X3 / X4
Mean / 12 / 15 / 11 / 16
Proportion / 0.2 / 0.3 / 0.3 / 0.2
Std Devn. / 5 / 6 / 9 / 8
Variance / 25 / 36 / 81 / 64
Correlation Matrix
X1 / X2 / X3 / X4
X1 / 1 / 0.5 / 0.1 / -0.5
X2 / 0.5 / 1 / 0.3 / 0.7
X3 / 0.1 / 0.3 / 1 / 0.8
X4 / -0.5 / 0.7 / 0.8 / 1
Stats for the Linear Combination
Mean / 13.4
Variance / 28.69
Std. Devn. / 5.3563
10-82Portfolio mean:
α1E(X1) + α2E(X2) = 0.3333(4) + 0.6667(10) = 8%
Variance:
V(α1X1 + α2X2) = α1²V(X1) + α2²V(X2)
= (0.3333)²(1)² + (0.6667)²(3)² = 4.1115
SD = 2.028
Linear Composite of Independent Random Variables
Coef. / Mean / Variance / Std Devn.
0.333 / X1 / 4 / 1 / 1
0.667 / X2 / 10 / 9 / 3
Composite / 8 / 4.111111 / 2.027588
10-83
Simple Regression: Number (X) vs. %GM (Y)
Obs. / X / Y / Error
1 / 6 / 50.2 / -1.0673
2 / 7.8 / 50.4 / 2.44011
3 / 7.3 / 44 / -4.87862
4 / 10.3 / 49.9 / 6.53373
5 / 10.1 / 39.5 / -4.23376
6 / 10.8 / 43.1 / 0.65245
7 / 11.5 / 44 / 2.83867
8 / 15.4 / 40.1 / 6.10471
9 / 13.5 / 36 / -1.48644
10 / 15.5 / 31.7 / -2.11154
11 / 17.4 / 28.6 / -1.72039
12 / 17.1 / 27.8 / -3.07162
Percentage of GM cars
r2 / 0.7809 / Coefficient of Determination
Confidence Interval for Slope / r / -0.8837 / Coefficient of Correlation
(1-α) C.I. for β1
95% / -1.83745 / + or - / 0.6858 / s(b1) / 0.30779 / Standard Error of Slope
Confidence Interval for Intercept
(1-α) C.I. for β0
95% / 62.292 / + or - / 8.54265 / s(b0) / 3.83398 / Standard Error of Intercept
s / 3.95376 / Standard Error of prediction
ANOVA Table
Source / SS / df / MS / F / Fcritical / p-value
Regn. / 557.107 / 1 / 557.107 / 35.6383 / 4.96459 / 0.0001
Error / 156.322 / 10 / 15.6322
Total / 713.429 / 11
As the number of cars sold increases, the percentage of GM cars falls. The regression is significant (r2, F, p-value); there is a negative correlation (r) between the number of cars sold and the percentage of GM cars.
10-84
There is a strong positive correlation (r) between fruits and vegetables. The linear regression is significant (r2, F, p-value).
10-85
Employees (X) / Revenues (Y)
Obs. / X / Y / Error
1 / 96400 / 17440 / -1169.37
2 / 63000 / 13724 / 1318.72
3 / 70600 / 13303 / -513.985
4 / 39100 / 9510 / 1544.18
5 / 37680 / 8870 / 1167.94
6 / 31700 / 6846 / 254.737
7 / 32847 / 5937 / -867.32
8 / 12867 / 2445 / -648.011
9 / 11475 / 2254 / -580.445
10 / 6000 / 1311 / -506.458
Airline Revenues vs No. Employees
r2 / 0.9665 / Coefficient of Determination
Confidence Interval for Slope / r / 0.9831 / Coefficient of Correlation
(1-α) C.I. for β1
95% / 0.18575 / + or - / 0.0282 / s(b1) / 0.01223 / Standard Error of Slope
Confidence Interval for Intercept
(1-α) C.I. for β0
95% / 702.95 / + or - / 1370.53 / s(b0) / 594.329 / Standard Error of Intercept
Prediction Interval for Y
X / (1-α) P.I. for Y given X
95% / 0 / 702.95 / + or - / 2797.73 / s / 1057.69 / Standard Error of prediction
ANOVA Table
Source / SS / df / MS / F / Fcritical / p-value
Regn. / 2.6E+08 / 1 / 2.6E+08 / 230.643 / 5.31764 / 0.0000
Error / 8949679 / 8 / 1118710
Total / 2.7E+08 / 9
Scatter Plot, Regression Line and Regression Equation
There is a strong positive correlation (r) between revenues and number of employees. The linear regression is significant (r2, F, p-value).
10-86(From Minitab)
The regression equation is
Stock Close = 67.6 + 0.407 Oper Income
Predictor / Coef / Stdev / t-ratio / p
Constant / 67.62 / 12.32 / 5.49 / 0.000
Oper Inc / 0.40725 / 0.03579 / 11.38 / 0.000
s = 9.633, R-sq = 89.0%, R-sq(adj) = 88.3%
Analysis of Variance
SOURCE / DF / SS / MS / F / p
Regression / 1 / 12016 / 12016 / 129.49 / 0.000
Error / 16 / 1485 / 93
Total / 17 / 13500
Stock close based on an operating income of $305M is ŷ = $56.24.
(Minitab results for Log Y)
The regression equation is
Log_Stock Close = 2.32 + 0.00552 Oper Inc
Predictor / Coef / Stdev / t-ratio / p
Constant / 2.3153 / 0.1077 / 21.50 / 0.000
Oper Inc / 0.0055201 / 0.0003129 / 17.64 / 0.000
s = 0.08422, R-sq = 95.1%, R-sq(adj) = 94.8%
Analysis of Variance
SOURCE / DF / SS / MS / F / p
Regression / 1 / 2.2077 / 2.2077 / 311.25 / 0.000
Error / 16 / 0.1135 / 0.0071
Total / 17 / 2.3212
Unusual Observations
Obs. / x / y / Fit / Stdev.Fit / Residual / St.Resid
1 / 240 / 3.8067 / 3.6401 / 0.0366 / 0.1666 / 2.20R
R denotes an obs. with a large st. resid.
Stock close based on an operating income of $305M is ŷ = $54.80.
The regression using the Log of monthly stock closings is a better fit. Operating Income explains over 95% of the variation in the log of monthly stock closings versus 89% for non-transformed Y.
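Predicting on the original scale requires exponentiating the fitted log value. A Python sketch using the rounded Minitab equation, assuming natural logs (that assumption is what reproduces the $54.80 figure above):

```python
import math

# Back-transforming the log-linear fit from Problem 10-86:
# Log_Stock Close = 2.32 + 0.00552 * Oper_Inc, so the point prediction on
# the original scale is exp(2.32 + 0.00552 * x). Natural logs are assumed.

def predict_stock_close(oper_income):
    return math.exp(2.32 + 0.00552 * oper_income)

print(round(predict_stock_close(305), 2))  # ~54.80
```

(Exponentiating the fitted log value gives the median, not the mean, of the lognormal predictive distribution; the point prediction here follows the solution's usage.)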
10-87
Revenues (X) / Profits (Y)
Obs. / X / Y / Error
1 / 17440 / -1221 / 225.182
2 / 13724 / -2808 / -1841.6
3 / 13303 / -773 / 138.704
4 / 9510 / 248 / 666.958
5 / 8870 / 38 / 373.816
6 / 6846 / 1461 / 1533.88
7 / 5937 / 442 / 396.792
8 / 2445 / 14 / -484.851
9 / 2254 / 57 / -466.664
10 / 1311 / 108 / -538.168
Profits vs Airline Revenues
r2 / 0.3808 / Coefficient of Determination
Confidence Interval for Slope / r / -0.6171 / Coefficient of Correlation
(1-α) C.I. for β1
95% / -0.12967 / + or - / 0.13482 / s(b1) / 0.05846 / Standard Error of Slope
Confidence Interval for Intercept
(1-α) C.I. for β0
95% / 815.194 / + or - / 1302.56 / s(b0) / 564.856 / Standard Error of Intercept
s / 955.252 / Standard Error of prediction
ANOVA Table
Source / SS / df / MS / F / Fcritical / p-value
Regn. / 4488686 / 1 / 4488686 / 4.91907 / 5.31764 / 0.0574
Error / 7300055 / 8 / 912507
Total / 1.2E+07 / 9
Scatter Plot, Regression Line and Regression Equation
There is a negative correlation (r) between revenues and profits. The regression is not significant (r2, F, p-value) at the 0.05 level of significance.
10-88
a)adding 2 to all X values: new regression: Y = 5 X + 17
since the intercept is ȳ − b1x̄, the only thing that has changed is that the value for x̄ has increased by 2. Therefore, take the change in x̄ times the slope and add it to the original regression intercept.
b)adding 2 to all Y values: new regression: Y = 5X + 9
using the formula for the intercept, only the value for Y-bar changes by 2. Therefore, the intercept changes by 2
c)multiplying all X values by 2: new regression: Y = 2.5 X + 7
d)multiplying all Y Values by 2: new regression: Y = 10 X + 7
10-89You are minimizing the squared deviations from the former x-values instead of the former y-values.
10-90
a)Y = 3.820133 X + 52.273036
Intercept / Slope
b0 / b1
52.273036 / 3.820133
b)90% CI for slope: [3.36703, 4.27323]
Confidence Interval for Slope / (1-α) C.I. for β1
90% / 3.82013 / + or - / 0.4531
c)r2 = 0.9449, very high; F = 222.931 (p-value = 0.000): both indicate that X affects Y
d)since the 99% CI does not contain the value 0, the slope is not 0
Confidence Interval for Slope / (1-α) C.I. for β1
99% / 3.82013 / + or - / 0.77071
e)Y = 90.47436 when X = 10
Prediction
X / Y
10 / 90.47436
f)X = 12.49354
g)residuals appear to be random
Residual Analysis / Durbin-Watson statisticd / 2.56884
h)appears to be a little flatter than normal