AP Stats 3.3 Correlation and Regression Wisdom
Limitations of correlation & regression:
1) only works for linear relationships
2) extrapolation can be unreliable
3) not resistant
Outliers & Influential Observations In Regression
outlier - a point that lies away from the overall pattern of the observations (outliers in the y direction have large residuals); identify with the “oval rule”; not necessarily influential.
influential observation - a point whose removal markedly changes the calculations (outliers in the x direction are often influential)
Example #1:
Height (x) : 62 64 66 70 72 68 60 67 84
Weight (y) : 125 130 140 160 180 155 105 220 270
1) scatterplot
2) regression line + graph
3) r, r²
4) possible outliers?
5) influential? redo the calculations to check
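The steps in Example #1 can be sketched in a few lines of Python. This is only an illustration, not the calculator procedure used in class: the helper `fit` is a hypothetical name, and the two candidate points (67, 220) and (84, 270) were picked by eye from the data, which is an assumption.

```python
from math import sqrt

def fit(xs, ys):
    """Least-squares line of y on x. Returns (slope, intercept, r)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx, sxy / sqrt(sxx * syy)

height = [62, 64, 66, 70, 72, 68, 60, 67, 84]
weight = [125, 130, 140, 160, 180, 155, 105, 220, 270]

slope, intercept, r = fit(height, weight)
print(f"all points: y-hat = {intercept:.1f} + {slope:.2f}x, r = {r:.3f}, r^2 = {r * r:.3f}")

# Influence check: drop one candidate point at a time and redo the calculations.
# The candidates are chosen by eye (an assumption), not by a formal rule.
refits = {}
for cand in [(67, 220), (84, 270)]:
    kept = [(x, y) for x, y in zip(height, weight) if (x, y) != cand]
    xs, ys = zip(*kept)
    refits[cand] = fit(xs, ys)
    s, i, rr = refits[cand]
    print(f"without {cand}: y-hat = {i:.1f} + {s:.2f}x, r = {rr:.3f}")
```

Comparing the three printed lines shows how much the slope, intercept, and r move when each point is removed, which is the "redo the calculations" check from step 5.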
Lurking Variable - a variable that is neither explanatory nor response but influences the relationship between the variables
Example #2:
        # of Methodist          Amt. of Cuban Rum
Year    Ministers in Boston     Imported into Boston
1860     63                      8376
1865     48                      6406
1870     53                      7005
1875     64                      8486
1880     72                      9595
1885     80                     10643
1890     85                     11265
1895     76                     10071
1900     80                     10547
1905     83                     11008
1910    105                     13885
1915    140                     18559
Describe the relationship.
Is there a correlation between the number of ministers and the amount of rum imported?
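A quick way to explore Example #2 is to compute r for each pair of columns. The sketch below is an illustration only; `corr` is a hypothetical helper, and treating the year as a stand-in for a lurking variable (something that changes over time) is an assumption, not something the table itself states.

```python
from math import sqrt

def corr(xs, ys):
    """Pearson correlation coefficient r between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)

year = list(range(1860, 1920, 5))  # 1860, 1865, ..., 1915
ministers = [63, 48, 53, 64, 72, 80, 85, 76, 80, 83, 105, 140]
rum = [8376, 6406, 7005, 8486, 9595, 10643, 11265, 10071, 10547, 11008, 13885, 18559]

r_min_rum = corr(ministers, rum)
r_year_min = corr(year, ministers)
r_year_rum = corr(year, rum)

print(f"ministers vs rum:  r = {r_min_rum:.4f}")
print(f"year vs ministers: r = {r_year_min:.4f}")
print(f"year vs rum:       r = {r_year_rum:.4f}")
```

A very strong r between ministers and rum does not show that one causes the other; both columns also rise with the year, which is consistent with some third variable driving both.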
Homework: p. 238 #59; p. 242 #63 - 65
Review: p. 251 #77 - 79
1) Height (x) : 60 64 68 72 63 65
Weight (y) : 100 130 150 200 100 220
a) scatterplot
b) regression line + graph
c) what is the slope and interpret it
d) what is the y-intercept and interpret it
e) r
f) r² and explain
g) possible outliers?
h) influential? redo the calculations to check
i) residual plot
j) predict weight for height of 24 inches
k) predict height for weight of 250 lbs.
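Parts j and k touch two of the limitations listed at the top: extrapolation, and the fact that the least-squares line for predicting y from x is not the line for predicting x from y. The sketch below illustrates both with the data from problem 1; `least_squares` is a hypothetical helper name, and refitting with the roles swapped for part k is one common reading of the question, stated here as an assumption.

```python
def least_squares(xs, ys):
    """Return (slope, intercept) of the least-squares line predicting ys from xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

height = [60, 64, 68, 72, 63, 65]
weight = [100, 130, 150, 200, 100, 220]

# Part j: weight from height. 24 inches is far below the smallest observed
# height (60), so this prediction is an extrapolation.
b, a = least_squares(height, weight)
weight_at_24 = a + b * 24
print(f"predicted weight at 24 in: {weight_at_24:.1f} lb")

# Part k: height from weight. Refit with the roles swapped instead of
# solving the part-j equation backwards (assumed interpretation).
d, c = least_squares(weight, height)
height_at_250 = c + d * 250
print(f"predicted height at 250 lb: {height_at_250:.1f} in")
```

The first prediction comes out negative, which is a reminder that a line fitted to heights of 60 to 72 inches says nothing reliable about 24 inches. A weight of 250 lb is also above the largest observed weight (220), so the second prediction deserves caution too.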
2) For the following data, use the oval rule to determine outliers. Test each outlier to determine if it is influential or not.
SAT-M: 400 500 600 650 550 450 500 550 600 650 400 750 200
SAT-V: 450 510 450 700 500 400 520 450 600 600 750 750 220
a. Draw the scatterplot of the regression of SAT-V on SAT-M. Interpret it. Use the oval rule to determine outliers. Test each outlier to determine if it is influential or not.
b. Create a residual plot. Use it to interpret the linear fit.
c. Interpret linear fit of SAT-V on SAT-M using r.
d. Find and interpret r².
e. Find r for the regression of SAT-M on SAT-V.
f. Double each SAT-M score and find r for SAT-V on SAT-M.
g. After d) add 50 points to each SAT-V score and find r for SAT-M on SAT-V.
h. What can we say about r and linear transformations?
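Parts e through h can be checked numerically. The sketch below is only an illustration (`corr` is a hypothetical helper); it computes r four ways from the SAT data and compares the results.

```python
from math import sqrt

def corr(xs, ys):
    """Pearson correlation coefficient r between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)

sat_m = [400, 500, 600, 650, 550, 450, 500, 550, 600, 650, 400, 750, 200]
sat_v = [450, 510, 450, 700, 500, 400, 520, 450, 600, 600, 750, 750, 220]

r_v_on_m = corr(sat_m, sat_v)                     # SAT-V on SAT-M
r_m_on_v = corr(sat_v, sat_m)                     # roles swapped
r_doubled = corr([2 * m for m in sat_m], sat_v)   # every SAT-M score doubled
r_shifted = corr(sat_v_plus := [v + 50 for v in sat_v], sat_m)  # SAT-V + 50

for label, value in [("V on M", r_v_on_m), ("M on V", r_m_on_v),
                     ("2*M", r_doubled), ("V + 50", r_shifted)]:
    print(f"{label:8s} r = {value:.4f}")
```

All four values agree (up to floating-point rounding): r does not depend on which variable is called x, and it is unchanged by adding a constant or multiplying by a positive constant.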
3) Is there a relationship between Auto Mechanic Aptitude test scores and the number of hours grade school children watch TV? A group of mechanics was surveyed. Their TV hours are normally distributed with a mean of 20 hours and a standard deviation of 2 hours. Their mean test score is 270 with a standard deviation of 35. If the correlation is 0.5110:
a) Find the equation of the best-fit line.
b) Find r² and explain its meaning.
c) Predict the test score for an auto mechanic who watched TV 37 hours.
4) If the best-fit line for predicting weight from height is ŷ = 5x - 120, find the correlation if x̄ = 10, sx = 2, ȳ = 100, and sy = 30.
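One way to approach problem 4 is to rearrange the slope formula for the least-squares line. The short derivation below is a sketch using the slope 5 from the given equation and the two standard deviations; note that the means are not needed for this step.

```latex
b = r\,\frac{s_y}{s_x}
\quad\Longrightarrow\quad
r = b\,\frac{s_x}{s_y} = 5 \cdot \frac{2}{30} = \frac{1}{3} \approx 0.33
```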
5) Shown below is output from Minitab:
Dependent variable is: height
R squared = 98.9%     R squared (adjusted) = 98.8%
s = 0.256 with 12 – 2 degrees of freedom
Variable    Coefficient    s.e. of Coeff    t-ratio    prob
Constant    64.9283        0.5084           128        0.0001
Temp        0.634965       0.0214           29.7       0.0001
a) Find the best-fit line.
b) How many observations were used to create the output?
c) Interpret the relationship.
d) If the independent variable is “age”, find the residual for the observation with age 40 and “height” 100.
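Reading regression output comes down to picking out the two coefficients and the degrees of freedom. The sketch below illustrates this with the values printed in the output above; it treats the single predictor simply as x (the output labels it "Temp" while part d calls it "age"), which is an assumption.

```python
# Values copied from the regression output.
intercept = 64.9283   # "Constant" row
slope = 0.634965      # predictor row
df = 12 - 2           # degrees of freedom as printed

# For a line with one predictor, df = n - 2, so n = df + 2.
n = df + 2

# Residual = observed - predicted, for the observation x = 40, height = 100.
x, observed = 40, 100
fitted = intercept + slope * x
residual = observed - fitted

print(f"y-hat = {intercept} + {slope}x")
print(f"n = {n}")
print(f"fitted = {fitted:.4f}, residual = {residual:.4f}")
```

A positive residual means the observed height lies above the line at that x value.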