3.2 Least-Squares Regression Line (LSRL) – Best-Fit Lines
Regression – interested in PREDICTING the value of the response variable.
Correlation – interested in the strength of the LINEAR ASSOCIATION (no distinction between explanatory and response variables)
Best-fit line – used for predictions; minimizes the sum of the squares of the vertical deviations (DO NOT extend the line to the y-axis)
Vertical deviations
Best-fit line: ŷ = a + bxᵢ
Slope (b) is how much ŷ changes, on average, when x increases by 1 unit
To calculate: STAT, CALC, LinReg(a+bx) or LinReg(ax+b) (in LinReg(ax+b), the calculator's a is the slope)
Example #1:
Mr. Jones is trying to determine whether studying leads to math success. His data:
X (hrs studied): 5, 4, 3, 5, 1, 2, 3, 4, 2, 1
Y (test score): 70, 80, 85, 80, 95, 90, 80, 60, 95, 90
- Draw a scatterplot
- Calculate the best-fit line and plot it on the graph.
- What is the slope of the line? Interpret its meaning. (On average, y changes by b units for every one-unit increase in x.)
- What is the y-intercept? Interpret.
- Predict the grade for someone who studies 3 hours.
- Predict the hours studied for someone with a 70.
- Predict the grade if someone studies 12 hours.
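The calculation in Example #1 can be sketched in Python as a check on the calculator's LinReg result. This is a minimal sketch, not the method used in class: the function names `least_squares_line` and `predict` are made up for illustration, and the formulas used are the standard least-squares ones (slope = sum of cross-deviations over sum of squared x-deviations).

```python
# Study-time data from Example #1: hours studied (x) and test score (y).
hours = [5, 4, 3, 5, 1, 2, 3, 4, 2, 1]
scores = [70, 80, 85, 80, 95, 90, 80, 60, 95, 90]


def least_squares_line(x, y):
    """Return (a, b) for y-hat = a + b*x (hypothetical helper name).

    The line minimizes the sum of squared vertical deviations.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross-deviations and sum of squared x-deviations.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx          # slope
    a = y_bar - b * x_bar  # y-intercept
    return a, b


def predict(a, b, x_value):
    """Predicted response for one x value (hypothetical helper name)."""
    return a + b * x_value


a, b = least_squares_line(hours, scores)
print(f"y-hat = {a:.2f} + ({b:.2f})x")  # y-hat = 99.75 + (-5.75)x
print(predict(a, b, 3))                 # 82.5
print(predict(a, b, 12))                # 30.75
```

The 3-hour prediction falls inside the observed range of 1 to 5 hours. The 12-hour prediction is far outside that range, so the low value it gives should not be trusted.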
Extrapolation – predicting outside the range of the data (BE CAREFUL! Ex: height vs. age of preteens)
Interpolation – predicting between known values (GOOD)
Regression line: ŷ = a + bxᵢ goes through (x̅, y̅)
Slope: b = r(sy/sx)    y-intercept: a = y̅ - bx̅
Example: Is there a relationship between height and SAT-M score? A group of students was surveyed. Their average height is 65 inches with a standard deviation of 5 inches. Their average SAT-M score is 500 with a standard deviation of 50. If the correlation is 0.600:
1) Find the equation of the best-fit line.
2) Find r² and explain its meaning.
3) Predict the SAT-M score for a student who is 55 inches tall.
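One way to work this example is from the summary statistics alone, using the standard relationships slope = r(sy/sx) and intercept = y̅ - (slope)(x̅). The sketch below assumes those formulas; the function name `line_from_summary` is made up for illustration.

```python
def line_from_summary(x_bar, s_x, y_bar, s_y, r):
    """Return (a, b) for y-hat = a + b*x from means, SDs, and correlation.

    Hypothetical helper; assumes b = r * s_y / s_x and a = y_bar - b * x_bar.
    """
    b = r * s_y / s_x
    a = y_bar - b * x_bar
    return a, b


# Height (x): mean 65 in, SD 5.  SAT-M (y): mean 500, SD 50.  r = 0.600.
r = 0.600
a, b = line_from_summary(x_bar=65, s_x=5, y_bar=500, s_y=50, r=r)
r_squared = r ** 2
predicted_sat = a + b * 55

print(f"y-hat = {a:.2f} + {b:.2f}x")          # y-hat = 110.00 + 6.00x
print(f"r^2 = {r_squared:.2f}")               # r^2 = 0.36
print(f"prediction at 55 in: {predicted_sat:.2f}")  # prediction at 55 in: 440.00
```

Here r² = 0.36 reads as: about 36% of the variation in SAT-M scores is explained by the linear regression of SAT-M score on height.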
Example: If the best-fit line for predicting weight from height is ŷ = 5x - 150, find the correlation if x̅ = 50, sx = 10, y̅ = 100, sy = 30.
Minitab output
Predictor    Coef     St. Dev.    T    P
Constant     99.75    (IGNORE FOR NOW)
Hours        -5.75
ŷ = -5.75xᵢ + 99.75
Residuals – the vertical deviations: residual = observed y - predicted y = yᵢ - ŷᵢ (the prediction error)
Positive residual means the point is above the regression line
Negative residual means the point is below the regression line
Calculate for the studying data:
x: 5, 4, 3, 5, 1, 2, 3, 4, 2, 1
y: 70, 80, 85, 80, 95, 90, 80, 60, 95, 90
ŷ:
ŷ:
residuals:
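To check the hand-calculated table, the residuals can be sketched in Python as observed y minus predicted y, using the line ŷ = 99.75 - 5.75x from the Minitab output. The variable names are illustrative only.

```python
# Study-time data: hours studied (x) and test score (y).
hours = [5, 4, 3, 5, 1, 2, 3, 4, 2, 1]
scores = [70, 80, 85, 80, 95, 90, 80, 60, 95, 90]

# Coefficients taken from the Minitab output: y-hat = 99.75 - 5.75x.
a, b = 99.75, -5.75

# Predicted score for each student, then residual = observed - predicted.
predicted = [a + b * x for x in hours]
residuals = [y - y_hat for y, y_hat in zip(scores, predicted)]

for x, y, y_hat, res in zip(hours, scores, predicted, residuals):
    position = "above" if res > 0 else "below"
    print(f"x={x}  y={y}  y-hat={y_hat:.2f}  residual={res:+.2f}  ({position} the line)")

# For a least-squares line the residuals sum to zero.
print(sum(residuals))  # 0.0
```

The largest residual in size belongs to the student who studied 4 hours and scored 60: the point sits 16.75 points below the line.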
Residual plot – scatterplot of the residuals against the explanatory variable (L1, L4)
Residual plots magnify the deviations from the line
Look for patterns
1) Good fit: no pattern
2) Nonlinear: curved pattern
3) Less accurate predictor for larger x's: residuals spread out as x increases
Homework: p. 212 #34, 35; p. 221 #39, 40, 42
r² – coefficient of determination
“percent of the variation in y explained by the linear regression of y on x”
r = measure of the strength of the linear relationship
r² = how much better the linear model is at predicting y than just using y̅ (the mean of y).
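The "how much better" idea can be sketched numerically with the studying data: compare the squared prediction errors from the regression line (SSE) with the squared errors from always guessing y̅ (SST). The names `sse` and `sst` are labels chosen for this sketch, and it assumes the standard identity r² = 1 - SSE/SST for a least-squares line.

```python
import math

# Study-time data: hours studied (x) and test score (y).
hours = [5, 4, 3, 5, 1, 2, 3, 4, 2, 1]
scores = [70, 80, 85, 80, 95, 90, 80, 60, 95, 90]

n = len(hours)
x_bar = sum(hours) / n
y_bar = sum(scores) / n

sxx = sum((x - x_bar) ** 2 for x in hours)
syy = sum((y - y_bar) ** 2 for y in scores)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(hours, scores))

# Correlation and the least-squares line.
r = sxy / math.sqrt(sxx * syy)
b = sxy / sxx
a = y_bar - b * x_bar

# Squared errors when every prediction is y-bar, and when using the line.
sst = syy
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(hours, scores))

# Fraction of the variation in y removed by using the line instead of y-bar.
r_squared = 1 - sse / sst

print(f"r = {r:.3f}")
print(f"SST = {sst:.2f}, SSE = {sse:.2f}")
print(f"r^2 = {r_squared:.3f} (compare r*r = {r * r:.3f})")
```

The sign of r matches the sign of the slope (negative here), while r² is always between 0 and 1.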
Example:
Height | 60 | 64 | 68 | 72 | 63 | 65
Weight | 100 | 130 | 150 | 200 | 100 | 220
Do:
- Scatterplot
- Regression line (calc and sketch)
- r
- r² – explain
- find residuals
- residual plot
- predict weight for height of 24 inches
- predict height for weight of 250 lbs.
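The computational parts of this checklist can be sketched in Python as a way to check calculator work. This is a rough sketch under the standard least-squares formulas; `fit_line` and `is_extrapolation` are hypothetical helper names, not calculator or Minitab functions.

```python
import math

heights = [60, 64, 68, 72, 63, 65]        # inches (explanatory)
weights = [100, 130, 150, 200, 100, 220]  # pounds (response)


def fit_line(x, y):
    """Return (a, b, r) for the least-squares line y-hat = a + b*x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    r = sxy / math.sqrt(sxx * syy)
    return a, b, r


def is_extrapolation(x_values, x_new):
    """True when x_new lies outside the observed range of x."""
    return not (min(x_values) <= x_new <= max(x_values))


a, b, r = fit_line(heights, weights)
residuals = [y - (a + b * x) for x, y in zip(heights, weights)]
weight_at_24 = a + b * 24

print(f"y-hat = {a:.2f} + {b:.2f}x, r = {r:.3f}, r^2 = {r * r:.3f}")
print("residuals:", [round(res, 1) for res in residuals])
print(f"weight at 24 in: {weight_at_24:.1f} lb, "
      f"extrapolation: {is_extrapolation(heights, 24)}")
```

A height of 24 inches is far below the smallest observed height (60 inches), and the predicted weight comes out negative, which shows why extrapolation is risky. For the last item, one option is to call `fit_line(weights, heights)`, since the least-squares line depends on which variable is treated as explanatory; 250 lbs is also outside the observed weights.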
Homework: p. 227 #43-47