3.2 Least-Squares Regression Line (LSRL) – Best-Fit Lines

Regression – interested in PREDICTING the value of the response variable.

Correlation – interested in the extent of LINEAR ASSOCIATION between two variables (no distinction between explanatory and response variables)

Best-fit line – used for predictions, minimizes the sum of the squares of the vertical deviations (DO NOT extend to y-axis)

Vertical deviations

Best-fit line: ŷ = a + bx; the slope (b) is how much the predicted y changes when x increases by 1 unit

To calculate: STAT, CALC, LinReg(a+bx) or LinReg(ax+b) (in LinReg(ax+b), a is the slope and b is the y-intercept)

Example #1:

Mr. Jones is trying to determine whether studying leads to math success. His data:

X (hrs studied):   5   4   3   5   1   2   3   4   2   1

Y (test score):   70  80  85  80  95  90  80  60  95  90

  1. Draw a scatterplot
  2. Calculate the best-fit line and plot it on the graph.
  3. What is the slope of the line? Interpret its meaning. (there is an average change of b units in y for every one unit increase/decrease in x)
  4. What is the y-intercept? Interpret.
  5. Predict the grade for someone who studies 3 hours.
  6. Predict the hours studied for someone with a 70.
  7. Predict the grade if someone studies 12 hours.
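To check calculator work on Example #1, the slope and intercept can be computed directly from the data. This is a minimal Python sketch, not the calculator's own routine; the function name `least_squares_line` and the variable names are made up for illustration.

```python
from statistics import mean

def least_squares_line(xs, ys):
    """Return (a, b) for the line y-hat = a + b*x that minimizes
    the sum of squared vertical deviations."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of products of deviations, and sum of squared x-deviations
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx            # slope
    a = y_bar - b * x_bar    # y-intercept
    return a, b

hours = [5, 4, 3, 5, 1, 2, 3, 4, 2, 1]
scores = [70, 80, 85, 80, 95, 90, 80, 60, 95, 90]

a, b = least_squares_line(hours, scores)
print(f"y-hat = {a:.2f} + ({b:.2f})x")   # y-hat = 99.75 + (-5.75)x

pred_3 = a + b * 3     # question 5: inside the range of the data
pred_12 = a + b * 12   # question 7: far outside the range of the data
print(f"{pred_3:.2f} {pred_12:.2f}")     # 82.50 30.75
```

For this data set the slope is negative: each additional hour studied goes with an average drop of 5.75 points in the predicted score. The 12-hour prediction uses an x-value well beyond the observed 1 to 5 hours.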

Extrapolation – predicting outside the range of the data (BE CAREFUL! Ex: height vs. age of preteens)

Interpolation – GOOD – predicting between known values.
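One simple way to tell the two apart is to compare the new x-value with the smallest and largest observed x-values. The sketch below assumes that "range" means the interval from the minimum to the maximum observed x; `classify_prediction` is a hypothetical helper name.

```python
def classify_prediction(x_new, xs):
    """Label a prediction by whether x_new lies within the observed x-values."""
    if min(xs) <= x_new <= max(xs):
        return "interpolation"
    return "extrapolation"

hours = [5, 4, 3, 5, 1, 2, 3, 4, 2, 1]   # hours studied from Example #1

print(classify_prediction(3, hours))    # interpolation
print(classify_prediction(12, hours))   # extrapolation
```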

Regression line: ŷ = a + bx contains (x̄, ȳ)

Slope: b = r(sy/sx)        y-intercept: a = ȳ – b·x̄

Goes through (x̄, ȳ)

Example: Is there a relationship between height and SAT-M score? A group of students was surveyed. Their avg. height is 65 inches with a standard deviation of 5. Their avg. SAT-M score is 500 with a standard deviation of 50. If the correlation is 0.600:

1) Find the equation of the best-fit line.

2) Find r² and explain its meaning.

3) Predict the SAT-M score for a student who is 55 inches tall.
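When only summary statistics are given, as in the height and SAT-M example, the line can be built from the standard relationships b = r(sy/sx) and a = ȳ – b·x̄. The second follows from the line passing through (x̄, ȳ). A minimal Python sketch; `line_from_summary` is a hypothetical helper name.

```python
def line_from_summary(x_bar, s_x, y_bar, s_y, r):
    """Return (a, b) for y-hat = a + b*x using only means,
    standard deviations, and the correlation."""
    b = r * s_y / s_x        # slope
    a = y_bar - b * x_bar    # intercept: line passes through (x_bar, y_bar)
    return a, b

# Height (x) and SAT-M (y) values from the example
a, b = line_from_summary(x_bar=65, s_x=5, y_bar=500, s_y=50, r=0.600)
r_squared = 0.600 ** 2
pred_55 = a + b * 55

print(f"y-hat = {a:.1f} + {b:.1f}x")                        # y-hat = 110.0 + 6.0x
print(f"r^2 = {r_squared:.2f}")                             # r^2 = 0.36
print(f"predicted SAT-M at 55 inches: {pred_55:.1f}")       # 440.0
```

Here r² = 0.36 says that about 36% of the variation in SAT-M scores is explained by the linear regression on height. A height of 55 inches is two standard deviations below the mean, so the prediction may fall outside the heights that were actually surveyed.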

Example: If the best-fit line for predicting weight from height is ŷ = 5x – 150, find the correlation if x̄ = 50, sx = 10, ȳ = 100, sy = 30.

Minitab output

Predictor    Coef.     St. Dev.    T    P

Constant     99.75     (IGNORE FOR NOW)

Hours        -5.75

ŷ = -5.75x + 99.75

Residuals – the vertical deviations (prediction errors): residual = observed y – predicted y = y – ŷ

Positive residual means the point is above the regression line

Negative residual means the point is below the regression line

Calculate for the studying data:

x:   5   4   3   5   1   2   3   4   2   1

y:  70  80  85  80  95  90  80  60  95  90

ŷ:

ŷ:

residuals:
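The table above can be checked with a short script. This Python sketch uses the coefficients from the Minitab output (99.75 and -5.75); the variable names are chosen only for illustration.

```python
hours = [5, 4, 3, 5, 1, 2, 3, 4, 2, 1]
scores = [70, 80, 85, 80, 95, 90, 80, 60, 95, 90]
a, b = 99.75, -5.75   # constant and Hours coefficient from the Minitab output

# Predicted score for each student, then residual = observed - predicted
predicted = [a + b * x for x in hours]
residuals = [y - y_hat for y, y_hat in zip(scores, predicted)]

for x, y, y_hat, res in zip(hours, scores, predicted, residuals):
    side = "above" if res > 0 else "below" if res < 0 else "on"
    print(f"x={x}  y={y}  y-hat={y_hat:.2f}  residual={res:+.2f}  ({side} the line)")

print(f"sum of residuals = {sum(residuals):.2f}")
```

For a least-squares line the residuals add up to zero (apart from rounding), which is a quick check that the line was calculated correctly.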

Residual plot – a scatterplot of the residuals against the explanatory variable (L1, L4)

Residual plots magnify the deviations from the line

Look for patterns

1) good fit

2) nonlinear

3) less accurate predictor for larger x’s

Homework p 212 34, 35 p 221 39, 40, 42

r² – coefficient of determination

“percent of the variation of y explained by the linear regression of y on x”

r = measure of strength of linear relationship

r² = how much better the linear model is at predicting y than simply using ȳ (the mean of y).
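One way to see this is to compare the squared prediction error of the regression line with the squared error from always guessing the mean of y. The Python sketch below does this for the studying data, using the Minitab coefficients; the names `sst` and `sse` are labels chosen here, not notation from these notes.

```python
from math import sqrt
from statistics import mean

hours = [5, 4, 3, 5, 1, 2, 3, 4, 2, 1]
scores = [70, 80, 85, 80, 95, 90, 80, 60, 95, 90]
a, b = 99.75, -5.75   # coefficients from the Minitab output

x_bar, y_bar = mean(hours), mean(scores)

# SST: total squared error if every prediction is just the mean score
sst = sum((y - y_bar) ** 2 for y in scores)
# SSE: squared error left over when predicting with the regression line
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(hours, scores))

r_squared = 1 - sse / sst

# r computed directly from the deviations, for comparison
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(hours, scores))
sxx = sum((x - x_bar) ** 2 for x in hours)
r = sxy / sqrt(sxx * sst)

print(f"SST = {sst:.2f}, SSE = {sse:.2f}")        # SST = 1112.50, SSE = 451.25
print(f"r^2 = {r_squared:.3f}, r = {r:.3f}")      # r^2 = 0.594, r = -0.771
```

About 59% of the variation in test scores is explained by the linear regression on hours studied. Squaring r gives the same value as 1 - SSE/SST.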

Example:

Height |  60   64   68   72   63   65

Weight | 100  130  150  200  100  220

Do:

  1. Scatterplot
  2. Regression line (calc and sketch)
  3. r
  4. r² – explain
  5. find residuals
  6. residual plot
  7. predict weight for height of 24 inches
  8. predict height for weight of 250 lbs.
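The numerical parts of this exercise (regression line, r, r², residuals, and the prediction at 24 inches) can be checked with the Python sketch below. It is one possible way to do the arithmetic, with illustrative variable names. The scatterplot and residual plot are left to the calculator, and the code only lists the points for the residual plot.

```python
from math import sqrt
from statistics import mean

heights = [60, 64, 68, 72, 63, 65]
weights = [100, 130, 150, 200, 100, 220]

x_bar, y_bar = mean(heights), mean(weights)
sxx = sum((x - x_bar) ** 2 for x in heights)
syy = sum((y - y_bar) ** 2 for y in weights)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(heights, weights))

b = sxy / sxx                  # slope
a = y_bar - b * x_bar          # y-intercept
r = sxy / sqrt(sxx * syy)      # correlation

# Residual = observed weight - predicted weight
residuals = [y - (a + b * x) for x, y in zip(heights, weights)]
residual_plot_points = list(zip(heights, residuals))   # (height, residual) pairs

pred_24 = a + b * 24           # step 7: far below the observed heights

print(f"y-hat = {a:.2f} + {b:.2f}x")
print(f"r = {r:.3f}, r^2 = {r * r:.3f}")
for x, res in residual_plot_points:
    print(f"height={x}  residual={res:+.1f}")
print(f"predicted weight at 24 inches: {pred_24:.1f} lbs")
```

The prediction at 24 inches comes out negative, which is impossible for a weight. The observed heights run from 60 to 72 inches, so this is an extrapolation and the line should not be trusted there.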

Homework p 227 43 - 47