Topics for Today
Prediction Error
When not to use Prediction
Midterm prep
Prediction Error
Although our regression equation allows us to make a ______ for an individual’s value of Y, we are, of course, not exactly ______!
Let’s look back to one of the two examples in which we predicted values of Y from a value of X.
For a state with a Poverty Rate of 10, the Teen Pregnancy Rate for that state is ______ to be:
(Teen Pregnancy) = 1.116*(10) + 28.335
= 11.16 + 28.335
= 39.495
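The same arithmetic as a minimal Python sketch, assuming only the slope (1.116) and intercept (28.335) from the fitted equation above. The function name `predict_teen_pregnancy` is hypothetical, chosen for illustration.

```python
# Fitted line from the notes: predicted Teen Pregnancy = 1.116 * Poverty Rate + 28.335
SLOPE = 1.116
INTERCEPT = 28.335


def predict_teen_pregnancy(poverty_rate: float) -> float:
    """Return the predicted Teen Pregnancy Rate for a given Poverty Rate."""
    return SLOPE * poverty_rate + INTERCEPT


# The case worked by hand above: a state with a Poverty Rate of 10
print(round(predict_teen_pregnancy(10), 3))
```

Every prediction is just a point on the line: plug in X, multiply by the slope, add the intercept.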
On the scatterplot, this corresponds to:
So, this and all other predicted values ______ the regression line.
Our predictions are approximately correct, but not always very accurate.
We can get a sense of how ______ our prediction is by considering how _____ the line is to all the points.
For example, consider the predicted teen pregnancy rate for a state with a poverty rate of 14.5.
(Teen Pregnancy) = ______
= ______
= _____
State G had a poverty rate of 14.5 and a teen pregnancy rate of 44.8. Identify it on the figure. Now, identify the predicted value for this state.
The actual and predicted values of Teen Pregnancy rate for State G are ______!
The difference between these two values is called the _____ or ______ (though the more common name is ______).
Every individual in the dataset has a residual – imagine the ______ from each point to the line.
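A small Python sketch of a single residual, using the usual convention residual = observed value − predicted value (positive when the point sits above the line). It reuses the teen pregnancy line from the notes and State G’s values (Poverty Rate 14.5, Teen Pregnancy Rate 44.8); the helper names `predict` and `residual` are hypothetical.

```python
# Fitted line from the notes
SLOPE = 1.116
INTERCEPT = 28.335


def predict(x: float) -> float:
    """Predicted Teen Pregnancy Rate for Poverty Rate x."""
    return SLOPE * x + INTERCEPT


def residual(x: float, y_observed: float) -> float:
    """Observed minus predicted: the vertical distance from the point to the line."""
    return y_observed - predict(x)


# State G: Poverty Rate 14.5, observed Teen Pregnancy Rate 44.8
print("predicted:", round(predict(14.5), 3))
print("residual: ", round(residual(14.5, 44.8), 3))
```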
Clearly, the more ______ the points are to the line, the ______ these residuals will be … and the ______ we would expect our ______ to be to the true values.
Which of these datasets would result in the best prediction?
So, the ______ the R², the ______ we would expect our prediction to be.
… sensible, since we already defined R² as the amount of variability in Y that is ______ by X.
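To connect residual size with R², here is a rough Python sketch that fits a least-squares line and computes R² = 1 − SSE/SST, where SSE is the sum of squared residuals and SST is the total variability of Y around its mean. The two datasets are made up for illustration (not from the notes): they share an upward trend but differ in how tightly the points hug the line.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def r_squared(xs, ys):
    """Share of the variability in Y explained by the fitted line."""
    slope, intercept = fit_line(xs, ys)
    y_bar = sum(ys) / len(ys)
    sse = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))  # squared residuals
    sst = sum((y - y_bar) ** 2 for y in ys)  # total variability in Y
    return 1 - sse / sst


# Hypothetical data: same trend, different scatter around the line
xs = [1, 2, 3, 4, 5, 6]
tight = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]
loose = [3.5, 2.0, 8.0, 5.5, 12.5, 9.0]

print("tight R^2:", round(r_squared(xs, tight), 3))
print("loose R^2:", round(r_squared(xs, loose), 3))
```

Smaller residuals mean a smaller SSE, which pushes R² toward 1.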
The textbook has a very nice mathematical explanation of this on pages 383-388.
When can’t we do prediction?
Regression analysis is an extremely useful and pervasive tool for understanding the ______ between two variables and doing ______ of a response variable from an explanatory variable.
… but sometimes, it may not be a good idea to do prediction using a linear regression model.
Can’t Predict: Reason #1
The relationship between X and Y ______.
Let’s go back to the TV watching versus Age model, with hours of TV watched as the ______. From SPSS, we have:
So, we have the following regression model:
(Hours of TV) = 0.023(Age) + 5.149
How much TV does a 40-year-old watch?
(Hours of TV) = 0.023(40) + 5.149
= ______
= _____
How good is this prediction? Let’s check the scatterplot.
So, it’s clearly not accurate, but from the scatterplot we can see that using a line to approximate the ______ relationship will give us ______ errors in prediction:
- people from 20 to 60 will always be ______
- people under 20 and over 60 will always be ______
If the scatterplot doesn’t show an ______ relationship, you ______ a linear regression for prediction!
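A rough Python sketch of why a line fit to a curved pattern gives systematic errors. The ages and hours below are made-up U-shaped data (an assumption, not the SPSS output): TV hours are highest for the youngest and oldest ages and lowest around 40. The residuals come out with the same sign within each age band instead of scattering randomly around zero.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Hypothetical curved (U-shaped) relationship between age and hours of TV
ages = [10, 20, 30, 40, 50, 60, 70]
hours = [(age - 40) ** 2 / 100 + 2 for age in ages]

slope, intercept = fit_line(ages, hours)
residuals = {age: h - (slope * age + intercept) for age, h in zip(ages, hours)}

# Positive residual: the line under-predicts; negative: the line over-predicts.
for age, r in residuals.items():
    print(age, round(r, 2))
```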
Can’t Predict: Reason #2
The X value you are using is ______ of the data.
What does this mean?
This occurs most often in sports:
“Alex Edler is on pace for 82 points this season!”
… we are ______ beyond the range of the data!
Let’s look at another example: why can’t we ______ heart disease deaths from alcohol consumption?
The regression line goes right down to a heart attack rate of zero!
Clearly, if every person in a country drank 11 litres of alcohol, that would ___ eliminate heart disease.
Making a ______ beyond the range of the data ASSUMES that the linear relationship persists … but we have no data to support that.
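One way to make the extrapolation rule concrete is a prediction helper that refuses X values outside the observed range. This is a minimal Python sketch: `predict_in_range` is a hypothetical helper, and the observed Poverty Rate range of 8 to 22 is an assumption, since the notes do not list it.

```python
def predict_in_range(x, slope, intercept, x_min, x_max):
    """Predict y = slope * x + intercept, but only for x inside [x_min, x_max].

    Outside the observed range the line is being extrapolated, and there is
    no data showing that the linear relationship still holds there.
    """
    if not x_min <= x <= x_max:
        raise ValueError(
            f"x = {x} is outside the observed range [{x_min}, {x_max}]: extrapolation"
        )
    return slope * x + intercept


# Teen pregnancy line from the notes, with a hypothetical observed range of 8 to 22
print(predict_in_range(10, 1.116, 28.335, x_min=8, x_max=22))
try:
    predict_in_range(40, 1.116, 28.335, x_min=8, x_max=22)
except ValueError as err:
    print("Refused:", err)
```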
New Topics Covered Today
Prediction error
· not all predictions are accurate
· we can tell how big a prediction error to expect based on how tightly the observed points are clustered around the regression line (R²)
When not to use regression for Prediction
· when the relationship is not linear
· when we extrapolate beyond the range of our data
Reading:
No New Reading
Stat203
Fall 2011 – Week 11, Lecture 3