Topics for Today

Prediction Error

When not to use Prediction

Midterm prep


Prediction Error

Although our regression equation allows us to make a ______ for an individual’s value of Y, we are, of course, not exactly ______!

Let’s look back to one of the two examples in which we predicted values of Y from a value of X.

For a state with a Poverty Rate of 10, the Teen Pregnancy Rate for that state is ______ to be:

(Teen Pregnancy) = 1.116*(10) + 28.335

= 11.16 + 28.335

= 39.495

On the scatterplot, this corresponds to:


So, this and all other predicted values ______ the regression line.
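As a quick check of the arithmetic above, here is a small Python sketch of the prediction step. The function name predict_teen_pregnancy is made up for illustration; the slope and intercept are the ones from our regression equation.

```python
# Slope and intercept taken from the regression equation in these notes.
SLOPE = 1.116
INTERCEPT = 28.335


def predict_teen_pregnancy(poverty_rate):
    """Predicted teen pregnancy rate for a given poverty rate (hypothetical helper)."""
    return SLOPE * poverty_rate + INTERCEPT


# Prediction for a state with a poverty rate of 10.
prediction = predict_teen_pregnancy(10)
print(round(prediction, 3))
```

The same function can be reused for any other poverty rate, as long as that value is one the regression model can sensibly be applied to.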

Our predictions are approximately correct, but not always very accurate.

We can get a sense of how ______ our prediction is by considering how _____ the line is to all the points.

For example, consider the predicted teen pregnancy rate for a state with a poverty rate of 14.5.

(Teen Pregnancy) = ______

= ______

= _____


State G had a poverty rate of 14.5 and a teen pregnancy rate of 44.8. Identify it on the figure. Now, identify the predicted value for this state.


The actual and predicted values of Teen Pregnancy rate for State G are ______!

The difference between these two values is called the _____ or ______ (though the more common name is ______).

Every individual in the dataset has a residual – imagine the ______ from each point to the line.
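To make the idea of a residual concrete, here is a minimal Python sketch using State G's values from above (poverty rate 14.5, teen pregnancy rate 44.8). It takes the residual to be the observed value minus the predicted value, which is the usual convention; the function names are hypothetical.

```python
# Slope and intercept taken from the regression equation in these notes.
SLOPE = 1.116
INTERCEPT = 28.335


def predict(poverty_rate):
    """Value on the regression line for a given poverty rate."""
    return SLOPE * poverty_rate + INTERCEPT


def residual(poverty_rate, observed_rate):
    """Observed value minus predicted value (hypothetical helper)."""
    return observed_rate - predict(poverty_rate)


# State G: poverty rate 14.5, observed teen pregnancy rate 44.8.
predicted_g = predict(14.5)
residual_g = residual(14.5, 44.8)
print(round(predicted_g, 3), round(residual_g, 3))
```

A positive residual means the point sits above the line; a negative residual means it sits below.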

Clearly, the more ______ points are to the line, the ______ these residuals will be … and the ______ we would expect our ______ to be to the true values.


Which of these datasets would result in the best prediction?


So, the ______ the R², the ______ we would expect our prediction to be.

… sensible since we already defined R² as the amount of variability in Y that is ______ by X.
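The link between residuals and R² can be sketched in Python. The two small datasets below are invented purely for illustration (they are not from the class data): one hugs a line tightly, the other is scattered. The sketch fits a least-squares line by hand and computes R² as 1 minus (sum of squared residuals) / (total sum of squares around the mean of Y).

```python
def r_squared(xs, ys):
    """R-squared of the least-squares line through the points (xs, ys)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n

    # Least-squares slope and intercept.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    # Squared residuals versus total squared variation in Y.
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical data: same X values, tight versus scattered Y values.
xs = [1, 2, 3, 4, 5]
ys_tight = [2.1, 3.9, 6.2, 7.8, 10.1]
ys_loose = [4.0, 1.5, 8.0, 5.0, 9.5]

r2_tight = r_squared(xs, ys_tight)
r2_loose = r_squared(xs, ys_loose)
print(r2_tight > r2_loose)
```

The tight dataset has small residuals and an R² close to 1, while the scattered dataset has larger residuals and a noticeably lower R².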

The textbook has a very nice mathematical explanation of this on pages 383-388.


When can’t we do prediction?

Regression analysis is an extremely useful and pervasive tool for understanding the ______ between two variables and doing ______ of a response variable from an explanatory variable.

… but sometimes, it may not be a good idea to do prediction using a linear regression model.


Can’t Predict: Reason #1

The relationship between X and Y ______.

Let’s go back to the TV watching versus Age model with hours of TV watched as the ______. From SPSS, we have:

So, we have the following regression model:

(Hours of TV) = 0.023(Age) + 5.149

How much TV does a 40 year old watch?

(Hours of TV) = 0.023(40) + 5.149

= ______

= _____

How good is this prediction? Let’s check the scatterplot.


So, it’s clearly not accurate, but from the scatterplot we can see that using a line to approximate the ______ relationship will give us ______ errors in prediction:

-  people from 20 to 60 will always be ______

-  people under 20 and over 60 will always be ______

If the scatterplot doesn’t show an ______ relationship, you ______ a linear regression for prediction!
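Here is a small Python sketch of why a line fitted to a curved relationship makes errors that follow a pattern rather than scattering randomly. The U-shaped data below are invented for illustration only (they are not the TV-watching data from class), and the helper name fit_line is hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept (hypothetical helper)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Made-up U-shaped data: Y is high at both ends of X and low in the middle.
ages = [10, 25, 40, 55, 70]
hours = [3 + (age - 40) ** 2 / 200 for age in ages]

slope, intercept = fit_line(ages, hours)

# Residual = observed minus predicted, one per point.
residuals = [h - (slope * a + intercept) for a, h in zip(ages, hours)]
for age, res in zip(ages, residuals):
    print(age, "above the line" if res > 0 else "below the line")
```

In this made-up example the points at both ends sit above the fitted line and the points in the middle sit below it, so the direction of the error depends on where X falls. That pattern in the residuals is the warning sign that a straight line is the wrong shape for the data.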


Can’t Predict: Reason #2

The X value you are using is ______ of the data.

What does this mean?

This occurs most often in sports:

“Alex Edler is on pace for 82 points this season!”

… we are ______ beyond the range of the data!

Let’s look at another example: why can’t we ______ heart disease deaths with alcohol?



The regression line goes right down to a heart attack rate of zero!

Clearly, if every person in a country drank 11 litres of alcohol, it would ___ eliminate heart disease.

Making a ______ beyond the range of the data ASSUMES that the linear relationship persists … but we have no data to support that.
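One practical way to protect against this is to refuse to predict when X falls outside the range of X values that were actually observed. The Python sketch below reuses the teen pregnancy equation from earlier in these notes; the observed range (8 to 22) is a made-up placeholder, not the real range in the class dataset, and the function name predict_checked is hypothetical.

```python
# Slope and intercept taken from the regression equation in these notes.
SLOPE = 1.116
INTERCEPT = 28.335

# ASSUMPTION: placeholder range of observed poverty rates, for illustration only.
OBSERVED_MIN = 8.0
OBSERVED_MAX = 22.0


def predict_checked(poverty_rate):
    """Predict only when the X value lies inside the observed range."""
    if not OBSERVED_MIN <= poverty_rate <= OBSERVED_MAX:
        raise ValueError(
            f"poverty rate {poverty_rate} is outside the observed range "
            f"[{OBSERVED_MIN}, {OBSERVED_MAX}]; we have no data to support this prediction"
        )
    return SLOPE * poverty_rate + INTERCEPT


print(round(predict_checked(10), 3))   # inside the placeholder range

try:
    predict_checked(40)                # outside the placeholder range
except ValueError as error:
    print("Refused:", error)
```

The check does not make the model any better; it only stops us from quietly trusting the line in a region where we never saw any data.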


New Topics Covered Today

Prediction error

·  not all predictions are accurate

·  we can tell how big a prediction error to expect based on how tightly the observed points are clustered around the regression line (R²)

When not to use regression for Prediction

·  when the relationship is not linear

·  when we extrapolate beyond the range of our data

Reading:

No New Reading

Stat203

Fall2011 – Week 11, Lecture 3