Chapter 10: Regression
The heights and weights from a survey of 988 men are shown in the scatter diagram below.
Here is the five-statistic summary:
Average height = 70 inches, SD = 3 inches
Average weight = 162 pounds, SD = 30 pounds
r = 0.47
Question: Suppose a man is one SD above average in height (70 + 3 = 73 inches). Should you guess his weight to be one SD above average (162 + 30 = 192 pounds)?
Answer: No. Notice that maybe 10 or 11 of the men 73 inches tall have weights above 192 pounds, while dozens have weights below 192 pounds.
What would be a better guess?
The SD line (dashed line in the scatter diagram) cuts the football in half along its length. To obtain this line, start at the point of averages and draw a line with slope
m_SD line = SDy / SDx, if r > 0
m_SD line = -SDy / SDx, if r < 0.
On the SD line, if x increases by one SD, then y also increases by one SD.
The regression line (solid line in the scatter diagram) is used for prediction. To obtain this line, start at the point of averages and draw a line with slope
m_regression = r × (SDy / SDx).
On the regression line, if x increases by one SD, then y increases by r SDs.
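As a rough illustration of the two slopes, the sketch below evaluates both lines at a height of 73 inches, using the five-statistic summary for the height and weight data. The function names are made up for this example and are not part of the notes.

```python
# Five-statistic summary from the notes (heights in inches, weights in pounds).
AVG_X, SD_X = 70.0, 3.0      # height
AVG_Y, SD_Y = 162.0, 30.0    # weight
R = 0.47                     # correlation coefficient


def sd_line_slope(sd_x, sd_y, r):
    """Slope of the SD line: +SDy/SDx if r > 0, -SDy/SDx if r < 0."""
    if r == 0:
        raise ValueError("the notes give the SD line slope only for r > 0 or r < 0")
    sign = 1.0 if r > 0 else -1.0
    return sign * sd_y / sd_x


def regression_slope(sd_x, sd_y, r):
    """Slope of the regression line: r * SDy/SDx."""
    return r * sd_y / sd_x


def point_on_line(x, slope, avg_x, avg_y):
    """y-value at x for a line with the given slope through the point of averages."""
    return avg_y + slope * (x - avg_x)


sd_guess = point_on_line(73, sd_line_slope(SD_X, SD_Y, R), AVG_X, AVG_Y)
reg_guess = point_on_line(73, regression_slope(SD_X, SD_Y, R), AVG_X, AVG_Y)
print(f"SD line at 73 inches:         {sd_guess:.1f} lb")
print(f"Regression line at 73 inches: {reg_guess:.1f} lb")
```

The SD line gives the "one SD up, one SD up" guess of 192 pounds, while the regression line gives a guess only r SDs above the average weight.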
Ex.: Predict the weight of a man who is 6’4”.
Ex.: Predict the weight of a man who is 5’6”.
Regression Method
Steps
- Convert the known value to standard units (z).
- z_predict = r × z_known
- Convert the predicted z back to original units.
So, for a man whose height is 1 SD above average (z_height = 1), we guess that
z_weight = r × z_height
= 0.47 × 1 = 0.47,
weight = Avg_weight + z_weight × SD_weight
= 162 + 0.47 × 30
= 176.1 lb.
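The three steps of the regression method can be sketched in a few lines of Python. The helper names below are invented for this illustration; the numbers are the ones from the worked example above.

```python
def to_standard_units(value, avg, sd):
    """Step 1: convert a value to standard units (z)."""
    return (value - avg) / sd


def from_standard_units(z, avg, sd):
    """Step 3: convert a z-score back to original units."""
    return avg + z * sd


def regression_predict(x, avg_x, sd_x, avg_y, sd_y, r):
    """Predict y from x with the regression method."""
    z_known = to_standard_units(x, avg_x, sd_x)
    z_predict = r * z_known          # Step 2: z_predict = r * z_known
    return from_standard_units(z_predict, avg_y, sd_y)


# A man 73 inches tall is 1 SD above the average height.
weight_73 = regression_predict(73, avg_x=70, sd_x=3, avg_y=162, sd_y=30, r=0.47)
print(round(weight_73, 1))  # about 176.1 lb, matching the hand calculation
```

Plugging in the average height returns the average weight, because z_known is then 0.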
Ex.: Predict the weight of a man who is 6’4”.
Ex.: Predict the weight of a man who is 5’6”.
Ex.: Find the equation of the regression line that corresponds to the five-statistic summary for the height and weight data at the start of the chapter. Afterwards, plug in values to verify.
Ex.: For men aged 55 – 64 in the U.S. in 1993, the relationship between education (years of schooling completed) and personal income can be summarized as follows.
Average education = 12.5 years, SD = 4 years
Average income = $30,800, SD = $26,700
Estimate the average income of those men that finished elementary school but did not go on to high school (i.e., they completed 8 years of education.)
Graph of Averages
The graph below is obtained by averaging the weights of the men of the same height.
For example, the 100 men of height 67 inches (to the nearest inch) weigh an average of 148 pounds.
Notice that these averages appear to lie along a line – in particular, the regression line.
Question: Why are the points far off of the regression line near the ends?
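One way to explore the question above is to simulate a graph of averages. The sketch below does not use the survey data: it generates made-up heights and weights that share the five-statistic summary, assuming a bivariate normal (football-shaped) cloud, and uses 20,000 simulated men rather than 988. It then averages the weights in each one-inch height strip and prints how many men each strip contains.

```python
import random
from collections import defaultdict

AVG_H, SD_H = 70.0, 3.0      # height summary from the notes
AVG_W, SD_W = 162.0, 30.0    # weight summary from the notes
R = 0.47
N_MEN = 20000                # assumption: simulated sample, not the 988 surveyed men


def simulate_men(n, seed=0):
    """Generate (height, weight) pairs with correlation R (assumed bivariate normal)."""
    rng = random.Random(seed)
    men = []
    for _ in range(n):
        z_h = rng.gauss(0, 1)
        z_w = R * z_h + (1 - R ** 2) ** 0.5 * rng.gauss(0, 1)
        men.append((AVG_H + SD_H * z_h, AVG_W + SD_W * z_w))
    return men


def graph_of_averages(men):
    """Map each height (to the nearest inch) to (average weight, number of men)."""
    strips = defaultdict(list)
    for height, weight in men:
        strips[round(height)].append(weight)
    return {h: (sum(ws) / len(ws), len(ws)) for h, ws in sorted(strips.items())}


averages = graph_of_averages(simulate_men(N_MEN))
for height, (avg_weight, count) in averages.items():
    on_line = AVG_W + R * (SD_W / SD_H) * (height - AVG_H)
    print(f"{height} in: n={count:5d}  average={avg_weight:6.1f}  regression line={on_line:6.1f}")
```

Comparing the `n` column with the gap between the last two columns is a useful starting point for the question.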
Ex.: A university has made a statistical analysis of the relationship between SAT-M scores and first-year GPA. The results are:
Average SAT-M score = 550, SD = 80
Average first-year GPA = 2.6, SD = 0.6
The scatter diagram is football shaped. A randomly chosen student has an SAT-M score of 650. Predict her first-year GPA.
Ex.: A student ranks at the 90th percentile on her SAT-M. Predict the percentile rank of her first-year GPA.
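For percentile questions, the regression method needs two extra conversions: percentile to standard units, and back. A minimal sketch follows, assuming both variables follow the normal curve (reasonable for a football-shaped scatter diagram). The correlation used here, `R_HYPOTHETICAL`, is a placeholder chosen only for illustration and is not a value from these notes.

```python
from statistics import NormalDist


def predict_percentile(known_percentile, r):
    """Predict the percentile rank of y from the percentile rank of x."""
    std_normal = NormalDist()                       # mean 0, SD 1
    z_known = std_normal.inv_cdf(known_percentile / 100)
    z_predict = r * z_known                         # regression method
    return 100 * std_normal.cdf(z_predict)


R_HYPOTHETICAL = 0.5  # placeholder correlation, NOT taken from the notes
gpa_percentile = predict_percentile(90, R_HYPOTHETICAL)
print(f"Predicted GPA percentile: {gpa_percentile:.0f}")
```

Whatever the value of r (short of 1), the predicted percentile is closer to the 50th than the known percentile is.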
Play it again, Sam!
A Quick Recall: For a study of 1,078 fathers and sons:
Average fathers’ height = 68 inches, SD = 2.7 inches
Average sons’ height = 69 inches, SD = 2.7 inches
r = 0.5
Question: Suppose a father is 72 inches tall. How tall would you predict his son to be?
Wrong Answer: The father is (72 - 68) / 2.7 ≈ 1.5 SDs taller than average. Therefore, his son should also be 1.5 SDs taller than average, or 69 + 1.5 × 2.7 ≈ 73 inches tall.
A father 72 inches tall with a son 73 inches tall would lie on the SD line, denoted by the dashed line in the diagram.
As seen in the diagram, the average height of sons with 72”-tall fathers is NOT 73”.
Correct Answer: Since r = 0.5, a father 1.5 SDs above average may be predicted to have a son 0.5 × 1.5 = 0.75 SDs above average, so the predicted height is 69 + 0.75 × 2.7 ≈ 71 inches.
Notice that this regression estimate is at roughly the average of the values in the 72” strip.
Ex.: Estimate the height of a son with a 64” tall father.
Regression Effect and Regression Fallacy
Notice that tall fathers tend to have tall sons – though sons who are not as tall. Likewise, short fathers on average will have short sons – just not as short. Hence the term “regression.” An earlier statistician called this effect the “regression toward mediocrity.” There is no biological cause for this effect; it is strictly statistical.
Ex.: A preschool program attempts to boost students’ IQs. The children are tested when they enter the program (pretest), and again when they leave the program (post-test). On both occasions, the average IQ score was 100, with an SD of 15. Also, students with below-average IQs on the pretest had scores that went up by an average of 5 points, while students with above-average scores on the pretest had their scores drop by an average of 5 points.
What is going on? Does the program equalize intelligence?
Answer: No.
- If the program equalized intelligence, the post-test SD would be less than 15 points.
- We are just seeing the regression effect.
Thinking that the regression effect is due to something important is called the regression fallacy.
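The pattern in the preschool example can be reproduced with a simulation in which the program does nothing at all. The sketch below draws made-up pretest and post-test scores with the same average (100) and SD (15) on both occasions. The pretest/post-test correlation `R_ASSUMED` is not given in the notes; 0.6 is an assumed value for illustration.

```python
import random
from statistics import mean, pstdev

AVG_IQ, SD_IQ = 100.0, 15.0   # from the notes
R_ASSUMED = 0.6               # assumption: pretest/post-test correlation
N_CHILDREN = 20000            # assumption: simulated sample size

rng = random.Random(1)
pairs = []
for _ in range(N_CHILDREN):
    z_pre = rng.gauss(0, 1)
    # Post-test shares part of the pretest z-score; the rest is chance variation.
    z_post = R_ASSUMED * z_pre + (1 - R_ASSUMED ** 2) ** 0.5 * rng.gauss(0, 1)
    pairs.append((AVG_IQ + SD_IQ * z_pre, AVG_IQ + SD_IQ * z_post))

low_change = mean([post - pre for pre, post in pairs if pre < AVG_IQ])
high_change = mean([post - pre for pre, post in pairs if pre >= AVG_IQ])
post_sd = pstdev([post for _, post in pairs])

print(f"Below-average pretest group: average change {low_change:+.1f}")
print(f"Above-average pretest group: average change {high_change:+.1f}")
print(f"Post-test SD: {post_sd:.1f}")
```

Under these assumptions the low group moves up and the high group moves down by several points each, yet the post-test SD stays close to 15. No equalizing is taking place.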
Where does Regression Effect come from?
Question: An average guy can eat about 8 hotdogs before he starts throwing up. Paisa entered a hotdog-eating contest and ate 12 hotdogs. Does this mean Paisa can always stomach 12 hotdogs at any contest?
Answer: No. There are good days and bad days. It could be that…
- He can normally eat around 10 or 11, but today is his good day.
- He can normally eat around 12 hotdogs.
- He can normally eat around 13 or 14, but today is his bad day.
- …15, 16, 17 etc., (unlikely)
Which of the above scenarios is more likely?
Answer:
- If we knew about Paisa’s performance at several other hotdog-eating events, then we wouldn’t have to guess. Say he ate 12, 13, 11, 12, 14, 12, 11, and 13; then we would just take the average.
- It’s a 50-50 chance whether it’s a good day or bad day for Paisa. So that doesn’t help us decide.
- There are many more average Joes than people with an extra large stomach. (Large or small is relative; have to look at the average.)
- There are many more people who can eat 10-11 hotdogs than people who can gulp down 12, 13, or 14 hotdogs.
- Judging from just this one contest, and knowing nothing more about Paisa, we’d better guess that Paisa can normally eat around 10 or 11, and today just happens to be his good day.
This is the origin of the regression effect. If someone scores above average on the first test, we would estimate that the true score is probably a bit lower than the observed score. Likewise, if someone scores below average, the true score is probably a bit higher than the observed score.
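The "usual ability plus good or bad day" reasoning can be checked with a small simulation. Only the average of 8 hotdogs comes from the notes. The spread of usual abilities (`SD_ABILITY`) and the size of day-to-day luck (`SD_LUCK`) are assumptions made up for this sketch, as is the normal shape of both.

```python
import random
from statistics import mean

AVG_ABILITY = 8.0    # from the notes: an average guy eats about 8 hotdogs
SD_ABILITY = 2.0     # assumption: spread of people's usual capacity
SD_LUCK = 1.5        # assumption: good-day / bad-day variation
N_PEOPLE = 50000     # assumption: simulated contestants

rng = random.Random(2)
usual_of_those_who_ate_12 = []
for _ in range(N_PEOPLE):
    usual = rng.gauss(AVG_ABILITY, SD_ABILITY)       # what he can normally eat
    eaten_today = usual + rng.gauss(0, SD_LUCK)      # good day or bad day
    if round(eaten_today) == 12:                     # same observed result as Paisa
        usual_of_those_who_ate_12.append(usual)

typical_usual = mean(usual_of_those_who_ate_12)
print(f"Contestants who ate 12 today: {len(usual_of_those_who_ate_12)}")
print(f"Their average usual capacity: {typical_usual:.1f}")
```

Luck is equally likely to be good or bad, but there are more contestants with a usual capacity of 10 or 11 than of 13 or 14, so most of the people who ate 12 got there with some good luck.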
Ex.: An instructor gives a midterm. She asks the students who score 20 points below average to see her regularly during her office hours for special tutoring. They all score at class average or above on the final. Can this improvement be attributed to the regression effect?
Ex.: In a study of 1,000 families,
Husbands’ average height = 68 inches, SD = 2.7 inches
Wives’ average height = 63 inches, SD = 2.5 inches
r = 0.25
a) Predict the husband’s height when his wife’s height is 68 inches.
b) Predict the wife’s height when her husband’s height is 69.35 inches.
c) Predict the wife’s height when her husband’s height is not known.