Simple Linear Regression
1 Relationship Between Eighth Grade IQ and Ninth Grade Math Score
For a statistics class project, students examined the relationship between x = 8th grade IQ and y = 9th grade math scores for 20 students. The data are displayed below.
Student / Math Score / IQ / Abstract Reas
1 / 33 / 95 / 28
2 / 31 / 100 / 24
3 / 35 / 100 / 29
4 / 38 / 102 / 30
5 / 41 / 103 / 33
6 / 37 / 105 / 32
7 / 37 / 106 / 34
8 / 39 / 106 / 36
9 / 43 / 106 / 38
10 / 40 / 109 / 39
11 / 41 / 110 / 40
12 / 44 / 110 / 43
13 / 40 / 111 / 41
14 / 45 / 112 / 42
15 / 48 / 112 / 46
16 / 45 / 114 / 44
17 / 31 / 114 / 41
18 / 47 / 115 / 47
19 / 43 / 117 / 42
20 / 48 / 118 / 49
Open the dataset IQ found in the Datasets folder in ANGEL. Perform a linear regression with math score as the Response (dependent variable) and IQ as the Predictor (independent variable). Store/save the Residuals and Fitted values. These will be stored in the fourth and fifth columns of the data worksheet. The output (shown here in Minitab) should look as follows:
Regression Analysis: Math Score versus IQ
The regression equation is
Math Score = - 21.0 + 0.567 IQ
Predictor Coef SE Coef T P
Constant -21.04 16.00 -1.32 0.205
IQ 0.5666 0.1475 3.84 0.001
S = 3.98537 R-Sq = 45.0% R-Sq(adj) = 42.0%
Analysis of Variance
Source DF SS MS F P
Regression 1 234.30 234.30 14.75 0.001
Residual Error 18 285.90 15.88
Total 19 520.20
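For readers working outside Minitab, the coefficients and summary statistics in the output above can be checked by hand. The following is a minimal sketch in plain Python, not the procedure Minitab uses internally; the variable names are illustrative, and the data are the 20 rows of the table above.

```python
import math

# Data from the table above (students 1-20).
iq = [95, 100, 100, 102, 103, 105, 106, 106, 106, 109,
      110, 110, 111, 112, 112, 114, 114, 115, 117, 118]
math_score = [33, 31, 35, 38, 41, 37, 37, 39, 43, 40,
              41, 44, 40, 45, 48, 45, 31, 47, 43, 48]

n = len(iq)
x_bar = sum(iq) / n
y_bar = sum(math_score) / n

# Sums of squares and cross-products about the means.
sxx = sum((x - x_bar) ** 2 for x in iq)
syy = sum((y - y_bar) ** 2 for y in math_score)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(iq, math_score))

# Least squares estimates.
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# Analysis of variance pieces: SS Regression, SS Residual Error, R-Sq, S.
ss_regression = slope * sxy
ss_error = syy - ss_regression
r_sq = ss_regression / syy
s = math.sqrt(ss_error / (n - 2))

print(f"Math Score = {intercept:.2f} + {slope:.4f} IQ")
print(f"S = {s:.3f}, R-Sq = {100 * r_sq:.1f}%")
```

The printed values should agree with the Coef, S, and R-Sq entries in the Minitab output up to rounding.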
a. Explain this equation. Discuss the slope as the change in Y per unit change in X, in the context of the variables used in this problem.
b. Create a scatter plot of the measurements by selecting IQ as the predictor (x-variable) and math score as the response (y-variable). Describe the relationship between math score and IQ.
c. One of the students with a high IQ (number 17) appears to be an outlier. With a sample size of only 20, this can affect our normality assumption. The constant variance assumption could also be compromised. We can visually check for constant variance using a Residual Plot and test for normality using a Probability Plot (or Q-Q plot). To get a residual plot, simply create a Scatterplot using the Residuals as the y-variable and the Fitted Values as the x-variable. Now create a probability plot (Q-Q plot if using SPSS) of the residuals. Minitab provides the results of a test of the null hypothesis that the data follow a normal distribution. Based on these two graphs and what you have learned about hypothesis testing, what do you conclude about the assumptions of constant variance and normality?
d. The least squares regression line for predicting math score from IQ is given in the above output. What is the fitted regression line (i.e. regression equation)?
e. What do the values in the FITS and RES columns represent?
f. Based on the output, what is the test of the slope for this regression equation? That is, provide the null and alternative hypotheses, the test statistic, p-value of the test, and state your decision and conclusion.
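Parts c, e, and f all rely on the stored fitted values and residuals and on the slope row of the output. As a rough sketch in plain Python (illustrative variable names; the p-value is left out because the standard library has no t-distribution routine), these quantities could be computed like this:

```python
import math

# Data from the table above (students 1-20).
iq = [95, 100, 100, 102, 103, 105, 106, 106, 106, 109,
      110, 110, 111, 112, 112, 114, 114, 115, 117, 118]
math_score = [33, 31, 35, 38, 41, 37, 37, 39, 43, 40,
              41, 44, 40, 45, 48, 45, 31, 47, 43, 48]

n = len(iq)
x_bar = sum(iq) / n
y_bar = sum(math_score) / n
sxx = sum((x - x_bar) ** 2 for x in iq)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(iq, math_score))
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# Fitted value: the math score the line predicts for each student's IQ.
fits = [intercept + slope * x for x in iq]
# Residual: observed math score minus the fitted value.
residuals = [y - f for y, f in zip(math_score, fits)]

# Standard error of the slope and the t statistic for H0: slope = 0.
mse = sum(e ** 2 for e in residuals) / (n - 2)
se_slope = math.sqrt(mse / sxx)
t_stat = slope / se_slope

# Student 17 is at index 16 because Python lists are zero-based.
print(f"Student 17: fit = {fits[16]:.2f}, residual = {residuals[16]:.2f}")
print(f"SE(slope) = {se_slope:.4f}, T = {t_stat:.2f} on {n - 2} df")
```

Plotting `residuals` against `fits` gives the residual plot described in part c; the large negative residual for student 17 is the point to look for.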
2 Although outliers should never be deleted without a reason, there are several reasons why it may be legitimate to conduct an analysis without them. Delete the data point for row 17 (click on the cell with the IQ of 114, enter *, and then click on any other cell; this “enters” the asterisk in the previous cell) and re-calculate the regression line for the remainder of the data (see above to recall how to get the regression equation). You should obtain the following output:
(Student 17 deleted)
Regression Analysis: Math Score versus IQ
The regression equation is
Math Score = - 32.2 + 0.676 IQ
19 cases used, 1 cases contain missing values
Predictor Coef SE Coef T P
Constant -32.18 10.51 -3.06 0.007
IQ 0.67601 0.09718 6.96 0.000
S = 2.56190 R-Sq = 74.0% R-Sq(adj) = 72.5%
Analysis of Variance
Source DF SS MS F P
Regression 1 317.58 317.58 48.39 0.000
Residual Error 17 111.58 6.56
Total 18 429.16
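The same by-hand calculation can be repeated with row 17 left out, which is one way to check the output above and to evaluate the line at a chosen IQ. A minimal sketch in plain Python follows; the helper name `fit_line` is made up for illustration and is not a Minitab or library function.

```python
def fit_line(xs, ys):
    """Return (intercept, slope, r_squared) for a least squares line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope, (sxy * sxy) / (sxx * syy)


# Data from the table in problem 1 (students 1-20).
iq = [95, 100, 100, 102, 103, 105, 106, 106, 106, 109,
      110, 110, 111, 112, 112, 114, 114, 115, 117, 118]
math_score = [33, 31, 35, 38, 41, 37, 37, 39, 43, 40,
              41, 44, 40, 45, 48, 45, 31, 47, 43, 48]

# Drop student 17 (zero-based index 16), mirroring the * entry in the worksheet.
iq_19 = iq[:16] + iq[17:]
score_19 = math_score[:16] + math_score[17:]

intercept, slope, r_sq = fit_line(iq_19, score_19)

# Evaluate the fitted line at an eighth grade IQ of 114.
predicted = intercept + slope * 114

print(f"Math Score = {intercept:.2f} + {slope:.5f} IQ")
print(f"R-Sq = {100 * r_sq:.1f}%, estimate at IQ 114 = {predicted:.2f}")
```

The estimate is the value of the line at IQ 114, not a score any particular student is guaranteed to obtain.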
a. Use the regression line with Student 17 deleted to estimate the math score for an individual who has an eighth grade IQ of 114. Do you think this estimate could be achieved by anybody?
b. What does the value of R² (R-Sq in the output) represent (just use the latest output)? Explain it using the variables from this data.
c. What is the correlation between Math Score and IQ for each of the two data sets (including and excluding the outlier)?
d. Use software to find the correlation between Math Score and IQ (you can choose whether or not to include the outlier). Does this correlation value agree with the value you found in part c?
e. How does the fit of the regression line to the original data (i.e. with the outlier) compare, visually and statistically, to the fit of the regression line to the data with the outlier removed? Pay particular attention to the differences in R², the slope, and how the line fits each set of data. You may want to repeat the residual plot and probability plot!
f. Facts about correlation. Answer the following questions about correlation (r).
a) What is the strongest the correlation can ever be? _____
b) If there is no relationship, r is equal to ______.
c) The correlation coefficient ranges from ______ to ______.
d) If the points fall in an almost perfect, negative linear pattern, r is close to: _____
e) If the points fall in an almost perfect, positive linear pattern, r is close to: _____
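For parts c and d of problem 2, the correlation can be computed directly from its definition and compared with the square root of R-Sq from each output. The sketch below is plain Python with an illustrative helper name `correlation`; it assumes the data in the table for problem 1.

```python
import math


def correlation(xs, ys):
    """Pearson correlation r computed from sums of squares about the means."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


# Data from the table in problem 1 (students 1-20).
iq = [95, 100, 100, 102, 103, 105, 106, 106, 106, 109,
      110, 110, 111, 112, 112, 114, 114, 115, 117, 118]
math_score = [33, 31, 35, 38, 41, 37, 37, 39, 43, 40,
              41, 44, 40, 45, 48, 45, 31, 47, 43, 48]

r_all = correlation(iq, math_score)
# Student 17 sits at zero-based index 16.
r_without_17 = correlation(iq[:16] + iq[17:], math_score[:16] + math_score[17:])

print(f"With student 17:    r = {r_all:.3f}, r squared = {r_all ** 2:.3f}")
print(f"Without student 17: r = {r_without_17:.3f}, r squared = {r_without_17 ** 2:.3f}")
```

In simple linear regression, r squared equals R-Sq, and r takes the same sign as the fitted slope, so each printed r squared can be compared with the corresponding R-Sq in the outputs.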