Solutions to Sample Test 2
Y / X / XY / X^2
2 / 3 / 6 / 9
4 / 3 / 12 / 9
6 / 6 / 36 / 36
8 / 8 / 64 / 64
Sums: 20 / 20 / 118 / 118
1. Note the mean of X, Xbar, is 20/4 = 5 and the mean of Y, Ybar, is 20/4 = 5. The estimated slope is b and

b = (Sum XY - n(Xbar)(Ybar)) / (Sum X^2 - n(Xbar)^2) = (118 - 4(5)(5)) / (118 - 4(5)^2) = 18/18 = 1
2. To determine the full regression equation we also need to calculate the intercept,

a = Ybar - b(Xbar) = 5 - 1(5) = 0
Thus, the least squares regression equation is Yhat = a + bX or, in this case, Yhat = 0 + 1X = X.
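As a quick check (not part of the original solution), the slope and intercept above can be reproduced with a short Python sketch using only the data from the table:

```python
# Minimal sketch: least squares slope and intercept from the table data.
y = [2, 4, 6, 8]
x = [3, 3, 6, 8]
n = len(y)

x_bar = sum(x) / n                               # 5.0
y_bar = sum(y) / n                               # 5.0
sum_xy = sum(xi * yi for xi, yi in zip(x, y))    # 118
sum_x2 = sum(xi ** 2 for xi in x)                # 118

b = (sum_xy - n * x_bar * y_bar) / (sum_x2 - n * x_bar ** 2)  # slope = 1.0
a = y_bar - b * x_bar                                         # intercept = 0.0
print(f"Yhat = {a} + {b}X")
```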
3. To compute the standard error of the estimate using the definitional formula,

Se = sqrt(Sum(Y - Yhat)^2 / (n - m))

requires that we determine Yhat via the regression equation for each observation. I do this in the table below.
Y / X / Yhat = 1X / (Y - Yhat) / (Y - Yhat)^2
2 / 3 / 3 / -1 / 1
4 / 3 / 3 / +1 / 1
6 / 6 / 6 / 0 / 0
8 / 8 / 8 / 0 / 0
Sum: SSE = Sum(Y - Yhat)^2 = 2, so Se = sqrt(2 / (4 - 2)) = sqrt(1) = 1
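The same SSE and standard error of the estimate can be verified with a short Python sketch (illustrative only, using the fitted equation Yhat = 0 + 1X):

```python
# Fitted values, residuals, SSE, and the standard error of the estimate.
y = [2, 4, 6, 8]
x = [3, 3, 6, 8]
n, m = len(y), 2                                       # m = number of coefficients (a and b)

a, b = 0.0, 1.0                                        # from Yhat = 0 + 1X
y_hat = [a + b * xi for xi in x]                       # [3.0, 3.0, 6.0, 8.0]
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # 2.0
se = (sse / (n - m)) ** 0.5                            # sqrt(2 / 2) = 1.0
print(sse, se)
```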
4. The regression sum of squares, SSR, is given by the following formula and is calculated in the table below (note Yhat is from the answer to question 3 and Ybar is the mean of Y, or 5).

SSR = Sum(Yhat - Ybar)^2

Y / X / Yhat / (Yhat - Ybar) / (Yhat - Ybar)^2
2 / 3 / 3 / (3-5) = -2 / 4
4 / 3 / 3 / (3-5) = -2 / 4
6 / 6 / 6 / (6-5) = +1 / 1
8 / 8 / 8 / (8-5) = +3 / 9
Sum: SSR = 4 + 4 + 1 + 9 = 18
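A quick Python check of SSR (illustrative only):

```python
# Regression sum of squares: squared deviations of Yhat around Ybar.
y_hat = [3, 3, 6, 8]
y_bar = 5
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)   # 4 + 4 + 1 + 9 = 18
print(ssr)
```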
5. The coefficient of determination is R-squared and is given by the following formula

R^2 = SSR / SST
Note from the formula above we need to compute SST, the total sum of squares. We could compute SST using its own formula, but it is more efficient to recognize that SST is the sum of SSR and SSE, or SST = SSR + SSE. Since we already have SSE and SSR from prior problems, we find that
SST = 18 + 2 = 20 and the coefficient of determination is

R^2 = SSR / SST = 18/20 = 0.90
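In Python terms (illustrative only), using the SSR and SSE already computed:

```python
# Total sum of squares and the coefficient of determination.
ssr, sse = 18, 2
sst = ssr + sse            # 20
r_squared = ssr / sst      # 0.90
print(sst, r_squared)
```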
6. The test statistic for the statistical significance of X is

t = (b - B) / Sb
We know for this problem that b = 1 and B by hypothesis is 0 (zero). However, we need to calculate the standard error of b, Sb. The formula and computation for this statistic is

Sb = Se / sqrt(Sum(X - Xbar)^2) = 1 / sqrt(18) = 0.2357
and the test statistic is

t = (1 - 0) / 0.2357 = 4.24
7. To determine whether the number of employees, X, is statistically significant we need to compare the test statistic just computed with a critical t from the t table using an alpha of 5 percent (see instructions at top of test). If the test statistic in absolute size is greater than the critical t from the table, we reject the null hypothesis that the true slope, B, is zero and conclude that X is statistically significant. However, if the test statistic in absolute size is smaller than the critical t from the table, we do not reject the null hypothesis that the true slope, B, is zero, and the conclusion is that X is NOT statistically significant.
In this case the critical t is 4.303 and the test statistic is 4.24. Since the test statistic is smaller than the critical t, the decision is do not reject. Thus, X is NOT statistically significant. Note the degrees of freedom for this test are n - m = 4 - 2 = 2.
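The whole t test can be checked with a short Python sketch; it assumes scipy is available for the critical value (the hand calculation above does not require it):

```python
# t test for the slope: Sb, the test statistic, and the two-tailed critical t.
from scipy import stats

se = 1.0                            # standard error of the estimate (question 3)
sxx = 118 - 4 * 5 ** 2              # Sum(X - Xbar)^2 = 18
sb = se / sxx ** 0.5                # 0.2357
t_stat = (1 - 0) / sb               # 4.24

df = 4 - 2                          # n - m
t_crit = stats.t.ppf(0.975, df)     # alpha = .05, two-tailed -> 4.303
print(t_stat, t_crit, abs(t_stat) > t_crit)   # not significant
```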
8. The F statistic is defined and computed below

F = MSR / MSE = [SSR / (m - 1)] / [SSE / (n - m)] = (18/1) / (2/2) = 18
9. To determine whether the regression model is statistically significant we need to compare the test statistic just computed in question 8 with a critical F from the F table using an alpha of 5 percent (see instructions at top of test). If the test statistic is greater than the critical F from the table, we reject the null hypothesis that the true slope, B, is zero and conclude that the model is statistically significant. However, if the test statistic is smaller than the critical F from the table, we do not reject the null hypothesis that the true slope, B, is zero, and the conclusion is that the model is NOT statistically significant.
In this case the critical F is 18.51 and the test statistic is 18. Since the test statistic is smaller than the critical F, the decision is do not reject. Thus, the model is NOT statistically significant. Note the degrees of freedom for this test are 1 in the numerator (m - 1 = 1) and 2 in the denominator (n - m = 2).
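A matching Python sketch for the F test (again assuming scipy for the table lookup):

```python
# F statistic and the critical F for the overall model.
from scipy import stats

ssr, sse = 18, 2
m, n = 2, 4
f_stat = (ssr / (m - 1)) / (sse / (n - m))    # 18 / 1 = 18
f_crit = stats.f.ppf(0.95, m - 1, n - m)      # F(1, 2) at alpha = .05 -> 18.51
print(f_stat, f_crit, f_stat > f_crit)        # not significant
```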
10. A large variance inflation factor is indicative of a severe multicollinearity problem. Thus, if the VIF were 12.5, that would indicate severe multicollinearity. Recall that the VIF is defined as

VIF = 1 / (1 - Rj^2)

where Rj^2 is the R-squared from regressing Xj on the other independent variables.
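For example (the auxiliary R-squared of 0.92 below is hypothetical, chosen only so the VIF works out to 12.5):

```python
# VIF for predictor j, given the R-squared from regressing Xj on the other Xs.
r_j2 = 0.92                 # hypothetical auxiliary R-squared
vif = 1 / (1 - r_j2)        # 12.5 -> severe multicollinearity
print(vif)
```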
11. If ESS or SSE is zero, the coefficient of determination, R-squared, would be 1.0. This is verified by the definition of R-squared via

R^2 = 1 - SSE/SST = 1 - 0/SST = 1.0
12. To predict 1999 (current) salary, merely substitute the given values for each of the independent variables, the Xs, into the estimated regression equation. Thus, the calculation is
1999$ = 54991 + 920(10) – 3591(1) – 1.53(20000) = 30,000
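The same arithmetic in a short Python sketch (the variable names for the Xs are illustrative, not from the original output):

```python
# Predicted 1999 salary from the estimated equation.
experience, female, begin_salary = 10, 1, 20000   # given values of the Xs
salary_1999 = 54991 + 920 * experience - 3591 * female - 1.53 * begin_salary
print(salary_1999)   # 30000.0
```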
13. Based on the regression results and holding experience and beginning salary constant, females make $3591 less than their male counterparts. This conclusion is based on the fact that the partial slope with respect to the dummy variable, D, is –3591. Thus, as D increases by 1 (switching from male (0) to female (1)), 1999 salary drops by 3591.
14. Based on the t-test, all three predictors (Xs) are statistically significant at the 5 percent level. This can be demonstrated in either of two ways. First, the P value for each variable is essentially 0, which is below the level of significance of .05. This means that each t statistic is well into one of the two tails of the distribution, which leads to a reject decision for the null hypothesis. Since the null for each variable states that its partial slope is zero, or that the variable does not affect 1999 salary, rejection of the null means that the variable is statistically significant.
The other way to complete these tests is to find the critical t from the t table and compare it to the test statistic for each X variable. The critical t in this case is approximately 1.98. Note n = 171 (the degrees of freedom for SST are 170, and df for SST is always n - 1) and m is 4 since there are 4 coefficients in our model.
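If software is handy, the exact critical t can be pulled from the t distribution directly (a sketch assuming scipy):

```python
# Critical t for the individual coefficient tests (alpha = .05, two-tailed).
from scipy import stats

n, m = 171, 4
t_crit = stats.t.ppf(0.975, n - m)   # df = 167 -> about 1.97; the table's 1.98 uses df = 120
print(round(t_crit, 2))
```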
15. The null hypothesis for the F-test of model significance can be written as
H0: All slopes = 0
Or
H0: B1 = B2 =B3 = 0
16. The critical F from the 5 percent F table is approximately 2.68. Note numerator degrees of freedom are m - 1 = 4 - 1 = 3 and denominator degrees of freedom are n - m = 171 - 4 = 167. The F value of 2.68 is approximate since the table does not cover 167 degrees of freedom in the denominator; I used 3 and 120 for the dfs.
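A sketch of the same lookup in Python (assuming scipy), which can use the full 167 denominator degrees of freedom:

```python
# Critical F for the overall model test (alpha = .05).
from scipy import stats

f_crit = stats.f.ppf(0.95, 3, 167)   # roughly 2.66; the table value of 2.68 uses df = 3 and 120
print(round(f_crit, 2))
```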
17. The model is statistically significant at the 5 percent level. This conclusion can be reached in either of two ways. First, the test statistic, 78.12, is greater than the critical F, 2.68. Thus, reject the null hypothesis and conclude that the model is statistically significant. Second, note the P value for the F statistic is 0.000, which is smaller than the level of significance, .05. This means that our test statistic is in the rejection region of .05.
18. Note if SSE is zero, R2 will be 1.0, since R2 = 1 - SSE/SST = 1 - 0 = 1.
19. Note the model has 4 coefficients, an intercept and 3 partial slopes. Also, the degrees of freedom for the error sum of squares (SSE) are n - m. In this case n - m = 50 - 4 = 46. The correct answer is thus 46.
20. Note this is the same model as above; however, we want to determine the degrees of freedom for the regression sum of squares (SSR). Degrees of freedom for SSR are m - 1, which is 4 - 1 = 3.
21. The F test is always an upper tail test in regression. An upper tail test is one in which the rejection region is in the upper tail of the distribution.
22. The Y variable in a regression model can be referred to by all of the listed responses except "independent variable." An independent variable is an X variable.
23. Least squares is a method to fit an equation to data.
24. The existence of high correlation or a strong relationship between variables in a sample is never a sufficient basis to conclude that one CAUSES changes in the other. Always remember: correlation does not imply causation.
25. Multicollinearity is a statistical problem in regression in which there is high correlation between and among the independent variables (the Xs).