Module 8 - GENERAL LINEAR TESTS

1. THE NEED FOR MORE GENERAL TESTS

· Given a linear model (omitting the i subscript) Y = b0 + b1X1 + b2X2 + b3X3 + e, we can estimate the coeffs bk and easily test hypotheses of the form H0: bk = 0 vs. H1: bk ≠ 0 for each coeff by looking at the p-value of the t-ratio tk* = bk/s{bk} on the regr printout; tk* is distributed as Student t w/ df = n-p, where p is the total number of parameters (including the constant term). There are, however, several common situations in which we want to test whether several regression coeffs are simultaneously equal to zero.

· For example, X2 and X3 might represent: (1) two indicators representing a qualitative variable with 3 categories; (2) the first-order (X) and second-order (X^2) terms of a polynomial in a variable X; (3) any two variables that are individually nonsignificant; before dropping both X2 and X3 from the model we want to make sure that they are jointly nonsignificant.

· In all these situations we want to test hypotheses of the form:

H0: b2 = b3 = 0 H1: not both b2 and b3 = 0

2. FULL & REDUCED MODELS

1. Full & Reduced Models

· The general approach to testing simultaneous hypotheses on the coefficients is to contrast a full model with a reduced (aka restricted) model.

· Corresponding to the above hypothesis, we contrast:

Y = b0 + b1X1 + b2X2 + b3X3 + e (full model, F)
Y = b0 + b1X1 + e (reduced model, R)

· We compare the SSE of the full and reduced models, denoted SSE(F) and SSE(R)

· It is always true that SSE(F) <= SSE(R), because a model with more parameters always fits the data as well or better.

· If SSE(F) is not much less than SSE(R), it suggests that the extra variables in F do not help reduce SSE much. If SSE(F) is much less than SSE(R), it suggests that the extra variables in F improve the fit a lot. Thus the test is based on the difference between SSE(R) and SSE(F). The test statistic is

F* = [ (SSE(R) - SSE(F)) / (df(R) - df(F)) ] / [ (SSE(F)/df(F)) ] .

· In other words, F* is the ratio of two quantities: the difference in SSE between the reduced and full models divided by the difference in degrees of freedom between R and F, and the SSE of the full model divided by the degrees of freedom of F. From the ANOVA table we know that the df of SSE is n-p, where p is the total number of parameters including the constant.

· For the example above, df(F) = n-4 and df(R) = n-2, so F* = [ (SSE(R) - SSE(F)) / ((n-2) - (n-4)) ] / [ SSE(F)/(n-4) ], or F* = [ (SSE(R) - SSE(F)) / 2 ] / [ SSE(F)/(n-4) ]

· So the df of the difference between SSE(R) and SSE(F) is equal to the number of parameters set to zero by the hypothesis.

· Small F* suggests that H0 holds. Large F* suggests that H1 holds. F* is distributed as F(df(R) - df(F), df(F)) so the decision rule is

if F* <= F(1-a; df(R) - df(F), df(F)) conclude H0

if F* > F(1-a; df(R) - df(F), df(F)) conclude H1

· The strategy for general linear tests is therefore: (1) fit full model and obtain SSE(F) (2) fit reduced model under H0 and obtain SSE(R) (3) use test statistic and decision rule
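· The following is a minimal sketch of this three-step strategy in Python (the data, variable names, and use of numpy/scipy are illustrative assumptions, not part of the notes):

```python
# Sketch of the full-vs-reduced general linear test on simulated (hypothetical) data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 100
X1, X2, X3 = rng.normal(size=(3, n))
Y = 2.0 + 1.5 * X1 + 0.2 * X2 + rng.normal(size=n)     # true b3 = 0

def sse_and_df(y, *xs):
    """Fit OLS of y on a constant plus the given predictors; return SSE and its df."""
    X = np.column_stack([np.ones_like(y)] + list(xs))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return resid @ resid, len(y) - X.shape[1]           # df = n - p

# (1) fit full model, (2) fit reduced model under H0: b2 = b3 = 0, (3) test statistic
sse_f, df_f = sse_and_df(Y, X1, X2, X3)
sse_r, df_r = sse_and_df(Y, X1)
f_star = ((sse_r - sse_f) / (df_r - df_f)) / (sse_f / df_f)
p_value = stats.f.sf(f_star, df_r - df_f, df_f)
print(f"F* = {f_star:.3f}, p-value = {p_value:.4f}")
```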

2. Alternative Computation of F* in terms of R2

· We know that R2 = SSR/SSTO = 1 - (SSE/SSTO), and that SSTO is the same in the full & reduced models. From this one can derive an equivalent formula for F* in terms of R2(F) and R2(R), the coefficients of determination of the full and reduced models, respectively:

F* = [ (R2(F) - R2(R)) / (df(R) - df(F)) ] / [ (1 - R2(F)) / df(F) ] (NKNW 7.19 p. 271)

· This formula is particularly useful to test hypotheses from published regression results. (Also, this is why one should not present the adjusted R2 alone in published reports, because it makes it more difficult for readers to recover F* if they wish.)
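· As a sketch, the R2 version of the test is easy to package as a small helper (the function name and use of scipy are my own assumptions):

```python
from scipy import stats

def f_star_from_r2(r2_full, r2_reduced, df_full, df_reduced):
    """Compute F* and its p-value from R2 of the full and reduced models (NKNW 7.19).
    df_full and df_reduced are the error degrees of freedom of the two models."""
    f = ((r2_full - r2_reduced) / (df_reduced - df_full)) / ((1 - r2_full) / df_full)
    return f, stats.f.sf(f, df_reduced - df_full, df_full)
```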

3. Extra Sums of Squares

· In comparing a full model including X1, X2, and X3 with a reduced model including X1 only, the extra sum of squares of the full model, compared to the reduced model, is denoted SSR(X2, X3 | X1) and defined as SSR(X2, X3 | X1) = SSE(X1) - SSE(X1, X2, X3)

· The extra sum of squares SSR(X2, X3 | X1) is thus the reduction in SSE achieved by including X2 and X3 in a model that already contains X1.

3. EXAMPLES OF TESTING JOINT HYPOTHESES

1. Body Fat Example (NKNW pp. 260-263) : In the model for body fat (Y) containing X1, X2, and X3 one wants to test the joint significance of X2 & X3.

The hypothesis setup is:
H0: b2 = b3 = 0 H1: not both b2 and b3 = 0
From the full and reduced models one obtains:

                      Full          Reduced
  SSE                 98.404888     143.119703
  R2                  0.801359      0.711097
  df of SSE (n-p)     16            18

Using formula (2.70) with SSE one gets

F* = [ (143.119703 - 98.404888)/(18 - 16) ] / [ 98.404888/16 ] = 3.635

· Equivalently, one could use formula (7.19) with R2 to get

F* = (0.801359 - 0.711097)/(18 - 16) / ((1 - 0.801359)/16) = 3.635

· We choose a = .05. To apply the decision rule we need the critical value F(0.95; 2, 16) = 3.633723. Since F* = 3.635 > 3.633723 (but just barely!), conclude H1: b2 and b3 are not both zero. Alternatively, we can directly find the p-value of F* = 3.635, which is 0.049956. Again, F* is barely significant at the .05 level, so we conclude H1.

· Note that while in the full model the coefficients are each individually nonsignificant, they are jointly significant (although barely).
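· The arithmetic above is easy to verify with any F-distribution routine; here is a quick check (the use of scipy is an assumption, the numbers are those of the notes):

```python
from scipy import stats

sse_r, sse_f = 143.119703, 98.404888            # SSE of reduced and full models
df_r, df_f = 18, 16                             # error df of reduced and full models

f_star = ((sse_r - sse_f) / (df_r - df_f)) / (sse_f / df_f)
f_crit = stats.f.ppf(0.95, df_r - df_f, df_f)   # critical value F(0.95; 2, 16)
p_value = stats.f.sf(f_star, df_r - df_f, df_f)
print(f_star, f_crit, p_value)                  # ~3.635, ~3.634, ~0.0500 -> conclude H1
```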

2. Testing a Polynomial Function

· We want to test the joint significance of the second-degree polynomial of energy consumption per capita. We do this by comparing Model 8 (F) with Model 6 (R). From the table we have R2(F) = .818; df(F) = 56-9 = 47; R2(R) = .807; df(R) = 49. Thus

F* = ((.818 - .807)/2) / ((1 - .818)/47) = 1.4203

· We find that the p-value of F* is 0.251822. Conclude that the polynomial coeffs are jointly nonsignificant.
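· Again, this is straightforward to reproduce (scipy used for the F distribution; the numbers are those quoted above):

```python
from scipy import stats

f_star = ((0.818 - 0.807) / 2) / ((1 - 0.818) / 47)   # ~1.42
p_value = stats.f.sf(f_star, 2, 47)                   # ~0.25 -> do not reject H0
```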

4. TESTS ON REGRESSION COEFFICIENTS USING FULL VS. REDUCED MODEL

1. Test Whether a Single bk = 0 : [H0: bk = 0 H1: bk ≠ 0]

This is the usual test reported as the p-value of tk* = bk/s{bk} on the regr printout. One can show that the corresponding F* from the full vs. reduced model comparison is equal to the square of tk*, i.e. F*=(tk*)2. Thus the t-test & F-test for a single coeff are equivalent.
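· A small simulation (hypothetical data; numpy assumed) illustrates this equivalence between the t-test and the corresponding full-vs-reduced F-test:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 60
X1, X2 = rng.normal(size=(2, n))
Y = 1.0 + 0.5 * X1 + 0.3 * X2 + rng.normal(size=n)

# Full model: constant, X1, X2; t-ratio for b2
Xf = np.column_stack([np.ones(n), X1, X2])
bf, *_ = np.linalg.lstsq(Xf, Y, rcond=None)
ef = Y - Xf @ bf
mse = ef @ ef / (n - 3)
se_b2 = np.sqrt(mse * np.linalg.inv(Xf.T @ Xf)[2, 2])   # s{b2}
t_star = bf[2] / se_b2

# Reduced model dropping X2 (H0: b2 = 0); F* from the SSE comparison
Xr = np.column_stack([np.ones(n), X1])
br, *_ = np.linalg.lstsq(Xr, Y, rcond=None)
er = Y - Xr @ br
f_star = (er @ er - ef @ ef) / (ef @ ef / (n - 3))
print(t_star**2, f_star)                                 # the two values agree
```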

2. Test Whether All bk = 0 : [H0: b1 = b2 = ... = bp-1 = 0 H1: not all bk (k = 1, ..., p-1) = 0]

This is the usual test reported as the p-value of F* = MSR/MSE on the regression printout. It follows as a special case of the general formula in which the full model has SSE(X1, X2, ..., Xp-1) with df=n-p and the reduced model (containing only the constant term) has SSE = SSTO with df=n-1.

3. Test Whether Some bk = 0 : [H0: bq = bq+1 = ... = bp-1 = 0 H1: not all of the bk in H0 = 0 ]

(The notation assumes that the variables are arranged so that the tested variables have subscripts q to p-1.) This is the situation discussed above. Various other tests can be carried out as a comparison of full & reduced model, using "tricks".

4. Test Equality of 2 Coefficients : [H0: b1 = b2 H1: b1 ≠ b2]

The full model is (omitting the i subscript) : Y = b0 + b1X1 + b2X2 + b3X3 + e

The trick is to define the reduced model as : Y = b0 + bc (X1 + X2) + b3X3 + e

where bc is the "common" regression coefficient of X1 and X2 under H0.

One estimates the reduced model as the regression of Y on a new variable calculated as the sum of X1 and X2 (and on X3). Then one calculates F* using the formula above. The full model has df=n-4 and the reduced model has df=n-3, so F* has df=(1, n-4).
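· A sketch of this trick with hypothetical data (numpy/scipy assumed, names illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n = 80
X1, X2, X3 = rng.normal(size=(3, n))
Y = 1.0 + 0.8 * X1 + 0.8 * X2 + 0.5 * X3 + rng.normal(size=n)

def sse(y, X):
    """SSE from an OLS fit of y on X (X already contains the constant column)."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b
    return e @ e

# Full model: Y on X1, X2, X3; reduced model under H0: b1 = b2 uses the sum X1 + X2
sse_f = sse(Y, np.column_stack([np.ones(n), X1, X2, X3]))    # df = n - 4
sse_r = sse(Y, np.column_stack([np.ones(n), X1 + X2, X3]))   # df = n - 3
f_star = (sse_r - sse_f) / (sse_f / (n - 4))
p_value = stats.f.sf(f_star, 1, n - 4)
print(f_star, p_value)
```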

5. Test Whether Some bk Have Specific Values Other than 0

H0: b1 = 3, b3 = 5 H1: not both equalities in H0 hold

With the full model as above, one derives the reduced model by replacing b1 and b3 by their assumed values under H0 and removing their effects from the dependent variable, as

W = Y - 3X1 - 5X3, so that under H0 W = b0 + b2X2 + e, where W is the new dependent variable. The reduced model is estimated as the regr of W on X2. Then one calculates F*, which has df=(2, n-4).
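· A sketch of this offset trick with hypothetical data (numpy/scipy assumed):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 80
X1, X2, X3 = rng.normal(size=(3, n))
Y = 1.0 + 3.0 * X1 + 0.4 * X2 + 5.0 * X3 + rng.normal(size=n)   # here b1 = 3, b3 = 5 hold

def sse(y, X):
    """SSE from an OLS fit of y on X (X already contains the constant column)."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b
    return e @ e

sse_f = sse(Y, np.column_stack([np.ones(n), X1, X2, X3]))   # full model, df = n - 4
W = Y - 3 * X1 - 5 * X3                                     # impose b1 = 3 and b3 = 5
sse_r = sse(W, np.column_stack([np.ones(n), X2]))           # reduced model, df = n - 2
f_star = ((sse_r - sse_f) / 2) / (sse_f / (n - 4))
p_value = stats.f.sf(f_star, 2, n - 4)
print(f_star, p_value)
```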

6. MATRIX FORMULATION OF GENERAL LINEAR TEST (OPTIONAL)

· The null hypothesis H0 for any linear hypothesis can be represented by specifying a matrix A and a vector d. Then the null hypothesis H0 is represented as H0: Ab=d where A is sxp, b is px1, & d is sx1; s is the number of constraints on the coeffs.

· Various specifications of A and d are shown in the following examples, based on a full model with a constant term and variables X1, X2, and X3.

· EX: H0: b1 = 0 A = [0 1 0 0] d = [0]
· EX: H0: b1 = b2 A = [0 1 -1 0] d = [0]
· EX: H0: b1 = b2 = 0 A = [0 1 0 0 ; 0 0 1 0] d' = [0 0], so that Ab = d is written out as

[ 0 1 0 0 ] [b0]   [0]
[ 0 0 1 0 ] [b1] = [0]
            [b2]
            [b3]
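· Although these notes stop at specifying A and d, the same F* can be computed directly from the fitted full model using the standard matrix form F* = (Ab - d)' [A (X'X)^-1 A']^-1 (Ab - d) / (s MSE), which is equivalent to the full-vs-reduced comparison. A hedged sketch (numpy/scipy assumed; function name illustrative):

```python
import numpy as np
from scipy import stats

def general_linear_test(X, y, A, d):
    """F* and p-value for H0: A b = d, using the standard matrix form of the test.
    X must include the constant column; A is s x p, d has length s."""
    n, p = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    b = xtx_inv @ X.T @ y                      # OLS coefficients
    e = y - X @ b
    mse = e @ e / (n - p)
    s = A.shape[0]                             # number of constraints
    diff = A @ b - d
    f = diff @ np.linalg.solve(A @ xtx_inv @ A.T, diff) / (s * mse)
    return f, stats.f.sf(f, s, n - p)

# Example setup for H0: b1 = b2 = 0 in a model with a constant and X1, X2, X3:
# A = np.array([[0, 1, 0, 0], [0, 0, 1, 0]]); d = np.zeros(2)
```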

Module 9 - FUNCTIONAL FORM & PARTIAL REGRESSION PLOTS

1. USES OF PARTIAL REGRESSION PLOTS

· Various diagnostic tools and remedial measures are available to address possible violations of the classical assumptions of multiple regr analysis. The most common tool is the residual plot, in which the residuals ei are plotted against the fitted values (predictors) ^Yi.

· From the appearance of the plot we may be able to diagnose a variety of problems with the model (such as heteroskedasticity, nonlinearity, outlying observations, etc.) in the same way as for simple regression models ( see Module 3).

· Partial regression plots are another diagnostic tool that permits evaluation of the role of individual variables within the multiple regression model. They are used to assess visually

1.  whether or not a variable should be included in the model

2.  the presence of outliers & influential cases that affect the coeffs of individual X vars in the model

3.  the possibility of a nonlinear relationship between Y & individual X vars in the model

· A partial regression plot is a way to look at the marginal role of a variable Xk in the model, given that the other independent variables are already in the model.

2. CONSTRUCTION OF A PARTIAL REGRESSION PLOT

· Assume the multiple regr model (omitting the i subscript) Y = b0 + b1X1 + b2X2 + b3X3 + e

· There is a partial regression plot for each of the X variables. To draw the partial regression plot of Y on X1 "the hard way", for example, one proceeds as follows:

1.  Regress Y on X2 and X3 and a constant, and calculate the predictors and residuals

^Yi(X2, X3) = b0 + b2Xi2 + b3Xi3
ei(Y|X2, X3) = Yi - ^Yi(X2, X3)

2.  Regress X1 on X2 and X3 and a constant, and calculate the predictors and residuals

^Xi1(X2, X3) = b0+ + b2+Xi2 + b3+Xi3
ei(X1|X2, X3) = Xi1 - ^Xi1(X2, X3)

3.  The partial regression plot for X1 is the plot of ei(Y|X2, X3) against ei(X1|X2, X3)

· In practice, statistical programs such as SYSTAT have options to save the partial residuals ei(Y|X2, X3) and ei(X1|X2, X3) when estimating the regression model, so one does not need to run these auxiliary regressions separately.
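· For illustration, here is a sketch of steps 1-3 "the hard way" on hypothetical data (numpy/matplotlib assumed; SYSTAT is what the notes actually use):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(4)
n = 100
X1, X2, X3 = rng.normal(size=(3, n))
Y = 2.0 + 1.0 * X1 + 0.5 * X2 - 0.5 * X3 + rng.normal(size=n)

def residuals(y, X):
    """Residuals from an OLS fit of y on X (X already includes the constant)."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ b

Z = np.column_stack([np.ones(n), X2, X3])   # constant, X2, X3
e_y = residuals(Y, Z)                       # step 1: e(Y | X2, X3)
e_x1 = residuals(X1, Z)                     # step 2: e(X1 | X2, X3)

# Step 3: the partial regression plot for X1
plt.scatter(e_x1, e_y)
plt.xlabel("e(X1 | X2, X3)")
plt.ylabel("e(Y | X2, X3)")
plt.title("Partial regression plot for X1")
plt.show()
```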

3. INTERPRETATION OF A PARTIAL REGRESSION PLOT & AN EXAMPLE

· It can be shown that the slope of the partial regression of ei(Y|X2, X3) on ei(X1|X2, X3) is equal to the estimated regression coefficient b1 of X1 in the multiple regr model Y = b0 + b1X1 + b2X2 + b3X3 + e. Thus the partial regr plot allows us to isolate the role of a specific IV in the multiple regr model. In practice one scrutinizes the plot for characteristic patterns.

· The patterns are interpreted as follows:

1.  pattern a (line strip around 0), which shows no apparent relationship, suggests that X1 does not add to the explanatory power of the model, when X2 & X3 are already included

2.  pattern b (line strip with slope b1) suggests that a linear relationship between Y and X1 exists, when X2 and X3 are already present in the model. The slope of the partial regression line is the same as the coefficient of X1 in the multiple regression model

3.  pattern c (curved line strip) suggests that the partial relationship of Y with X1 is curvilinear; one may try to model this curvilinearity with a transformation of X1 or with a polynomial function of X1

4.  the plot may also reveal observations that are outliers with respect to the partial relationship of Y with X1

· As an example we look at the Graduation Rates file used in a previous assignment. The dependent variable is GRAD, the state rate of graduation from high school. We estimate the model GRAD = CONSTANT + INC + PBLA + PHIS + EDEXP + URB.

Exhibit: Partial regression plots for INC, PBLA, and PHIS

· As an extra refinement these partial regression plots use the INFLUENCE option of SYSTAT. In an influence plot the size of the symbol is proportional to the amount that the Pearson correlation between Y and X would change if that point were deleted. Large symbols therefore correspond to observations that are influential. The plots allow us to identify cases that are problematic with respect to specific IVs in the model. For example, two observations stand out: DC in the plot for PBLA, and NM in the plot for PHIS. The style of the symbol indicates the direction of the influence:

1.  an open symbol indicates an observation that decreases (i.e., whose removal would increase) the magnitude (absolute value) of the correlation; for example, removing NM (case 32) in the partial regression plot for PHIS would increase the magnitude of the correlation between YPARTIAL(3) and XPARTIAL(3) from -.461 to -.558.

2.  a filled symbol indicates an observation that increases (i.e., whose removal would decrease) the magnitude (absolute value) of the correlation; for example, removing DC (case 9) in the partial regression plot for PBLA would decrease the magnitude of the correlation between YPARTIAL(2) and XPARTIAL(2) from -.703 to -.641.

· SYSTAT calculates the influence of an observation very simply as the absolute value of the difference in the correlation coefficient of Y and X with and without that observation.
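· As a sketch, this influence measure is simple to reproduce for any pair of variables (numpy assumed; function name illustrative):

```python
import numpy as np

def influence_on_correlation(x, y):
    """For each observation, |change| in the Pearson correlation of x and y
    when that observation is deleted (the influence measure described above)."""
    r_all = np.corrcoef(x, y)[0, 1]
    idx = np.arange(len(x))
    out = np.empty(len(x))
    for i in idx:
        mask = idx != i
        out[i] = abs(r_all - np.corrcoef(x[mask], y[mask])[0, 1])
    return out
```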