MODEL CHECKING:

Lack of fit and pure error:

$Y = \beta_0 + \beta_1 X + \epsilon$: a model we tentatively use.

$Y = f(X) + \epsilon$: the true model, where $f(X)$ might not be $\beta_0 + \beta_1 X$. Intuitively, if the tentatively used model is not the true model ($f(X) \neq \beta_0 + \beta_1 X$), then the fitted value $\hat{Y}_i = b_0 + b_1 X_i$ based on the simple linear regression model cannot be an accurate predicted value of $Y_i$. Thus, $E(Y_i - \hat{Y}_i) \neq 0$. Then the residual mean square $RSS/(n-2)$ is no longer a sensible estimate of $\sigma^2$. To resolve this problem, we could try to obtain repeated observations at the same covariate value. Let

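$Y_{11}, Y_{12}, \ldots, Y_{1n_1}$: $n_1$ repeated observations at $X_1$

$Y_{21}, Y_{22}, \ldots, Y_{2n_2}$: $n_2$ repeated observations at $X_2$

$\vdots$

$Y_{m1}, Y_{m2}, \ldots, Y_{mn_m}$: $n_m$ repeated observations at $X_m$

Note that $n = \sum_{j=1}^{m} n_j$. Suppose the true model is $Y = f(X) + \epsilon$; then $Y_{ju} = f(X_j) + \epsilon_{ju}$, $u = 1, \ldots, n_j$, where the errors $\epsilon_{ju}$ are independent with mean 0 and variance $\sigma^2$.

Then, we can use

$s_e^2 = \frac{\sum_{j=1}^{m} \sum_{u=1}^{n_j} (Y_{ju} - \bar{Y}_j)^2}{n - m}$

to estimate $\sigma^2$, where $\bar{Y}_j = \frac{1}{n_j} \sum_{u=1}^{n_j} Y_{ju}$.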

The justification is as follows:

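For $j = 1$,

$E\left[ \frac{\sum_{u=1}^{n_1} (Y_{1u} - \bar{Y}_1)^2}{n_1 - 1} \right] = \sigma^2$,

since $Y_{11}, \ldots, Y_{1n_1}$ are independent observations with common mean $f(X_1)$ and common variance $\sigma^2$, whatever the form of $f$.

Similarly,

$E(s_j^2) = \sigma^2$, $j = 2, \ldots, m$,

where

$s_j^2 = \frac{\sum_{u=1}^{n_j} (Y_{ju} - \bar{Y}_j)^2}{n_j - 1}$

and

$\bar{Y}_j = \frac{1}{n_j} \sum_{u=1}^{n_j} Y_{ju}$.

Thus,

$s_e^2 = \frac{\sum_{j=1}^{m} (n_j - 1) s_j^2}{\sum_{j=1}^{m} (n_j - 1)} = \frac{\sum_{j=1}^{m} \sum_{u=1}^{n_j} (Y_{ju} - \bar{Y}_j)^2}{n - m}$,

can be regarded as a pooled estimate of $\sigma^2$. $SS_{pe} = \sum_{j=1}^{m} \sum_{u=1}^{n_j} (Y_{ju} - \bar{Y}_j)^2$ is called the pure error sum of squares!! A covariate value with a single observation ($n_j = 1$) contributes nothing to $SS_{pe}$ and nothing to its degrees of freedom $n - m$.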
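The pooled estimate can be computed directly from the replicate groups. The following is a minimal Python sketch, using the data of the worked example in these notes; the function name `pure_error` and the dictionary layout are illustrative choices, not part of the notes.

```python
# Replicate groups: covariate value -> repeated responses observed at that value.
groups = {
    90: [81, 83],
    79: [75],
    66: [68, 60, 62],
    51: [60, 64],
    35: [51, 53],
}


def pure_error(groups):
    """Return (pure error sum of squares, its degrees of freedom n - m)."""
    ss = 0.0
    df = 0
    for responses in groups.values():
        group_mean = sum(responses) / len(responses)
        # Within-group squared deviations from the group mean.
        ss += sum((y - group_mean) ** 2 for y in responses)
        # A group with a single observation adds 0 to both ss and df.
        df += len(responses) - 1
    return ss, df


ss_pe, df_pe = pure_error(groups)
s2_e = ss_pe / df_pe  # pooled estimate of sigma^2
print(round(ss_pe, 2), df_pe, round(s2_e, 2))
```

Because only deviations from the group means enter the sum, the result does not depend on which regression model is fitted.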

Fundamental Equation:

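$\sum_{j=1}^{m} \sum_{u=1}^{n_j} (Y_{ju} - \hat{Y}_j)^2 = \sum_{j=1}^{m} \sum_{u=1}^{n_j} (Y_{ju} - \bar{Y}_j)^2 + \sum_{j=1}^{m} n_j (\bar{Y}_j - \hat{Y}_j)^2$,

where $\hat{Y}_j = b_0 + b_1 X_j$ is the fitted value of $Y_{ju}$ at $X_j$. The cross-product term vanishes because $\sum_{u=1}^{n_j} (Y_{ju} - \bar{Y}_j) = 0$ for every $j$.

$SS_{lof} = \sum_{j=1}^{m} n_j (\bar{Y}_j - \hat{Y}_j)^2$ is called the lack of fit sum of squares and can be used to examine if lack of fit is significant!!

The justification for the use of $SS_{lof}$ is as follows:

$E(\bar{Y}_j - \hat{Y}_j) = f(X_j) - E(\hat{Y}_j) \neq 0$ for at least some $j$, as the true model is not the simple linear regression model we tentatively use. Thus, $SS_{lof}$ would be large!!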

The fundamental equation can also be written as

Residual sum of squares = pure error sum of squares + lack of fit sum of squares
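The decomposition can be checked numerically. The sketch below fits the straight line by least squares to the data of the worked example in these notes and computes the three sums of squares separately, each from its own definition; the variable names are illustrative assumptions.

```python
# Replicate groups: covariate value -> repeated responses observed at that value.
groups = {
    90: [81, 83],
    79: [75],
    66: [68, 60, 62],
    51: [60, 64],
    35: [51, 53],
}

# Flatten to one (x, y) pair per observation for the least squares fit.
xs, ys = [], []
for x, responses in groups.items():
    for y in responses:
        xs.append(x)
        ys.append(y)

n = len(ys)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
s_xx = sum((x - x_bar) ** 2 for x in xs)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
b1 = s_xy / s_xx           # least squares slope
b0 = y_bar - b1 * x_bar    # least squares intercept

rss = ss_pe = ss_lof = 0.0
for x, responses in groups.items():
    fitted = b0 + b1 * x
    group_mean = sum(responses) / len(responses)
    rss += sum((y - fitted) ** 2 for y in responses)         # residual SS
    ss_pe += sum((y - group_mean) ** 2 for y in responses)   # pure error SS
    ss_lof += len(responses) * (group_mean - fitted) ** 2    # lack of fit SS

# Residual SS = pure error SS + lack of fit SS, up to rounding error.
assert abs(rss - (ss_pe + ss_lof)) < 1e-9
```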

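Let $MS_L = SS_{lof}/(m-2)$ and $s_e^2 = SS_{pe}/(n-m)$. The ANOVA table is

Source / df / SS / MS
Due to regression ($b_1 \mid b_0$) / 1 / $SS(b_1 \mid b_0) = \sum_{j=1}^{m} n_j (\hat{Y}_j - \bar{Y})^2$ / $SS(b_1 \mid b_0)$
Lack of fit / m-2 / $SS_{lof}$ / $MS_L$
Pure error / n-m / $SS_{pe}$ / $s_e^2$

Total (corrected) / n-1 / $\sum_{j=1}^{m} \sum_{u=1}^{n_j} (Y_{ju} - \bar{Y})^2$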

To test $H_0$: the simple linear regression is adequate, the F statistic can be used,

$F = \frac{SS_{lof}/(m-2)}{SS_{pe}/(n-m)}$.

Intuitively, large F implies the difference between the true model and the tentatively used model is large relative to the variation of random error reflected by the pure error mean square $SS_{pe}/(n-m)$. That is, the tentatively used model (the simple linear regression) is not sensible. If $H_0$ is true and the errors are normally distributed, $F \sim F_{m-2,n-m}$, where $F_{m-2,n-m}$ is the F distribution with degrees of freedom m-2 and n-m.

In general, we use the following procedure to fit the regression model when the data contain repeated observations.

  1. Fit the model and write down the usual analysis of variance table. Do not perform an F-test for regression ($H_0$: $\beta_1 = 0$) at this stage.
  2. Perform the F-test for lack of fit. There are two possibilities.

(a) If the lack of fit is significant, stop the analysis of the fitted model and seek ways to improve the model by examining the residuals.

(b) If the lack of fit test is not significant, carry out an F-test for regression, obtain confidence intervals, and so on. The residuals should still be plotted and examined for peculiarities.
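Step 2 of the procedure can be sketched as a small decision function. This is only an illustration: the function name `lack_of_fit_step`, its arguments, and the input numbers in the usage lines are hypothetical, and the critical value is assumed to be looked up by the caller (for example in F tables) for the chosen significance level.

```python
def lack_of_fit_step(ss_lof, ss_pe, n, m, f_critical):
    """Decide how to proceed after the lack-of-fit F-test.

    ss_lof, ss_pe : lack of fit and pure error sums of squares
    n, m          : number of observations and of distinct covariate values
    f_critical    : upper-tail critical value of F(m - 2, n - m),
                    supplied by the caller
    """
    f_stat = (ss_lof / (m - 2)) / (ss_pe / (n - m))
    if f_stat > f_critical:
        # Case (a): the tentative model is inadequate.
        return f_stat, "significant lack of fit: stop and examine the residuals"
    # Case (b): continue with the usual regression inference.
    return f_stat, "no significant lack of fit: test for regression, still plot residuals"


# Hypothetical inputs: 12 observations at 4 distinct covariate values.
# 4.46 is the tabulated upper 5% point of F(2, 8).
f_stat, next_step = lack_of_fit_step(30.0, 20.0, n=12, m=4, f_critical=4.46)
print(f_stat, next_step)
```

Passing the critical value in, rather than computing it, keeps the sketch free of any statistics library.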

Example:

X / Y
90 / 81,83
79 / 75
66 / 68,60,62
51 / 60,64
35 / 51,53

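Here $n = 10$, $m = 5$, $\bar{X} = 62.9$, $\bar{Y} = 65.7$, $S_{XX} = \sum (X_i - \bar{X})^2 = 3596.9$ and $S_{XY} = \sum (X_i - \bar{X})(Y_i - \bar{Y}) = 1863.7$, so the least squares fit is $\hat{Y} = 33.11 + 0.518X$.

Thus, the total (corrected) sum of squares is

$\sum (Y_i - \bar{Y})^2 = 44249 - 657^2/10 = 1084.1$.

Sum of squares due to regression: $SS(b_1 \mid b_0) = S_{XY}^2 / S_{XX} = 965.66$.

Residual sum of squares: $1084.1 - 965.66 = 118.44$.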

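Pure error sum of squares:

X

90: $(81 - 82)^2 + (83 - 82)^2 = 2$.

79: $0$ (a single observation).

66: $(68 - 63.33)^2 + (60 - 63.33)^2 + (62 - 63.33)^2 = 34.67$.

51: $(60 - 62)^2 + (64 - 62)^2 = 8$.

35: $(51 - 52)^2 + (53 - 52)^2 = 2$.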

Then, pure error sum of squares = 2 + 0 + 34.67 + 8 + 2 = 46.67

Lack of fit sum of squares = 118.44 - 46.67 = 71.77

Source / df / SS / MS
Due to regression ($b_1 \mid b_0$) / 1 / 965.66 / 965.66
Lack of fit / 3 / 71.77 / 23.92
Pure error / 5 / 46.67 / 9.33

Total (corrected) / 9 / 1084.1

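$F = 23.92/9.33 = 2.56$, which is smaller than 5.41, the upper 5% point of the F distribution with 3 and 5 degrees of freedom. Not significant!! That is, the simple linear regression is adequate. The standard F-test for regression can be carried out.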
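The mean squares and the lack-of-fit ratio in the table can be recomputed from the sums of squares. The sketch below also carries out the follow-up F-test for regression. It assumes, as is usual once lack of fit is not significant, that lack of fit and pure error are pooled back into the residual with $n - 2$ degrees of freedom; that pooling step and the variable names are assumptions, not taken from the notes.

```python
n, m = 10, 5                                  # observations, distinct X values
ss_reg, ss_lof, ss_pe = 965.66, 71.77, 46.67  # sums of squares from the table

ms_lof = ss_lof / (m - 2)   # lack of fit mean square
ms_pe = ss_pe / (n - m)     # pure error mean square
f_lof = ms_lof / ms_pe      # lack-of-fit F ratio on (m - 2, n - m) df

# Follow-up test for regression: pool lack of fit and pure error
# back into the residual sum of squares with n - 2 df.
ms_res = (ss_lof + ss_pe) / (n - 2)
f_reg = (ss_reg / 1) / ms_res   # F ratio on (1, n - 2) df

print(round(ms_lof, 2), round(ms_pe, 2), round(f_lof, 2))
```

The value of `f_reg` would then be compared with the tabulated F distribution on 1 and 8 degrees of freedom.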
