- Model selection:
There are two opposing criteria for selecting a model:
Including as many covariates as possible so that the fitted values are reliable.
Including as few covariates as possible so that the costs of obtaining information and monitoring are kept low.
Note: there is no unique statistical procedure for selecting the best regression model.
Note: common sense, basic knowledge of the data being analyzed, and considerations related to invariance principles (shift and scale invariance) can never be set aside.
In this section, two criteria, Mallows’ $C_p$ and Akaike’s AIC, will be introduced. Furthermore, automatic procedures based on these two criteria will also be introduced. We use the “Hald” regression data as a motivating example:
$x_1$ / $x_2$ / $x_3$ / $x_4$ / $y$
7 / 26 / 6 / 60 / 78.5
1 / 29 / 15 / 52 / 74.3
...
10 / 68 / 8 / 12 / 109.4
(13 observations in total.)
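Our objective is to choose a few sensible models from the following 5 sets of models for the above data:
Set A: $\binom{4}{0} = 1$ possible model (intercept only).
Set B: $\binom{4}{1} = 4$ possible models (one covariate).
Set C: $\binom{4}{2} = 6$ possible models (two covariates).
Set D: $\binom{4}{3} = 4$ possible models (three covariates).
Set E: $\binom{4}{4} = 1$ possible model (all four covariates).
Total: $1 + 4 + 6 + 4 + 1 = 2^4 = 16$ possible models.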
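The number of models in each set can be checked by enumerating every subset of the four covariates. The sketch below is only an illustration; the labels x1 to x4 are names assumed here for the four Hald covariates.

```python
from itertools import combinations
from math import comb

covariates = ["x1", "x2", "x3", "x4"]  # labels assumed for illustration

# Group every subset of the covariates by its size (Sets A to E).
sets = {}
for k, label in enumerate("ABCDE"):
    sets[label] = list(combinations(covariates, k))
    # The number of subsets of size k is the binomial coefficient C(4, k).
    assert len(sets[label]) == comb(len(covariates), k)

counts = {label: len(models) for label, models in sets.items()}
total = sum(counts.values())

print(counts)
print(total)
```

Every model contains the intercept, so a set of size k corresponds to a model with k + 1 parameters.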
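(a) Mallows’ $C_p$:
Suppose there are r-1 possible covariates, $x_1, \ldots, x_{r-1}$.
Mallows’ $C_p$ is defined as
$C_p = \frac{RSS(p)}{s^2} - (n - 2p)$,
where n is the sample size, p is the number of parameters in the model (the covariates plus the intercept), RSS(p) is the residual sum of squares from a model containing p parameters, and $s^2$ is the mean residual sum of squares from the model containing all possible covariates
(i.e. $s^2 = RSS(r)/(n-r)$). Intuitively, if the model with p parameters is the true model, then $s_p^2 = RSS(p)/(n-p)$, the mean residual sum of squares from that model, should estimate $\sigma^2$ accurately. That is, $E(s_p^2) \approx \sigma^2$. Thus,
$RSS(p) \approx (n-p)\sigma^2$. Also, the mean residual sum of squares for the overfitted model satisfies $s^2 \approx \sigma^2$.
Thus, $C_p \approx (n-p)\sigma^2/\sigma^2 - (n-2p) = p$, so $C_p$ will fall close to the line $C_p = p$.
Note that $C_r = (n-r) - (n-2r) = r$ for the model containing all covariates. The principle of selecting a best regression equation is to plot $C_p$ versus p. Then choose some models with fewer covariates that lie close to the line $C_p = p$.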
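As a small numerical illustration, the statistic can be written as a one-line function. The form RSS(p)/s^2 - (n - 2p) is the one used in the Splus commands of this section; the function name `mallows_cp` and all numbers below are hypothetical, chosen only so the arithmetic is easy to follow.

```python
def mallows_cp(rss_p, s2, n, p):
    """Mallows' C_p = RSS(p) / s^2 - (n - 2p)."""
    return rss_p / s2 - (n - 2 * p)

# Hypothetical values, not taken from any data set in these notes.
n, r = 13, 5            # sample size; parameters in the full model
rss_full = 48.0         # RSS(r) of the full model
s2 = rss_full / (n - r)  # mean residual sum of squares of the full model

# For the full model C_p always equals r, whatever the data are.
cp_full = mallows_cp(rss_full, s2, n, r)

# A submodel with p = 3 whose RSS is only slightly larger than RSS(r)
# lands on the line C_p = p ...
cp_good = mallows_cp(60.0, s2, n, 3)

# ... while a submodel that fits badly lands far above the line.
cp_bad = mallows_cp(600.0, s2, n, 3)

print(cp_full, cp_good, cp_bad)
```

The full model sits on the line by construction, so closeness to the line is only informative for the smaller models.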
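For the motivating example, we calculate $C_p$ for all 16 possible models. We then have the following table:
Set A / 443.2
Set B / 202.5 ($x_1$), 142.5 ($x_2$), 315.2 ($x_3$), 138.7 ($x_4$)
Set C / 2.7 ($x_1$,$x_2$), 198.1 ($x_1$,$x_3$), 5.5 ($x_1$,$x_4$), 62.4 ($x_2$,$x_3$), 138.2 ($x_2$,$x_4$), 22.4 ($x_3$,$x_4$)
Set D / 3 ($x_1$,$x_2$,$x_3$), 3 ($x_1$,$x_2$,$x_4$), 3.5 ($x_1$,$x_3$,$x_4$), 7.3 ($x_2$,$x_3$,$x_4$)
Set E / 5 ($x_1$,$x_2$,$x_3$,$x_4$)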
The $C_p$ value for the model with $x_1$ and $x_2$ (2.7, with p = 3) is close to the line $C_p = p$, and the model also has fewer parameters. Therefore, we recommend this model as a sensible choice.
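The same reading of the table can be sketched in code. The values are copied from the table; the covariate labels are an assumption (the entries are taken to be listed in lexicographic order of x1 to x4), and the tolerance of 0.5 for "close to the line" is a hypothetical cut-off, not part of the method.

```python
# C_p values copied from the table. Labels are assumed: entries are taken
# to be in lexicographic order of the covariates x1..x4.
cp_table = {
    (): 443.2,
    ("x1",): 202.5, ("x2",): 142.5, ("x3",): 315.2, ("x4",): 138.7,
    ("x1", "x2"): 2.7, ("x1", "x3"): 198.1, ("x1", "x4"): 5.5,
    ("x2", "x3"): 62.4, ("x2", "x4"): 138.2, ("x3", "x4"): 22.4,
    ("x1", "x2", "x3"): 3.0, ("x1", "x2", "x4"): 3.0,
    ("x1", "x3", "x4"): 3.5, ("x2", "x3", "x4"): 7.3,
    ("x1", "x2", "x3", "x4"): 5.0,
}

TOLERANCE = 0.5  # hypothetical cut-off for "close to the line C_p = p"


def n_params(model):
    """Number of parameters p: the covariates plus the intercept."""
    return len(model) + 1


# Models whose C_p lies within the tolerance of the line C_p = p.
close = [m for m, cp in cp_table.items()
         if abs(cp - n_params(m)) <= TOLERANCE]

# Among those, prefer the model with the fewest parameters.
best = min(close, key=n_params)

print(close)
print(best)
```

The full model always passes the closeness check because its $C_p$ equals its number of parameters, which is why the final step prefers the smallest qualifying model.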
Example:
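Set A: ozone ~ 1 (intercept only).
Set B: ozone ~ radiation; ozone ~ temperature; ozone ~ wind.
Set C: ozone ~ radiation + temperature; ozone ~ radiation + wind; ozone ~ temperature + wind.
Set D: ozone ~ radiation + temperature + wind.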
We will show how to use $C_p$ in Splus to select sensible models.
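>ozonelm3<-lm(ozone~radiation+temperature+wind,data=air)
# Set D: full model with radiation, temperature and wind
>anova(ozonelm3)
>anoozonelm3<-anova(ozonelm3)
>anoozonelm3[4,3] # s^2: mean residual sum of squares of the full model
>s2<-anoozonelm3[4,3]
>ozonelma<-lm(ozone~1,data=air) # Set A: intercept-only model
>ozonelma
>anova(ozonelma)
>anoozonelma<-anova(ozonelma)
>anoozonelma[2] # RSS(1)
>cpa<-(anoozonelma[2]/s2)-(111-2*1) # Cp with n=111, p=1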
>ozonelmb1<-lm(ozone~radiation,data=air) # Set B: radiation only
>anova(ozonelmb1)
>anoozonelmb1<-anova(ozonelmb1)
>cpb1<-(anoozonelmb1[2,2]/s2)-(111-2*2) # Cp with p=2
>cpb1
>ozonelmb2<-lm(ozone~temperature,data=air) # Set B: temperature only
>anoozonelmb2<-anova(ozonelmb2)
>cpb2<-(anoozonelmb2[2,2]/s2)-(111-2*2) # Cp with p=2
>cpb2
>ozonelmb3<-lm(ozone~wind,data=air) # Set B: wind only
>anoozonelmb3<-anova(ozonelmb3)
>cpb3<-(anoozonelmb3[2,2]/s2)-(111-2*2) # Cp with p=2
>cpb3
>ozonelmc1<-lm(ozone~radiation+temperature,data=air)
# Set C: radiation and temperature
>anova(ozonelmc1)
>anoozonelmc1<-anova(ozonelmc1)
>cpc1<-(anoozonelmc1[3,2]/s2)-(111-2*3) # Cp with p=3
>cpc1
>ozonelmc2<-lm(ozone~radiation+wind,data=air)
# Set C: radiation and wind
>anoozonelmc2<-anova(ozonelmc2)
>cpc2<-(anoozonelmc2[3,2]/s2)-(111-2*3) # Cp with p=3
>cpc2
>ozonelmc3<-lm(ozone~temperature+wind,data=air)
# Set C: temperature and wind
>anoozonelmc3<-anova(ozonelmc3)
>cpc3<-(anoozonelmc3[3,2]/s2)-(111-2*3) # Cp with p=3
>cpc3
We found that the $C_p$ values for the above models are much larger than the corresponding values of p. Therefore, the full model with radiation, temperature and wind is selected as a sensible model.
(b) AIC (Akaike Information Criterion):
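As $\sigma^2$ is known,
$AIC(p) = \frac{RSS(p)}{\sigma^2} + 2p + c$,
where c is a constant.
As $\sigma^2$ is unknown,
$AIC(p) = n \log\left(\frac{RSS(p)}{n}\right) + 2p + c'$,
where $c'$ is again a constant.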
Note: in Splus, the automatic variable selection method uses the AIC(p) formula for the case with $\sigma^2$ known. However, the method replaces $\sigma^2$ in the formula with an estimate, so the result is not the exact AIC(p). The formula used in Splus is $RSS(p) + 2p\hat{\sigma}^2$, where $\hat{\sigma}^2$ is the estimate of $\sigma^2$ and is not necessarily equal to $s^2$.
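The following short derivation may help connect this criterion to Mallows' statistic. It assumes the estimate is taken to be $s^2$ (as with `scale=s2` in the Splus calls), reads the Splus criterion as $RSS(p) + 2p\hat{\sigma}^2$, and uses $C_p = RSS(p)/s^2 - (n - 2p)$. That reading is inferred from the check `s2*(cpa+111)` in the example rather than from the Splus documentation.

```latex
\begin{align*}
RSS(p) + 2p\,s^2
  &= s^2\left[\frac{RSS(p)}{s^2} - (n - 2p) + n\right] \\
  &= s^2\,(C_p + n).
\end{align*}
```

Since $s^2$ and $n$ are the same for every candidate model, ranking models by this criterion with the scale fixed at $s^2$ gives the same order as ranking them by $C_p$.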
Example:
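>step(ozonelma,~radiation+temperature+wind,scale=s2,data=air)
# scale=s2 is the mean residual sum of squares of the full model
# Cp shown in the printout of Splus is AIC(p).
>step(ozonelma,~radiation+temperature+wind,scale=s2,trace=F,data=air)
>s2*(cpa+111) # AIC(1) for the model ozone~1, i.e. RSS(1)+2*1*s2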
>step(ozonelma,~radiation+temperature+wind,data=air)
# where scale is not supplied and the estimate used is the
# mean residual sum of squares for the starting model ozone~1.
Note: the above automatic selection method starts from the model ozone~1 as the original model and then adds one variable at a time. Augmented models with a larger AIC(p) than the original model are not considered. Among the augmented models with a smaller AIC(p) than the original model, the one with the smallest AIC(p) is selected as the new original model. This process is repeated until no variable can be added that reduces AIC(p). This automatic variable selection method is forward selection. We can also use backward selection in Splus.
> step(ozonelm3,~radiation+temperature+wind,data=air)
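The forward selection rule described in the note can be sketched outside Splus as follows. This is a minimal illustration, not the Splus `step` implementation. It reuses the $C_p$ values from the Hald table as the criterion, which is reasonable when the scale is fixed at $s^2$, because $RSS(p) + 2ps^2 = s^2(C_p + n)$ orders the models in the same way as $C_p$. The covariate labels are assumed (table entries taken in lexicographic order of x1 to x4), and the function name `forward_select` is hypothetical.

```python
# Criterion values: C_p from the Hald table, labels assumed lexicographic.
raw = {
    (): 443.2,
    ("x1",): 202.5, ("x2",): 142.5, ("x3",): 315.2, ("x4",): 138.7,
    ("x1", "x2"): 2.7, ("x1", "x3"): 198.1, ("x1", "x4"): 5.5,
    ("x2", "x3"): 62.4, ("x2", "x4"): 138.2, ("x3", "x4"): 22.4,
    ("x1", "x2", "x3"): 3.0, ("x1", "x2", "x4"): 3.0,
    ("x1", "x3", "x4"): 3.5, ("x2", "x3", "x4"): 7.3,
    ("x1", "x2", "x3", "x4"): 5.0,
}
criterion = {frozenset(model): value for model, value in raw.items()}
variables = ["x1", "x2", "x3", "x4"]


def forward_select(criterion, variables):
    """Add one variable at a time while the criterion keeps decreasing."""
    current = frozenset()          # start from the intercept-only model
    path = [current]
    while True:
        # All models obtained by adding exactly one new variable.
        candidates = [current | {v} for v in variables if v not in current]
        # Keep only the candidates that improve on the current model.
        better = [m for m in candidates if criterion[m] < criterion[current]]
        if not better:
            return current, path   # no addition reduces the criterion
        current = min(better, key=lambda m: criterion[m])
        path.append(current)


selected, path = forward_select(criterion, variables)
print([sorted(m) for m in path])
print(sorted(selected), criterion[selected])
```

On these values the path ends at a three-covariate model, which differs from the two-covariate model with the smallest $C_p$ in the table. Forward selection is greedy and never revisits a variable once it has been added, so it does not necessarily find the best subset.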