Bios 2063 2007-10-31 Model Choice, Model Averaging

Hypothesis testing vs. model choice

Some methods for model choice:

  • Likelihood ratio test
  • Mallows' Cp
  • AIC (Akaike information criterion)
  • BIC (Bayes information criterion)

Likelihood ratio test criterion

Under the smaller of two nested models (and subject to some regularity conditions),

$2(\hat\ell_A - \hat\ell_B) \to \chi^2_{p_A - p_B}$ in distribution as $n \to \infty$,

where $\hat\ell_M$ is the maximized log-likelihood of model $M$, $p_M$ is its number of free parameters, $A$ is the larger model, and $B$ is the smaller (nested) model.

In a stepwise forward procedure, choose A over B if

$2(\hat\ell_A - \hat\ell_B) > \chi^2_{p_A - p_B,\,1-\alpha}$.

This is a cautious criterion—or is it?
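For concreteness, here is a minimal Python sketch of this stepwise comparison; the log-likelihoods and parameter counts are hypothetical, not taken from any example in these notes.

from scipy.stats import chi2

def lrt_prefers_A(loglik_A, loglik_B, p_A, p_B, alpha=0.05):
    """Choose the larger model A over the nested model B if the LRT rejects B."""
    stat = 2.0 * (loglik_A - loglik_B)          # likelihood ratio statistic
    crit = chi2.ppf(1.0 - alpha, p_A - p_B)     # chi-square critical value
    return stat > crit

# Hypothetical maximized log-likelihoods and parameter counts:
print(lrt_prefers_A(loglik_A=-120.3, loglik_B=-123.9, p_A=4, p_B=3))   # True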

AIC (Akaike information criterion)

Akaike’s result:

(Notice the plug-ins for f. There are no unknowns! Not asymptotic.)

For each model M in a nested series of models of increasing complexity, define

$\mathrm{AIC}(M) = \hat\ell_M - p_M$,

and choose the model

$\hat M = \arg\max_M \mathrm{AIC}(M)$.

(“arg max” means the value of the function argument (here, M) that maximizes the function (here, AIC)).
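A minimal Python sketch of this choice, using the maximization form of AIC defined above (maximized log-likelihood minus number of parameters); the fitted log-likelihoods are hypothetical.

# AIC(M) = maximized log-likelihood minus number of parameters, as defined above.
models = {
    "M1": {"loglik": -130.0, "p": 2},
    "M2": {"loglik": -124.5, "p": 3},
    "M3": {"loglik": -124.1, "p": 4},
}

def aic(m):
    return m["loglik"] - m["p"]

# arg max over the nested series of models:
best = max(models, key=lambda name: aic(models[name]))
print(best)   # "M2" for these hypothetical values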

FACT: AIC is linearly related to Mallows' Cp, which (for a linear model) estimates the average residual sum of squares (RSS).

FACT: AIC is linearly related to the Kullback-Leibler discrepancy between the true model $f$ and the estimated model $\hat f$,

$K(f, \hat f) = \int f(y)\,\log\frac{f(y)}{\hat f(y)}\,dy$.

Bayes Information Criterion (BIC)

Consider the “Bayes factor”

$BF_{AB} = \frac{P(\mathbf{y} \mid M_A)}{P(\mathbf{y} \mid M_B)} = \frac{\int L_A(\theta_A)\,\pi_A(\theta_A)\,d\theta_A}{\int L_B(\theta_B)\,\pi_B(\theta_B)\,d\theta_B}$,

where $L_M$ is the likelihood under model $M$ and $\pi_M$ is the prior on its parameters.
The idea is to approximate the integral by a local quadratic approximation at the mode (MLE) using the Laplace approximation. The result is:

$\log P(\mathbf{y} \mid M) = \hat\ell_M - \frac{p_M}{2}\,\log n + O(1)$ as $n \to \infty$.
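A minimal sketch, assuming hypothetical log-likelihoods and sample size, of how this approximation turns the log Bayes factor into a simple function of the fitted models:

import math

def approx_log_bayes_factor(loglik_A, loglik_B, p_A, p_B, n):
    """log BF_AB as the difference of Schwarz-approximated log marginal likelihoods."""
    return (loglik_A - loglik_B) - 0.5 * (p_A - p_B) * math.log(n)

# Hypothetical inputs:
log_bf = approx_log_bayes_factor(loglik_A=-120.3, loglik_B=-123.9, p_A=4, p_B=3, n=100)
print(math.exp(log_bf))   # approximate Bayes factor for model A over model B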

Comparisons of two nested models

For each model selection criterion, the statistic

$2(\hat\ell_A - \hat\ell_B)$

is compared with a different number (write $d = p_A - p_B$ for the difference in parameter counts and $N$ for the sample size):

$\chi^2_{d,\,1-\alpha}$ / LRT
$2d$ / AIC
$d \log N$ / BIC
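A quick illustration of the three cutoffs, for made-up values of d, alpha, and N:

import math
from scipy.stats import chi2

d, alpha, N = 1, 0.05, 100                     # illustrative values
print("LRT cutoff:", chi2.ppf(1 - alpha, d))   # 3.84 for d = 1, alpha = 0.05
print("AIC cutoff:", 2 * d)                    # 2
print("BIC cutoff:", d * math.log(N))          # 4.61 for N = 100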

Some observations from W. Li

“AIC and likelihood-ratio test with a fixed significance level are not consistent (or “dimensionally consistent”; the false positive rate does not approach zero in the limit). On the other hand, BIC is consistent (the false positive rate approaches 0 in the limit).”

What is the LRT P-value when the likelihood ratio statistic exactly equals the BIC cutoff?

d \ N / 7.389 (= e²) / 10 / 100 / 1000
1 / 0.157 / 0.129 / 0.032 / 0.0086
5 / 0.075 / 0.042 / 3.3×10⁻⁴ / 1.9×10⁻⁶
10 / 0.029 / 0.011 / 1.4×10⁻⁶ / 6.7×10⁻¹¹
100 / ~10⁻⁷ / ~10⁻¹² / ~0 / ~0

It’s interesting to contrast the first row with Table 4.2 from the 2007-10-31 hand-out on hypothesis testing.
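Each entry above is the chi-square tail probability at the BIC cutoff, P(chi-square_d > d log N); a quick check in Python (assuming scipy is available):

import math
from scipy.stats import chi2

for d in (1, 5, 10, 100):
    # tail probability at the BIC cutoff d*log(N), for each sample size N
    row = [chi2.sf(d * math.log(N), d) for N in (math.e ** 2, 10, 100, 1000)]
    print(d, ["%.2g" % p for p in row])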


The corresponding table for the AIC has all columns equal to the first column.

d / any N
1 / 0.157
5 / 0.075
10 / 0.029
100 / ~10⁻⁷

The corresponding table for the Likelihood Ratio Test is of course filled with “0.05”: for any d and any N, the entry is 0.05.


For more information, see a nice set of notes from Cavanaugh:

especially Lectures 2 and 6 (the Schwarz Information Criterion is linearly related to the Bayes Information Criterion).

MODEL SELECTION AND MODEL AVERAGING

Given a set of models $M_1, \dots, M_K$, the exact posterior probability of model $M_k$ is given by

$P(M_k \mid \mathbf{y}) = \frac{P(\mathbf{y} \mid M_k)\,P(M_k)}{\sum_{j} P(\mathbf{y} \mid M_j)\,P(M_j)}$.
Sometimes the number of potential models is so large that this calculation cannot be done (imagine many complex causal trees). Then various sampling methods are used, such as Markov chain Monte Carlo (MCMC) methods.

You might think that the “best” single model is the one with the maximum posterior probability. Not true! See Barbieri and Berger if you’re interested (web site).

If you’re not limited to a single model, the best prediction is the posterior mixture

$p(\tilde{y} \mid \mathbf{y}) = \sum_{k} p(\tilde{y} \mid M_k, \mathbf{y})\,P(M_k \mid \mathbf{y})$.
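A minimal sketch of these two formulas, assuming the marginal likelihoods P(y | M_k) are already available; all numbers are hypothetical.

import numpy as np

marginal_lik = np.array([2.1e-12, 8.4e-12, 1.3e-12])   # hypothetical P(y | M_k)
prior        = np.array([1/3, 1/3, 1/3])               # prior model probabilities P(M_k)

post = marginal_lik * prior
post /= post.sum()                                     # posterior P(M_k | y)

# Each model's prediction for a new observation (hypothetical values):
pred_by_model = np.array([4.2, 5.0, 3.8])
print(post, post @ pred_by_model)                      # posterior-mixture prediction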

EXERCISE for November 14:

A) Calculate AIC, BIC, and BF for the Lump/Split example.

B) How can you use the BIC to do model-averaging?
