FW853, Topic 14: Choosing among models

  1. Need to first focus on what you expect out of your model:

(1) Predictions

(2) Understanding of a “real” system’s dynamics - i.e., representing processes

(3) Understanding of a model’s behavior, irrespective of any particular real system

(4) “What if” scenarios

  2. In the absence of data, you can still do modeling, and you still have choices to make.
  3. When you want to understand a model’s behavior, you often choose a generic model to see how it behaves. You may also have a new model that you want to compare to other models.
  4. Rely on first principles to choose among candidate models - focus on processes. THIS IS WHERE YOUR SCIENTIFIC KNOWLEDGE AND THE LITERATURE COME IN!
  5. When you have data, you need a multi-pronged approach.
  6. You can use data to help choose models. Hilborn and Mangel use the phrase “confront your model with data”; by this, they mean you can let your data arbitrate among competing models.
  7. Note that when you use this approach, you have to build and evaluate a lot of different models. Furthermore, the data you have may not be quite right to distinguish among many of the possible models. In practice, several criteria guide the choice among models:

(1) Plausibility of the processes represented and of the assumptions

(2) Previous work

(3) What you are familiar with

(4) Time constraints and the purpose of the model

(5) What data are available

  1. Using data to help choose among models
  2. One approach we have used is finding the parameter(s) that minimize the sum of squares. Note that a model has both a structure (i.e., the functional form of the equations) and parameters for which we need to find the best-fitting values.
  3. Example: linear model vs. exponential vs. power function vs. other.
  4. We can compute the sum of squared errors (SSE) and compare how different models perform. This is particularly useful with nested models (since adding more variables can only decrease the residual sum of squares). When we have nested models, we can test the “full model” (i.e., the model with the greater number of parameters) against the “reduced model” with an F test:

F = \frac{(SSE_r - SSE_f)/(df_r - df_f)}{SSE_f/df_f}

where SSE_r and SSE_f are the residual sums of squares of the reduced and full models, and df_r and df_f are their residual degrees of freedom (n minus the number of parameters).

(1) By computing this F statistic, you can make a statistical decision among competing models (note: compare this F with the tabled F with df_r - df_f and df_f degrees of freedom; a short sketch follows below). For non-nested models this becomes more of a problem, since two models may have the same degrees of freedom and similar sums of squares.
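As a concrete illustration, here is a minimal Python sketch of the full vs. reduced F test, using a straight line nested within a quadratic. The data values are invented for illustration:

```python
import numpy as np
from scipy import stats

# Hypothetical data, made up for illustration
x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
y = np.array([1.1, 2.3, 2.9, 4.5, 5.2, 6.8, 7.9, 9.5])

def sse(y, yhat):
    """Residual (error) sum of squares."""
    return np.sum((y - yhat) ** 2)

# Reduced model: straight line (2 parameters)
b_r = np.polyfit(x, y, deg=1)
sse_r, df_r = sse(y, np.polyval(b_r, x)), len(y) - 2

# Full model: quadratic (3 parameters); the line is nested within it
b_f = np.polyfit(x, y, deg=2)
sse_f, df_f = sse(y, np.polyval(b_f, x)), len(y) - 3

# F test of the full model against the reduced model (formula above)
F = ((sse_r - sse_f) / (df_r - df_f)) / (sse_f / df_f)
p = stats.f.sf(F, df_r - df_f, df_f)   # tail probability from the F distribution
print(F, p)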
  1. Another approach is likelihood-based methods. A recent book by Burnham and Anderson (Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach, Second Edition, 2002) provides an in-depth overview. The book can be tough reading because the authors use topics and concepts they have not defined or introduced earlier, but much of it can be read and understood without relying on the mathematical theory.
  2. What is a likelihood? It can be viewed as the probability that the actual observations would have occurred if the parameters were true. We typically find the maximum likelihood - the parameter values or model that maximize the probability that the observations would have occurred if the parameters were true.
  3. Example: Go out on one day and capture 10 fish. Mark these fish and release them. Go out the next day and capture 11 fish, 2 of which are marked. What is the maximum likelihood estimate of the population size? (A sketch follows after this list.)
  4. Key concept - we can describe a probabilistic model for the number of marked and unmarked fish based on the binomial distribution. With M = 10 marked fish, a second-day sample of n = 11 fish of which k = 2 are marked, and unknown population size N, the formula is:

L(N) = \binom{n}{k}\left(\frac{M}{N}\right)^{k}\left(1 - \frac{M}{N}\right)^{n-k}

  5. Note a subtle, but important distinction:

(1) In usual probabilities, we compute P(Yi | p), where the Yi are the various possible data outcomes and p is the parameter. One way of wording this is that it is the probability of the outcomes given a parameter value; thus, the parameter is known a priori and the actual data are unknown. We can compute the probability of any given outcome, and if we have an exhaustive list of all possible outcomes, these probabilities add up to 1. In the tagging example, if we knew the population size, how many fish we tagged the first day, and how many fish we captured the second day, we could compute the probability of having 0, 1, 2, 3, 4, ..., M tagged fish in the second day's catch.

(2) In likelihoods, we compute L(hypothesis | Y), where Y is the actual outcome observed (the data) and the hypothesis is a model or model/parameter combination. Note that since innumerable hypotheses exist, the sum of the likelihoods does not add up to one. A critical concept is that the likelihood as stated above is proportional to the actual probability. In the tagging example, we have multiple hypotheses (i.e., candidate population sizes), many of which are quite likely.
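A minimal Python sketch of the tagging example above; the grid of candidate population sizes is a choice of this sketch, not part of the original notes:

```python
import numpy as np
from scipy import stats

M, n, k = 10, 11, 2   # marked day 1; caught day 2; marked among those

# Likelihood of each candidate population size N:
#   L(N) = Binomial(k | n, p = M/N)
# N must be at least M + (n - k) = 19 fish to be consistent with the data.
N = np.arange(M + n - k, 1000)
L = stats.binom.pmf(k, n, M / N)

print(N[np.argmax(L)])   # maximum likelihood estimate: N = 55 (= M*n/k)
```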

  1. Using likelihoods
  2. Like probabilities, likelihoods of independent observations can be combined by taking their product (or, if you use log-likelihoods, by adding them).
  3. Because likelihoods are often very small numbers, we often work with their logarithms. Further, we often take the negative log-likelihood to turn fitting into a minimization problem (and for other reasons listed below).
  4. When we have nested models, we can compare them with the likelihood ratio test:

\chi^{2} = -2\left[\ln L(\text{reduced}) - \ln L(\text{full})\right]

which is compared with a chi-square distribution with degrees of freedom equal to the difference in the number of estimated parameters.
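For instance, a small sketch with hypothetical (made-up) maximized log-likelihoods and parameter counts:

```python
from scipy import stats

# Hypothetical maximized log-likelihoods and parameter counts
lnL_full, K_full = -120.4, 5
lnL_red,  K_red  = -125.1, 3

chi2 = -2 * (lnL_red - lnL_full)     # likelihood ratio statistic
df = K_full - K_red                  # extra parameters in the full model
p = stats.chi2.sf(chi2, df)          # small p favors the full model
print(chi2, p)
```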

With non-nested models and sample sizes that are large relative to the number of parameters estimated, one often uses Akaike's Information Criterion (AIC), computed as:

AIC = -2\ln(\mathcal{L}) + 2K

where \mathcal{L} is the maximized likelihood and K is the number of estimated parameters.

When the sample size n is relatively small compared to the number of parameters K (Burnham and Anderson suggest when n/K < 40), an adjusted AICc is generally used:

AIC_c = AIC + \frac{2K(K+1)}{n - K - 1}

The model with the lowest AIC is the “best” model, meaning that it provides the best tradeoff between fitting the data and parsimony. Note the same problem we had with sums of squares: you can have two models with the same number of parameters and similar AIC values, and it is difficult to say statistically how confident you are that the choice you made is really better.
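A sketch of the computation; the model names, log-likelihood values, and sample size below are invented for illustration:

```python
def aic(lnL, K):
    """Akaike's Information Criterion: -2 ln(L) + 2K."""
    return -2.0 * lnL + 2.0 * K

def aicc(lnL, K, n):
    """Small-sample corrected AIC."""
    return aic(lnL, K) + 2.0 * K * (K + 1) / (n - K - 1)

n = 25   # sample size
# (model, maximized log-likelihood, parameters incl. error variance)
for name, lnL, K in [("linear", -52.3, 3),
                     ("power", -49.8, 3),
                     ("exponential", -51.0, 3)]:
    print(name, aic(lnL, K), aicc(lnL, K, n))
```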

A different philosophical approach to the problem is to not choose a single model, but to provide a table with the “weight of evidence” for each model. The first step is to compute the difference \Delta_i = AIC_i - AIC_{min} between the AIC for each model and the best (lowest) AIC. The Akaike weights can then be computed as:

w_i = \frac{\exp(-\Delta_i/2)}{\sum_{r}\exp(-\Delta_r/2)}

The weight-of-evidence ratio is then the w_i for the best model divided by the w_i for a given model. Larger numbers indicate less support for that model relative to the best model considered.
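Continuing the sketch, with hypothetical AIC values:

```python
import numpy as np

aics = np.array([100.0, 101.2, 104.7, 110.3])   # hypothetical AIC values

delta = aics - aics.min()        # difference from the best model
w = np.exp(-delta / 2.0)
w = w / w.sum()                  # Akaike weights (sum to 1)

evidence_ratio = w.max() / w     # larger = less support vs. best model
print(np.round(w, 3), np.round(evidence_ratio, 1))
```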

  1. As an aside, likelihoods can also be used to compute confidence intervals on the parameters (e.g., via the profile likelihood; a sketch follows).
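Continuing the tagging example: values of N whose log-likelihood falls within \chi^2_{0.95,1}/2 \approx 1.92 units of the maximum form an approximate 95% profile likelihood interval. A sketch, with the grid of N an assumption of the sketch:

```python
import numpy as np
from scipy import stats

M, n, k = 10, 11, 2
N = np.arange(19, 5000)
lnL = stats.binom.logpmf(k, n, M / N)

# Approximate 95% interval: log-likelihood within chi2(0.95, 1)/2 of max
cutoff = lnL.max() - stats.chi2.ppf(0.95, df=1) / 2.0
inside = N[lnL >= cutoff]
print(inside.min(), inside.max())
```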
  2. An often forgotten criterion for evaluating models, beyond the sums of squares and likelihood, is to examine the residuals of the model for patterns. Although a better model will have a higher likelihood or a smaller residual sum of squares, that does not mean it is an apt model (meaning the correct model). Draw picture of a linear model fitted to a power function.
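In lieu of the picture, a small simulation sketch; the power function, noise level, and seed are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(1.0, 10.0, 30)
y = 2.0 * x ** 0.5 + rng.normal(0.0, 0.1, x.size)   # power-function data

b = np.polyfit(x, y, deg=1)          # fit a straight line anyway
resid = y - np.polyval(b, x)

# Systematic curvature in the residuals (negative at the ends, positive
# in the middle) flags a misspecified model despite a respectable SSE.
for lo, hi in [(1.0, 4.0), (4.0, 7.0), (7.0, 10.1)]:
    sel = (x >= lo) & (x < hi)
    print(f"x in [{lo:.0f},{hi:.0f}): mean residual = {resid[sel].mean():+.3f}")
```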

Example – Full vs. reduced models
Otis et al. hierarchy of models for population estimation

Model M0 - very strict assumptions

- all fish have equal catchability over all sampling intervals

- marking does not affect the behavior of fish

- two parameters: probability of capture (p) and population size (N)

Model Mt - less stringent assumptions

- all fish have equal catchability within a sampling interval, but catchability can vary between sampling intervals

- marking does not affect the behavior of fish

- parameters: probability of capture in each sampling interval (p1, p2, p3, ...) and population size (N)

- this is a more “full” model, with M0 nested within it

Model Mb - also less stringent assumptions

- unmarked fish have equal catchability over all sampling intervals

- marking is allowed to affect the behavior of fish, thereby changing catchability

- three parameters: probability of first capture for unmarked fish (p), probability of recapture after marking (c), and population size (N)
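To make the hierarchy concrete, here is a hedged sketch comparing M0 against Mt with a likelihood ratio test, using the standard closed-population likelihood profiled on the capture probabilities. The capture counts are invented, and the grid search over N is a simplification of this sketch:

```python
import numpy as np
from scipy.special import gammaln
from scipy import stats

# Hypothetical capture data over t = 3 sampling occasions
n_j = np.array([30, 45, 22])   # fish caught on each occasion
Mt1 = 70                       # distinct fish caught over all occasions
t, n_tot = len(n_j), n_j.sum()

def lnL_M0(N):
    """Log-likelihood of M0, profiled on p (p-hat = n./(tN) for given N)."""
    p = n_tot / (t * N)
    return (gammaln(N + 1) - gammaln(N - Mt1 + 1)
            + n_tot * np.log(p) + (t * N - n_tot) * np.log(1 - p))

def lnL_Mt(N):
    """Log-likelihood of Mt, profiled on the p_j (p_j-hat = n_j/N)."""
    p = n_j / N
    return (gammaln(N + 1) - gammaln(N - Mt1 + 1)
            + np.sum(n_j * np.log(p) + (N - n_j) * np.log(1 - p)))

Ns = np.arange(Mt1, 2000)               # grid search over population size
L0 = np.array([lnL_M0(N) for N in Ns])
Lt = np.array([lnL_Mt(N) for N in Ns])

# Likelihood ratio test: Mt (full) vs. M0 (reduced), t - 1 extra parameters
chi2 = 2 * (Lt.max() - L0.max())
print(Ns[L0.argmax()], Ns[Lt.argmax()], stats.chi2.sf(chi2, df=t - 1))
```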

Example log-likelihood functions

Normal (n observations y_i with model predictions \hat{y}_i):

\ln L = -\frac{n}{2}\ln(2\pi\sigma^{2}) - \frac{1}{2\sigma^{2}}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^{2}

Lognormal:

\ln L = -\frac{n}{2}\ln(2\pi\sigma^{2}) - \sum_{i=1}^{n}\ln y_i - \frac{1}{2\sigma^{2}}\sum_{i=1}^{n}\left(\ln y_i - \ln \hat{y}_i\right)^{2}

Multinomial (counts y_i in each category, with predicted proportions p_i):

\ln L = \ln(n!) - \sum_{i}\ln(y_i!) + \sum_{i} y_i \ln p_i
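These three can be sketched directly with scipy.stats; the observations, predictions, sigma, counts, and proportions below are all invented for illustration:

```python
import numpy as np
from scipy import stats

y    = np.array([2.3, 3.1, 4.8, 5.2, 6.0])   # hypothetical observations
yhat = np.array([2.5, 3.0, 4.5, 5.5, 6.1])   # hypothetical predictions
sigma = 0.4

# Normal: errors on the original scale
lnL_normal = stats.norm.logpdf(y, loc=yhat, scale=sigma).sum()

# Lognormal: a normal model on log(y), with the Jacobian term -ln(y)
lnL_lognormal = (stats.norm.logpdf(np.log(y), np.log(yhat), sigma)
                 - np.log(y)).sum()

# Multinomial: category counts against predicted proportions
counts = np.array([12, 30, 8])
props  = np.array([0.25, 0.55, 0.20])
lnL_multinomial = stats.multinomial.logpmf(counts, n=counts.sum(), p=props)

print(lnL_normal, lnL_lognormal, lnL_multinomial)
```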