FW853--Topic 8. Bootstrapping

  1. The stochastic simulations we have been talking about so far are also called parametric bootstraps. The word parametric here refers to the idea that we specify a distribution and its parameters for the simulation. Used on its own, the term bootstrapping typically refers to the non-parametric bootstrap, where the distribution is not specified directly.
  2. The fundamental idea behind the non-parametric bootstrap (I’ll call it the bootstrap from here on out) is that the underlying distribution can be well represented by repeated sampling from our data, if we sample with replacement. Fundamentally, each observation in our data is treated as being equally likely, and thus if we sample from this small population many times, we can develop a distribution that mimics the “true” underlying distribution that we collected the sample from.
  3. When would this be useful?
  4. Widely used in statistics to develop non-parametric estimates of variance, confidence intervals, and measures of bias
  5. In simulation modeling, it can be useful when you have a distribution that is difficult to parameterize but for which you have quite a bit of data. Example: the delta distribution (see figure).
  6. Bootstrap samples are not difficult conceptually, but can be a pain to do in some programming environments (e.g., in SAS most people do this with a macro; I’m not sure how you would do it in Excel). On the other hand, there are some programs that are specifically designed to take bootstrap samples.
  7. Go through SAS example as in-class exercise when time permits
  8. Warnings!!
  9. Small sample sizes: a small sample may not represent the underlying distribution well
  10. The bootstrap works poorly for some statistics, minimum or maximum statistics in particular, because a resample can never fall outside the range of the observed data
  11. Need to preserve correlational structure - often people assume independent sampling when this is not the case
  12. Don’t take all claims about bootstrapping at face value
  13. When you have a model, you need to take the bootstrap sample from the residuals rather than from the raw observations
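
The difference between the parametric and non-parametric bootstrap (items 1 and 2) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the SAS programs shown in class: the fish weights are made-up numbers, the normal distribution in the parametric version is an assumption, and all function names are hypothetical. The last lines illustrate the warning about maximum statistics.

```python
import random
import statistics

# Hypothetical data: 20 made-up fish weights (g), for illustration only.
weights = [212.0, 187.5, 240.1, 198.3, 225.7, 176.9, 203.4, 251.2, 190.8, 219.6,
           183.2, 234.5, 207.1, 195.0, 228.9, 172.4, 214.3, 245.8, 201.7, 209.9]

def nonparametric_sample(data, rng):
    """Non-parametric bootstrap: draw len(data) values from the data, with replacement."""
    return rng.choices(data, k=len(data))

def parametric_sample(data, rng):
    """Parametric bootstrap: draw len(data) values from a fitted distribution.

    ASSUMPTION: the weights are normal with the sample mean and standard deviation.
    """
    mu = statistics.mean(data)
    sigma = statistics.stdev(data)
    return [rng.gauss(mu, sigma) for _ in data]

def replicate(data, sampler, statistic, n_reps, rng):
    """Compute `statistic` on n_reps simulated samples and return the replicate values."""
    return [statistic(sampler(data, rng)) for _ in range(n_reps)]

rng = random.Random(853)  # fixed seed so the run is repeatable

boot_means = replicate(weights, nonparametric_sample, statistics.mean, 2000, rng)
param_means = replicate(weights, parametric_sample, statistics.mean, 2000, rng)

sample_mean = statistics.mean(weights)
boot_se = statistics.stdev(boot_means)                # bootstrap standard error of the mean
boot_bias = statistics.mean(boot_means) - sample_mean # bootstrap estimate of bias
param_se = statistics.stdev(param_means)              # parametric (simulation) standard error

print(f"sample mean      = {sample_mean:.2f}")
print(f"bootstrap SE     = {boot_se:.2f} (bias {boot_bias:+.3f})")
print(f"parametric SE    = {param_se:.2f}")

# Warning about extremes: a resample can never contain a value above the
# largest observation, so the bootstrap distribution of the maximum piles up
# at the sample maximum instead of spreading around the true maximum.
boot_maxes = replicate(weights, nonparametric_sample, max, 2000, rng)
share_at_max = sum(m == max(weights) for m in boot_maxes) / len(boot_maxes)
print(f"share of resamples whose max equals the sample max = {share_at_max:.2f}")
```

For a well-behaved statistic such as the mean, the two standard errors should be close to each other. For the maximum, the bootstrap replicates are capped at the observed maximum, which is why the bootstrap is not trusted for extremes.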

In-class exercise

Spring 2003

Working in pairs or groups of three, use the SAS programs that I showed in class to explore the bootstrap. Rather than do “elegant” programming, run the SAS program 20 times, and collect the results in a spreadsheet. Answer the following questions:

1. How do the statistics computed for the sample (i.e., the original 20 fish weighed) compare with those from the 20 non-parametric bootstrap replicates and from the 20 parametric bootstrap (simulation) replicates?

2. What is the distribution (i.e., create a histogram) of the sample statistics for the bootstrap samples and for the parametric simulation results?

3. From the histogram of the bootstrap means, you can estimate the 90% confidence interval by finding the 5th and 95th percentiles of the bootstrap means. In this case, with only 20 replicates, use the minimum and maximum of the bootstrap means to approximate the 5th and 95th percentiles. Note that because we are doing a small number of bootstrap replicates, this won’t be very precise.
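
The percentile interval in question 3 can be sketched as follows, in Python rather than SAS and only as an illustration. The weights are made-up numbers and the function names are hypothetical. With 20 replicates the interval is just the minimum and maximum of the bootstrap means. With a few thousand replicates the 5th and 95th percentiles can be estimated directly, here with `statistics.quantiles`, which interpolates between sorted values.

```python
import random
import statistics

# Hypothetical data: 20 made-up fish weights (g), for illustration only.
weights = [212.0, 187.5, 240.1, 198.3, 225.7, 176.9, 203.4, 251.2, 190.8, 219.6,
           183.2, 234.5, 207.1, 195.0, 228.9, 172.4, 214.3, 245.8, 201.7, 209.9]

def bootstrap_means(data, n_reps, rng):
    """Mean of each of n_reps resamples drawn with replacement from data."""
    return [statistics.mean(rng.choices(data, k=len(data))) for _ in range(n_reps)]

def percentile_interval_90(reps):
    """Approximate 90% interval: the 5th and 95th percentiles of the replicates."""
    # n=20 gives 19 cut points at 5%, 10%, ..., 95%; keep the first and last.
    cuts = statistics.quantiles(reps, n=20, method="inclusive")
    return cuts[0], cuts[-1]

rng = random.Random(2003)  # fixed seed so the run is repeatable

# The in-class version: 20 replicates, min and max stand in for the percentiles.
few = bootstrap_means(weights, 20, rng)
crude = (min(few), max(few))

# With many replicates the percentiles can be estimated directly.
many = bootstrap_means(weights, 5000, rng)
better = percentile_interval_90(many)

print(f"20 replicates:   {crude[0]:.1f} to {crude[1]:.1f}")
print(f"5000 replicates: {better[0]:.1f} to {better[1]:.1f}")
```

Rerunning with a different seed changes the 20-replicate interval far more than the 5000-replicate one, which is the imprecision the exercise warns about.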