Here are the results from the analysis of the Dinosaur Reservoir data. The analysis files and results are in the accompanying SAS program and listing (dinosaur.lst) files.

I had to use SAS for this analysis, as JMP does not have these analyses available in simple form in its current release.

(a) Design of study

If I understand the study design correctly, each site was repeatedly measured six times over the last two years. Because each individual site was remeasured, this is similar to a Randomized Complete Block (RCB) design where sites serve as blocks and the six measurement times serve as the treatments.

A key difference is that in classical RCB designs the treatments are randomized to experimental units – here, however, you obviously can't take the fall measurements before the spring measurements, nor take the 2003 measurements before the 2002 measurements. This would be a problem if measurements taken close together in time (e.g. passes 1 and 2 of spring) were more highly correlated than measurements taken a year apart.

Given the small counts and that you have only 28 sites, this is likely not a problem.
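If the correlation between measurements close together in time were a concern, it could be checked with a GEE fit of a similar model. A minimal sketch (assuming the same fish1 dataset), where the REPEATED statement treats the six measurements within a site as correlated and site becomes the cluster rather than a fixed block effect:

proc genmod data=fish1;
   by species;
   class site time;
   model count = time / dist=poisson link=log type3;
   repeated subject=site / type=exch corrw;  /* exchangeable working correlation; CORRW prints it */
run;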

(b) All counts for a site being zero.

If all the counts for a site are zero at the six measurement times, then this site is non-informative about changes in the abundance of the species over time and is dropped from the analysis. For example, Site 12 has no information about changes in abundance of Mountain Whitefish over the six sampling times.
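A minimal sketch of one way to screen these sites out in SAS (the raw dataset name fish_raw and its variables species, site, and count are assumptions):

proc sql;
   /* keep a site (within each species) only if at least one of its six counts is non-zero */
   create table fish1 as
   select *
   from fish_raw
   group by species, site
   having sum(count) > 0;
quit;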

(c) Analysis

As the entries are smallish counts, a Poisson-like structure is suitable for them (as opposed to the Normally distributed response used in the previous section after the log-transformation). Consequently, a generalized linear model approach is used for the analysis.

You mention that the sites are sampled for approximately the same length of time on each occasion, but this is difficult to control exactly. Consequently, the data may exhibit slight overdispersion, i.e. the data are slightly more variable than would be expected under a simple Poisson model. This is easily handled in the analysis.
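For reference, the adjustment works as follows: with the SCALE=DEVIANCE option (used below), GENMOD estimates the scale factor as sqrt(deviance/df) and multiplies the reported standard errors by it, so an estimated scale near 1 indicates little overdispersion.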

The model used (see Proc Genmod in the SAS program) looks like:

proc genmod data=fish1;
   title2 'Analysis of raw counts at each site - adjust for overdispersion';
   by species;
   class site time;
   model count = site time / dist=poisson link=log type3 scale=deviance;
   /* The following contrast compares 2002 and 2003. The coefficients must be
      kept in the right order to match the levels of the time variable. */
   /* It is not necessary to divide the contrast coefficients by 3 because it
      is only a test. It is necessary to divide the estimate coefficients by 3
      to get the correct difference between the averages of the three
      measurements within each year. */
   contrast 'diff between 2002 and 2003' time 1 -1 1 1 -1 -1 / e;
   estimate 'diff between 2002 and 2003'
            time .33333333 -.33333333 .33333333 .33333333 -.33333333 -.33333333 / e exp;
run;

The SAS procedure GENMOD fits Generalized Linear Models.

The BY statement indicates that a separate analysis is to be done on each species.

The CLASS statement indicates that site and time are classification variables rather than continuous regression variables.

The key statement is the MODEL statement. It indicates that the COUNTS of fish vary because of SITE effects and TIME effects, that for each combination of site and time the data are Poisson-like, and that the SCALE=DEVIANCE option adjusts for potential overdispersion.

The CONTRAST and ESTIMATE statements examine the average response for 2002 and 2003 to see if they are equal – the strange order of the coefficients has to match the alphabetical order of the time levels (f02 f03 s02_1 s02_2 s03_1 s03_2).
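For reference, the coefficients line up with the alphabetically ordered time levels as follows:

time level:    f02    f03   s02_1   s02_2   s03_1   s03_2
contrast:       +1     -1     +1      +1      -1      -1
estimate:     +1/3   -1/3   +1/3    +1/3    -1/3    -1/3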

Here are the results (extracted from the dinosaur.lst file). These are read like ANOVA tables in many cases.

ALLFISH

Source    Num DF    Den DF    F Value    Pr > F
site          27       135       6.32    <.0001
time           5       135       6.97    <.0001

There are significant effects of both SITE and TIME.

                                             Standard
Label                             Estimate      Error    Confidence Limits
diff between 2002 and 2003          0.2335     0.0828    0.0713    0.3958
Exp(diff between 2002 and 2003)     1.2630     0.1046    1.0739    1.4855

Contrast                      Num DF    Den DF    F Value    Pr > F
diff between 2002 and 2003         1       135       8.03    0.0053

On the log scale, the average difference between 2002 and 2003 is .23 (SE .08) (top line), which converts to an estimated ratio of the 2002 to 2003 counts of 1.26 (SE .10) with a 95% ci ranging from 1.07 to 1.49. Hence the abundance of all fish is estimated to be, on average, about 1.26 times higher in 2002 than in 2003. As the 95% ci for this ratio does NOT include the value of 1, this is statistically significant. This is confirmed by the output from the Contrast, which gives a p-value of .0053 for this comparison.
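As a check on the back-transformation: exp(.2335) = 1.26 for the ratio, and the confidence limits map to exp(.0713) = 1.07 and exp(.3958) = 1.49; the SE of the ratio follows from the delta method as approximately exp(.2335) x .0828 = .10.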

As a check, the ratio of the raw averages is about (10.75 + 9.64 + 13.86) / (6.00 + 10.14 + 11.71) = 1.23 times.
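A minimal sketch to reproduce the six per-time raw means used in this check (assuming the same fish1 dataset):

proc means data=fish1 mean maxdec=2;
   by species;
   class time;   /* one mean count for each of the six sampling times */
   var count;
run;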

RAINBOW TROUT

Source    Num DF    Den DF    F Value    Pr > F
site          27       135       8.02    <.0001
time           5       135       6.17    <.0001

Similarly, there is strong evidence of both a SITE and TIME effect upon the rainbow trout counts.

                                             Standard
Label                             Estimate      Error    Confidence Limits
diff between 2002 and 2003          0.2299     0.1000    0.0338    0.4260
Exp(diff between 2002 and 2003)     1.2585     0.1259    1.0344    1.5311

Contrast                      Num DF    Den DF    F Value    Pr > F
diff between 2002 and 2003         1       135       5.34    0.0224

Similarly, the estimated ratio of the 2002 to 2003 rainbow trout counts is 1.26 (SE .13) with a 95% confidence interval for the ratio from 1.03 to 1.53. As this does not contain the value of 1, the difference is statistically significant, with a p-value of .0224.

WHITEFISH

Source    Num DF    Den DF    F Value    Pr > F
site          24       120       3.91    <.0001
time           5       120       5.47    0.0001

Again, there is strong evidence of both SITE and TIME effects upon the counts of whitefish. (The reduced degrees of freedom for SITE reflect the sites dropped because all of their whitefish counts were zero, as noted in part (b).)

                                             Standard
Label                             Estimate      Error    Confidence Limits
diff between 2002 and 2003          0.5642     0.1763    0.2187    0.9097
Exp(diff between 2002 and 2003)     1.7580     0.3099    1.2444    2.4835

Contrast                      Num DF    Den DF    F Value    Pr > F
diff between 2002 and 2003         1       120      10.85    0.0013

The estimated ratio of the 2002 to 2003 counts is 1.76 (SE .31) with a 95% ci ranging from 1.24 to 2.48. As this does not include the value of 1, the ratio is statistically significant (p=.0013).

(d) Checking effect of woody debris

You mention that the sampling procedure now has to change because of the addition of the woody debris. This is a bit problematic, as you will not be able to tell whether a change in numbers is due to the woody debris or just a year-to-year fluctuation.

Any chance of adding woody debris to only half of the sites? Then, when you sample all 28 sites, you would have control sites with no debris that could be used to calibrate the new sampling method against the old one.
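If you did split the sites this way, the eventual analysis could extend the same kind of model with a debris term. A hedged sketch (the dataset fish2 and the 0/1 classification variable debris are assumed names, not existing ones):

proc genmod data=fish2;
   by species;
   class site time debris;
   model count = site time debris / dist=poisson link=log type3 scale=deviance;
run;

The DEBRIS effect, adjusted for SITE and TIME, would then estimate the change attributable to the debris rather than to year-to-year fluctuation.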

About how big a change do you expect to see from the addition of the woody debris, e.g., do you expect populations to double? If you have some idea, some sort of power analysis could be done to see how many sites or years you would now need to sample.
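For illustration only, a rough sketch of such a calculation on the log-count scale; the doubling effect (log(2) = .69) and the standard deviation of 1.0 are placeholder assumptions, not estimates from your data:

proc power;
   twosamplemeans test=diff
      meandiff  = 0.69   /* log(2): a doubling of the population */
      stddev    = 1.0    /* placeholder SD of the log counts */
      power     = 0.80
      npergroup = .;     /* solve for the number of sites per group */
run;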