Analysis of fish monitoring data from the Mamquam River
Carl James Schwarz (P.Stat. (Can, U.S.))
Department of Statistics and Actuarial Science
Simon Fraser University
Burnaby, BC V5A 1S6
2011-02-14
1. Introduction
This report discusses the analysis of fisheries data collected on the Mamquam River as part of an ongoing monitoring project for an independent power project. A details discussion of the sampling protocol is available in Hanson (xxxx).
Briefly, three locations were sampled on the Mamquam River. Location DS is located approximately 200 m downstream of the powerhouse. Sampling occurred in 1999 and then in 2005-2010. Location DIV is in the diversion reach (partly dewatered during operations) and was monitored only in 2005-2010. Location US is upstream of the influence of the project footprint and was monitored in 1999 and 2005-2010.
The sampling in 1999 will serve as baseline information while the information collected in 2005-2010 was collected after the project became operational. At each location, several sites were selected based on availability of suitable habitat and safely concerns. At each site, fish were collected using electrofishing (with different crews over the project history). The sampling program was generally a two-pass closed-population removal study, but in some years, safety concerns or access prohibited setting nets and closure may not be satisfied. For the majority of fish that were sampled during the electrofishing, biological variables such as weight, length, age, and species were measured. Fulton’s condition factor was computed based on the weight and length.
The US location will serve as the control site for testing hypotheses about environmental impact. The DS location is partially impacted by the project and it and the DIV location and treated as the impact sites.
Various plots were produced to identify anomalous data and these have been corrected in the analyses that follow.
Summary statistics on the numbers of fish caught, weight, length, and condition factor are presented in Table 1 for the two species observed in the study (Rainbow Trout (RB) and Dolly Varden (DV)). The number of DV fish captured was typically very small in most years except for 2009-2010 in the DIV location. The number of RB fish captured was reasonably large in most years.
2. Analysis of the weight, fork length, and condition factors.
There are two key assumptions made for the analyses in this section. The first assumption is that electrofishing either captures all of the fish in the sample site or a random (representative) sample of the fish at the site within each species and age class. The second assumption is that fish from the sampled sites within a location are a random (representative) sample of the fish at that location again within each species and age class. These assumptions could be violated if, for example, electofishing tended to over sample lighter, smaller fish or if the sites sampled within a location tended to have fish that were larger/smaller on average than all fish at that location.
There are no easy ways to assess these assumptions. Plots of the mean weight, mean fork length, and mean condition factor (not shown) by electrofishing pass did not show evidence of a differential in the mean among the passes. This provides some evidence that electrofishing is producing a random sample of fish present at the sample site. There is no way to compare the fish sampled with other fish in the same location and so this assumption must be accepted on faith.
Summary statistics by age class are presented in Table 2. Sample sizes for nearly all age classes in all years are small for DV but arereasonablylarge for RB in age class 0 (YoY) and age class 1. Sample sizes for age classes 2 and 3 for RB are also small in all years.
A preliminary plot of the variance-to-mean ratio for all three variable indicated that a more suitable scale for the analysis of weight in on the natural (base e) logarithmic scale. Fork length and condition factor were analyzed on the untransfomed scale.
Summary plots of the changes in the sample mean for the three variables by age class are presented in Figures 1 to 3. The plots of changes in the mean log(weight) and mean fork length show a clear, consistent, separation by age class (as expected) but with large year effects that are consistent across locations. For example, the mean log(weight) and mean fork length for RB appear to increase in 2007 in all locations, and then decline through to 2010. The pattern of change for the mean condition factor for RB is less consistent and there is no clear separation in the pattern by age classes. The patterns for DV are less apparent because of the small sample sizes.
There are several classes of statistical tests that can be done to assess evidence of change in the mean of the biological variable over time and to assess if changes in the means may reflect an impact of the project.
2.1 Test for changes over time.
Two sets of hypotheses were examined. The first hypothesis examined is of no change in the mean biological variable over time:
where represents the population mean of the biological variable in year i.
The second hypothesis examined is of no linear change over time in the mean biological variable over time:
where represents the population slope in the regression model .
The first hypothesis was examined by using the Analysis of Variance (ANOVA) methodology on the biological variables for each combination of species, location, and age. The second hypothesis was examined using simple regression methodology on the biological variables for each combination of species, location, and age.
Two sets of analyses were done – including and excluding 1999 – because the DIV location was not measured in 1999 and comparison of the results among locations should be done on a comparable set of years. Results are summarized in Tables 3a and 3b. In some years, only a single fish was measured. This does not present any problems for ANOVA as the standard deviation for that year would be based on a standard deviation “averaged” over the standard deviations from years with multiple data points.
The number of fish in age classes 2 and 3 was often very small and so in many cases, no statistical test could be performed. Similarly, the number of DV fish was often small and so statistical tests may have limited power to detect differences.
The results indicate strong evidence of changes (p-values very small) in the means over time for fish from age classes 0. However, this analysis is simplistic because larger sample sizes enable the analyst to detect smaller changes, and because changes in the mean of a biological variable over time may simply reflect large year specific influences (e.g. a cooler year may depress growth for all fish relative to the long-term average, but reflect uncontrollable factors). Similarly linear changes in the mean over time may reflect longer-term trend in uncontrollable factors.
Unfortunately, the p-value from the ANOVA provides no information on which years have means that are different from other years. It would be possible to examine the year-to-year differences a using multiple-comparison procedure (e.g. Tukey's multiple comparison procedure) but, as noted above, these differences may be the results of simple natural variation in uncontrollable factors.
2.2 Tests for parallel changes in time;
To try and account for year-specific factors that may influence growth on a larger scale, a test for parallelism in the changes in the mean were conducted. Here the hypothesis is that the change in the mean between two years is the same regardless of location:
where represents the mean of the biological variable in year i in location A. The hypothesis is examined overall all pairs of years and locations using a two-factor ANOVA with location and years as main effects, and the test of parallelism corresponds to the test for the presence of interaction between the location and year factors.
Again because location DIV was not measured in 1999, two sets of tests were conducted. First, the hypothesis was examined for locations US and DS over all years; second, the hypothesis was examined for all three locations but only post 1999. Results are summarized in Table 4 for each combination of species and age.
There was weak evidence of non-parallelism in the mean condition factor for the DV age class 0 fish when the DS/US locations were compared over all years. There was strong evidence of non-parallel changes in the mean response for RB in nearly all comparisons. Figures 1-3 should be consulted to identify where the non-parallelism occurred.
2.3 Estimation of change in mean between pre- and post-project implementation.
A simple comparison of the mean of the biological variables pre- and post-implementation can be done for locations DS and US; location DIV was not measured in 1999 and so such a comparison cannot be done.
This comparison must “average” over the several years of data collected post-implementation in way that weights each year equally rather then weighting by the sample size (which would be the case if a simple mean of measurements overall years post-implementation were done). For example, suppose that the fork-length for a single fish measured in year 1 was 22 mm, while the fork-length of 2 fish measured in year 2 were 25 and 27 mm. The simple mean (22+25+27)/3=24.7 mm would be more heavily weighted toward the mean in the second year. In order to give each year’s data equal weight, the average fork length over both years is computed as an average of averages:
mm
This can be extended to multiple years in a similar fashion.
The hypothesis of interest is:
where represents the mean of the biological variable in year i.
This comparison is done using the mixed-model ANOVA represented in standard notation as:
Model Y = Period Year(Period)-R
where the Y represents the biological variable of interest; Period represents the pre- (1999) versus post- (2005 onwards) periods; and year(period) represents the year specific effects within each period and are treated as random influences on the mean response.
Estimates of the change in the mean between the two periods for each species and biological variable at the DS and US locations for the two youngest age classes are presented in Table 5. No evidence of a change in the mean between the two periods was found for RB in any of the biological variables in either age class, but there was some evidence of positive change in the mean fork-length and log(weight) for DV age class 0. Figures 1-3 should be consulted for more details. It should be noted that because of the high variability both within year and across years, estimates of changes were not very precise (i.e. standard error were large) which precluded identifying small changes.
Notice that because weight was measured on the log-scale, the estimate of the pre- vs post-implementation comparison is the log(ratio) of the change in mean weights. For example, the mean weight of RB at DS at age 0 is estimated to be are exp(2.11)=8.24 times heavier in the post-implementation period than the pre-implementation period, but the result is not statistically significant from a 1x increase. If the raw means are examined (Table 2b), fish in the baseline year had a mean weight which was small (around 0.2 g), and the average in the post-period is about 1-2 grams which is about 8x larger.
While these comparisons often fail to show evidence of a change between the pre- and post-implementation periods, they do not examine if the change in the mean is equal among the control and impact sites.
2.4 BACI contrast.
The standard method to assess an environmental impact is through a Before-After-Control-Impact (BACI) design. The analysis of these designs looks for non-parallelism in response (similar to what was done earlier) but focuses on the change in means between the two periods. The hypothesis of interest is:
where represents the mean in year i and location A. If there were no impact of the project, the difference in the means should be equal, i.e. the difference in the differences in the mean should be zero.
Two analyses were conducted. First, because only US and DS were measured in 1999, an analysis restricted to these two sites only was done. This is conducted using a two factor ANOVA model (in standard notation):
Model Y = location period location*period year(period)-r
where Y represents the biological variable of interest; Period represents the pre- (1999) versus post- (2005 onwards) periods; location represents the two locations; and year(period) represents the year specific effects within each period and are treated as random influences on the mean response. A separate analysis was done for each combination of biological variable and age class (0 and 1). Age classes 2 and 3 were not analyzed because of insufficient data. Results are presented in Table 6a.
There was weak or no evidence of an effect for DV in both age classes. Given the small sample sizes for DV, estimates of the BACI effect have poor precision. There was evidence of a BACI effect in RB for condition factor and log(weight) for age class 0 and for condition factor for age class 1. In all cases, the BACI effect estimate was positive indicating a greater change in the DS location than the US location.
Again note that weight was measured on the logarithmic scale and so estimates of the BACI effect are a logarithm of the ratio of two ratios. For example, consider the log(weight) values for RB age class 0. The log(weight) INCREASES for the US site from -1.22 to 0.07 (corresponding to an actual mean weight of exp(-1.22)=.295 increasing to exp(0.07)=1.07 g. Similarly the log weight for the DIV+DS sites is estimated to increase from -2.22 to -.08 corresponding to an increase from exp(-2.22)=.108 to exp(-.08)=92 g. So in BOTH types of sites, the mean weight increased. However, the mean weight increase in the control site is LESS than the mean weight gain in the impact sites.
The difference in log weight for the control site (.07-(-1.22))=1.29 corresponds to an increase by exp(1.29)=3.63 fold. Similarly, the difference in log weight for the impact sites of (0.08(2.22))=2.3 corresponds to a exp(2.14)=8.50 fold increase. The ratio of the fold increases is 3.63/8.50=.43 and log(.43)=-.83 which is what is reported as the estimate. The fold increase in the control site is less than the fold increase in the treatment site.
A second BACI analysis was also conducted but now including the DIV location. This is conducted using a two-factor ANOVA model (in standard notation):
Model Y = location_type period location_type*period
year(period)-r location(location_type)
where Y represents the biological variable of interest; Period represents the pre- (1999) versus post- (2005 onwards) periods; location_type represents the two types of locations (control or impact); year(period) represents the year specific effects within each period and are treated as random influences on the mean response; and location(location_type) represents the two locations (DS and DIV) in the impacted areas.
Because the DIV location was not measured in 1999, it provides no direct information on the change in the mean between the pre- and post-period. But because it was measured in the post-implementation period, it does provide information about the year effects. This, in-turn, can influence the post-implementation mean because of different weighting applied to account for the year effects. Results are summarized in Table 6b. There was no evidence of a BACI effect except for the condition factor of RB age 1 fish and the log(weight) or RB age 0 fish.
3. Analysis of density
The data on fish abundance and density was collected using a multi-pass removal sampling protocol. The abundance was NOT estimated using methods for removal sampling because the number of fish captured was small, was often zero on the second pass, and for some sites, the population was not closed because it was impossible to set nets for safety or access reasons. Consequently, the raw counts were used and it is believed that these are a reasonable proxy for the actual abundance.
Values of 0 fish were imputed for fish in age classes where no data was present for an age class or a species on a pass. The raw counts were converted to a density based on the area of the site being sampled (provided with the raw data). A summary of the density values is presented in Table 7 and plotted in Figures 4-6.
There appears to be a consistent decline in density in the 2006-2008 period, the cause of which is unknown.
3.1 Test for changes over time.
Because only one value for density is computed for a year, it is not possible to construct a simple test of hypothesis that the density is equal in all years without making very strong assumptions about the distribution of counts (e.g. that the counts have a Poisson distribution) that cannot be examined. Consequently, only the following tests of hypotheses about changes over time were constructed:
where is the density in year i. and