Hi Brian.

I had a look at the Parsnip data that you sent me, and here are some suggestions and comments.

First, do you have access to JMP-IN or SAS or some statistical package? I used JMP-IN (V5.1) to do the analyses and have attached a copy of the data analysis table that includes scripts that repeat the analyses that I did.

(a) Adjustment of counts to a per 1000 m basis.

I notice that you adjusted the counts to per 1000 m. Do you have the actual length of the strip that was measured? In many cases, the actual counts follow a Poisson-like distribution whose mean depends upon the length sampled. If you have the actual lengths and the actual counts, you can often do a slightly different analysis that automatically adjusts for the different lengths of the segments. You will get results similar to those below, so it is not crucial that these are available.
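As a rough illustration of what using the actual counts and lengths buys you: under a Poisson model with year as the only factor and the log of the segment length as an offset, the estimated density for a year is just the total count divided by the total length sampled. The sketch below shows that calculation. The segment values and the name `density_per_1000m` are made up for illustration; they are not from your data or from JMP.

```python
# Hypothetical segments: (year, fish counted, metres sampled). Not real data.
segments = [
    (2000, 12, 250.0),
    (2000, 0, 180.0),
    (2000, 31, 320.0),
    (2003, 4, 240.0),
    (2003, 7, 300.0),
]

def density_per_1000m(segments):
    """Poisson estimate of density per 1000 m for each year.

    With year as the only factor and log(length) as an offset, the
    maximum-likelihood rate is total count / total length, so long
    segments automatically carry more weight than short ones.
    """
    totals = {}
    for year, count, length_m in segments:
        c, m = totals.get(year, (0, 0.0))
        totals[year] = (c + count, m + length_m)
    return {year: 1000.0 * c / m for year, (c, m) in totals.items()}

densities = density_per_1000m(segments)
```

This differs from averaging the per-1000 m values, where a short segment with one fish counts as much as a long segment with many.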

(b) Need for a transformation before analysis

I first looked at dot plots of the raw data and computed the means and standard deviations of each year, similar to what you did in your Excel spreadsheet. [The Dot-plot scripts]

Oneway Analysis of Y.of.year By year

Means and Std Deviations

Level / Number / Mean / Std Dev / Std Err Mean / Lower 95% / Upper 95% /
2000 / 66 / 47.0316 / 78.5112 / 9.6641 / 27.731 / 66.332
2001 / 34 / 32.6986 / 48.1657 / 8.2603 / 15.893 / 49.504
2003 / 30 / 17.1993 / 24.1840 / 4.4154 / 8.169 / 26.230

Oneway Analysis of Age 1+ By year

Means and Std Deviations

Level / Number / Mean / Std Dev / Std Err Mean / Lower 95% / Upper 95% /
2000 / 66 / 31.3243 / 49.4287 / 6.084 / 19.173 / 43.475
2001 / 34 / 50.5604 / 73.6051 / 12.623 / 24.878 / 76.242
2003 / 30 / 28.7620 / 30.5640 / 5.580 / 17.349 / 40.175

The data have an obvious long right tail, and comparing the means and standard deviations shows that the standard deviation increases with the mean.

(c) Transformation selected

In these cases, a log (either natural or common) transform is often done. Because of the presence of zeros, a small positive constant is normally added to each value. The usual rule of thumb is to add the smallest possible non-zero value. This will depend upon the size of the segments measured, e.g. if the segments are about 250 m in length, then the smallest non-zero value/1000 m is 4 and so you would add 4 before taking logs. The smallest non-zero value in all your tables is a 5, so I added 5. I used the same value for both variables.
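The rule of thumb can be written out as a small sketch. The function names are hypothetical, and the natural log is assumed here (the common log would work equally well, as noted above).

```python
import math

def smallest_nonzero_per_1000m(segment_length_m):
    """Density per 1000 m implied by a single fish in one segment."""
    return 1000.0 / segment_length_m

def log_with_constant(value_per_1000m, constant):
    """Natural log of (value + constant), defined even when value is 0."""
    return math.log(value_per_1000m + constant)

# One fish in a 250 m segment is 4 fish per 1000 m.
constant_250m = smallest_nonzero_per_1000m(250.0)

# With a constant of 5, a zero count is mapped to log(5) instead of failing.
log_zero = log_with_constant(0.0, 5.0)
```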

(d) Analysis of log(Young of Year+5)

Oneway Analysis of log(YoY+5) By year

Oneway Anova

Analysis of Variance

Source / DF / Sum of Squares / Mean Square / F Ratio / Prob > F /
year / 2 / 8.10242 / 4.05121 / 3.1697 / 0.0454
Error / 127 / 162.32012 / 1.27811
C. Total / 129 / 170.42254

Means and Std Deviations

Level / Number / Mean / Std Dev / Std Err Mean / Lower 95% / Upper 95% /
2000 / 66 / 3.27053 / 1.18555 / 0.14593 / 2.9791 / 3.5620
2001 / 34 / 2.96654 / 1.18483 / 0.20320 / 2.5531 / 3.3799
2003 / 30 / 2.65605 / 0.92165 / 0.16827 / 2.3119 / 3.0002

The standard deviations on the log scale are much more comparable. There is some evidence that the means are different with an apparent decline from 2000 to 2003.
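If you want to check the ANOVA table without JMP, the F ratio can be rebuilt from the group sizes, means and standard deviations alone. The sketch below does this for the log(YoY+5) summary; the function name is hypothetical, and the p-value is left out because it needs an F distribution that the Python standard library does not provide.

```python
# (n, mean, sd) of log(YoY+5) by year, copied from the Means and Std Deviations table.
groups = {
    2000: (66, 3.27053, 1.18555),
    2001: (34, 2.96654, 1.18483),
    2003: (30, 2.65605, 0.92165),
}

def oneway_anova_from_summary(groups):
    """Return (F ratio, df between, df within, pooled SD)."""
    n_total = sum(n for n, _, _ in groups.values())
    grand_mean = sum(n * m for n, m, _ in groups.values()) / n_total
    # Between-year sum of squares: spread of the year means about the grand mean.
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups.values())
    # Within-year sum of squares: recovered from each year's standard deviation.
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups.values())
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    ms_within = ss_within / df_within
    f_ratio = (ss_between / df_between) / ms_within
    return f_ratio, df_between, df_within, ms_within ** 0.5

f_ratio, df_between, df_within, pooled_sd = oneway_anova_from_summary(groups)
```

The pooled SD returned here is the square root of the error mean square, which is the weighted-average standard deviation that a power analysis needs.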

A Tukey multiple comparison procedure (below) shows that there is evidence that the mean in 2000 is higher than in 2003.

Tukey joined lines plot.

Level / Letters / Mean
2000 / A / 3.2705310
2001 / A B / 2.9665389
2003 / B / 2.6560530

Levels not connected by same letter are significantly different

You can also estimate the change across years from the table below:

Level / - Level / Difference / Lower CL / Upper CL
2000 / 2003 / 0.6144780 / 0.024117 / 1.204839
2001 / 2003 / 0.3104859 / -0.361105 / 0.982077
2000 / 2001 / 0.3039921 / -0.261991 / 0.869975

For example, the estimated difference (on the log scale) between the mean in 2000 and 2003 is about .61. Because this is on the log scale, it implies that the mean in 2000 is about exp(.61) = 1.84 times larger than the mean in 2003. The actual ratio of means was larger, but there are a few very large values in 2000 that tend to inflate the mean.
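The same back-transformation can be applied to the confidence limits, which gives a rough interval for the ratio. This is only a sketch: the numbers are the 2000 vs 2003 row of the differences table, and strictly the ratio refers to the typical value of (YoY+5) rather than YoY itself.

```python
import math

# Difference in mean log(YoY+5), 2000 minus 2003, with its confidence limits.
diff, lower, upper = 0.6144780, 0.024117, 1.204839

# Exponentiating a difference of logs gives a ratio.
ratio = math.exp(diff)
ratio_limits = (math.exp(lower), math.exp(upper))
```

The lower limit of the ratio sits only just above 1, which matches the borderline p-value in the ANOVA table.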

The analysis is somewhat sensitive to the choice of the constant added before taking logs. For example, if I used log(YoY+1), then the means are NOT found to be statistically different. This is a bit worrisome, so if this is a crucial analysis, the analysis using the actual counts and lengths of segments may be preferred.

A 50% decline in mean density (a halving, or a doubling in the other direction) corresponds to a difference of log(2) = 0.69 on the log scale. Notice that you have already detected a difference of about this size above (0.61), so your sample sizes are likely about right.

A power analysis requires an estimate of the standard deviation on the log scale, which is about 1.13 (a weighted average of the individual group standard deviations), and the difference to detect (about .69 on the log scale). The power curve below shows that for 80% power you need a TOTAL sample size of about 80, or about 40/year. The actual power computation is slightly more complicated if you have unequal sample sizes, but this should be close enough. Consequently, I would recommend that you continue with your sampling of about 30 sites/year.
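For a quick check of the sample size without JMP, the usual normal-approximation formula can be used. This is a sketch under stated assumptions: two equal groups, a two-sided test, and the normal distribution in place of the t-based calculation JMP performs, so the answer will differ slightly from the JMP curve. The function name is hypothetical.

```python
import math
from statistics import NormalDist

def total_sample_size(sd, difference, alpha=0.05, power=0.80):
    """Approximate total n (two equal groups) for a two-sided test of two means."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # critical value of the test
    z_power = z.inv_cdf(power)           # quantile for the target power
    per_group = 2 * ((z_alpha + z_power) * sd / difference) ** 2
    return 2 * math.ceil(per_group)

n_total = total_sample_size(sd=1.13, difference=0.69)
```

With these inputs the total comes out in the mid-80s, in line with the figure of about 80 in total, or about 40/year.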

[Power curve: power versus total sample size for a test of two means, with Alpha = 0.050, Error Std Dev = 1.13 and Difference to detect = 0.69. Sample Size is the total sample size; per group would be n/2.]

(e) Analysis of Age 1+ data

I repeated the above analysis on the Age 1+ data. The smallest non-zero value is a 6, so I again added 5 before the transform, i.e. analyzed log(A1+5).

Oneway Analysis of log(A1+5) By year

Oneway Anova

Analysis of Variance

Source / DF / Sum of Squares / Mean Square / F Ratio / Prob > F /
year / 2 / 4.08031 / 2.04016 / 1.5653 / 0.2130
Error / 127 / 165.52364 / 1.30334
C. Total / 129 / 169.60396

Means and Std Deviations

Level / Number / Mean / Std Dev / Std Err Mean / Lower 95% / Upper 95% /
2000 / 66 / 2.89353 / 1.18179 / 0.14547 / 2.6030 / 3.1841
2001 / 34 / 3.30131 / 1.24934 / 0.21426 / 2.8654 / 3.7372
2003 / 30 / 3.15514 / 0.89509 / 0.16342 / 2.8209 / 3.4894

After the transform, the standard deviations are now about equal. You failed to detect a difference in the mean density among years in this case.

In order to figure out what size of difference you would be able to detect, I used the standard deviation (about 1.14 on the log scale) and a total sample size of around 80 for two sample periods to produce the following graph. This is different from the previous graph in that the sample size is held fixed and the curve of power vs. difference is now plotted.

[Power curve: power versus difference in means for a test of two means, with Error Std Dev = 1.14, total Sample Size = 80 and Alpha = 0.050.]

Now a difference of about .69 on the log scale has about an 80% power of being detected (just as you saw before). You saw a difference of about .4 (3.3-2.9) on the log-scale – this has only a power of about 30% of being detected.
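The two power figures can be roughly reproduced with a normal approximation. Again this is only a sketch: it assumes two equal groups of 40 and replaces JMP's t-based calculation with the normal distribution, so the values will not match the curve exactly. The function name is hypothetical.

```python
from statistics import NormalDist

def power_two_means(difference, sd, n_total, alpha=0.05):
    """Approximate power of a two-sided test comparing two equal-sized groups."""
    z = NormalDist()
    n_per_group = n_total / 2
    se = sd * (2 / n_per_group) ** 0.5        # standard error of the difference
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = difference / se
    # Probability of landing in either rejection region when the true
    # difference is `difference`.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)

power_069 = power_two_means(0.69, sd=1.14, n_total=80)
power_040 = power_two_means(0.40, sd=1.14, n_total=80)
```

Under these assumptions the power is a little under 80% for a difference of .69 and roughly a third for a difference of .4, close to the values read off the curve.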

Both power analyses only used “two” samples, but will give essentially the same results if you have 3 samples etc.