
Analysis of Black Hawk County Wetlands: West Lake and Railroad Lake

Nicole Williams and Mark D. Ecker

Department of Mathematics

University of Northern Iowa

Cedar Falls, Iowa 50614-0506 USA

August 24, 2003

ABSTRACT

Students and faculty at the University of Northern Iowa have been analyzing 2002 water quality data from a wetland area dominated by West Lake and Railroad Lake. One statistical goal was to check the two lakes for spatial correlation, i.e., whether pairs of observations closer in space tend to be more similar than pairs farther apart. We explore spatial correlation for the variables dissolved oxygen and phosphorus. After inspecting empirical variograms, we find little evidence of spatial correlation. Another goal concerning the Black Hawk County wetlands is to run an analysis of covariance (ANCOVA) to explore what regression relationship exists between the dependent variable phosphorus and the independent variables dissolved oxygen, water depth, and sediment depth. The goal of this analysis is to predict the phosphorus level in either West Lake or Railroad Lake for given levels of the independent variables. We conclude that the slopes for the independent variables are the same in both lakes, but the intercepts for West Lake and Railroad Lake differ, i.e., Railroad Lake has a higher phosphorus level.

  1. INTRODUCTION

Students and faculty at the University of Northern Iowa have been analyzing water quality in a wetland area dominated by West Lake and Railroad Lake just northwest of campus (see Figure 1). West Lake has a surface area of approximately 85,029 square meters (Oelmann and Ecker 2002). Both lakes drain into Beaver Creek, which flows into the Cedar River (see Figure 2). When the Cedar River floods, it pours back into Railroad Lake, causing the lake to have a higher phosphorus level than West Lake.


Figure 1. West Lake and Railroad Lake.


Figure 2. Beaver Creek together with the wetland lakes.

West Lake had 20 sites that were sampled for water quality during the 2002 summer (see Figure 3) and Railroad Lake had 16 sampled sites. The water quality variables that were collected from West Lake were phosphorus level, water depth, sediment depth, dissolved oxygen, and temperature. The variables collected from Railroad Lake were phosphorus level, water depth, sediment depth, and dissolved oxygen. Table 1 includes the summary statistics for phosphorus and dissolved oxygen for each lake.

Figure 3. West Lake 2002 sampling sites.

Statistic / Phosphorus, West Lake / Phosphorus, Railroad Lake / Dissolved Oxygen, West Lake / Dissolved Oxygen, Railroad Lake
Min. / 108.0 / 335.0 / 0.2 / 8.6
Mean / 549.8 / 690.2 / 3.8 / 10.4
Median / 597.0 / 683.5 / 3.8 / 10.2
Max. / 960.0 / 1109 / 5.9 / 12.6
St. Dev. / 217.9 / 236.3 / 1.4 / 1.0

Table 1. Summary statistics of sediment phosphorus and dissolved oxygen for West Lake and Railroad Lake.

  2. SPATIAL CORRELATION FOR SEDIMENT PHOSPHORUS

The statistical goal with spatial correlation is to explain any clustering in the data through variograms (see Carlson and Ecker, 2002). Spatial clustering means that closely located sites have relatively similar values of the given variable. If the responses at the sampled sites are spatially continuous, then the variogram can be used to make predictions at unsampled sites within the lake.

First, the variogram cloud is calculated (see Figure 4) to explore potential clustering of phosphorus levels using the two location variables, easting and northing. The distance between each pair of observations is on the x-axis, while the variability of phosphorus for that pair is on the y-axis. It is typically difficult to see evidence of spatial correlation in a variogram cloud. When Figure 4 is examined more closely, we can also see several groups, or sections, forming. These sections result from the five transect lines, each of which has four samples taken relatively close together.


Figure 4. Variogram Cloud for West Lake.
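The variogram cloud described above can be sketched in a few lines. This is a minimal illustration, not the software used for Figure 4: it assumes that the "variability" of a pair is the usual semivariance (half the squared difference of the two values), and the function name `variogram_cloud` and the toy coordinates are hypothetical. With n sites there are n(n-1)/2 pairs, so the 20 West Lake sites give 190 points in the cloud.

```python
import math
from itertools import combinations

def variogram_cloud(sites):
    """Return (distance, semivariance) for every pair of sites.

    Each site is (easting, northing, value). The semivariance of a pair is
    taken as half the squared difference of the two values (an assumed
    convention; the source does not state the exact definition).
    """
    cloud = []
    for (x1, y1, z1), (x2, y2, z2) in combinations(sites, 2):
        distance = math.hypot(x1 - x2, y1 - y2)
        semivariance = 0.5 * (z1 - z2) ** 2
        cloud.append((distance, semivariance))
    return cloud

# Hypothetical coordinates (meters) and phosphorus values, for illustration only.
toy_sites = [(0.0, 0.0, 500.0), (3.0, 4.0, 520.0), (6.0, 8.0, 480.0)]
cloud = variogram_cloud(toy_sites)
for distance, semivariance in cloud:
    print(f"{distance:6.1f}  {semivariance:8.1f}")
```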

The empirical variogram creates distance intervals, or bins, and aggregates all pairs of observations within each bin to produce a mean variability estimate for each distance interval. This procedure is analogous to creating a histogram from raw data. We construct empirical variograms using four different bin widths, where the number of bins or lags (nlag) is arbitrarily chosen as 10, 20, 40, and 50. Comparing the results across these choices, we can decide whether sediment phosphorus is spatially correlated. If the variogram is essentially horizontal, then spatial correlation is not present; if the points are not horizontal (tending to rise and then level off), then spatial correlation is present. Figure 5 shows nlag=10 and Figure 6 shows nlag=50. Since the points in both figures are essentially horizontal, we conclude that sediment phosphorus is not spatially correlated. The same lack of spatial correlation was also evident for dissolved oxygen. We shall therefore use a regression model to perform predictions.
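The binning step can be sketched as follows. This is an illustration under stated assumptions: bins are taken to be of equal width from zero up to the largest pair distance, which may differ from the rule used by the software behind Figures 5 and 6, and the function name `empirical_variogram` and the toy input are hypothetical.

```python
def empirical_variogram(cloud, nlag):
    """Average the semivariances of a variogram cloud within nlag equal-width bins.

    cloud is a list of (distance, semivariance) pairs. Returns a list of
    (bin_midpoint, mean_semivariance, pair_count); empty bins are skipped.
    """
    max_distance = max(d for d, _ in cloud)
    width = max_distance / nlag
    sums = [0.0] * nlag
    counts = [0] * nlag
    for distance, semivariance in cloud:
        # The largest distance is placed in the last bin rather than past it.
        index = min(int(distance / width), nlag - 1)
        sums[index] += semivariance
        counts[index] += 1
    return [((i + 0.5) * width, sums[i] / counts[i], counts[i])
            for i in range(nlag) if counts[i] > 0]

# Hypothetical (distance, semivariance) pairs, for illustration only.
toy_cloud = [(1.0, 10.0), (2.0, 30.0), (6.0, 20.0), (9.0, 40.0), (10.0, 60.0)]
binned = empirical_variogram(toy_cloud, nlag=2)
for midpoint, mean_semivariance, pair_count in binned:
    print(midpoint, mean_semivariance, pair_count)
```

A larger nlag gives narrower bins with fewer pairs in each, so the plotted points become noisier; this is one reason to compare several choices of nlag before judging whether the variogram is flat.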

Figure 5. Variogram for West Lake for phosphorus with nlag=10.


Figure 6. Variogram for West Lake for phosphorus with nlag=50.

  3. ANALYSIS OF COVARIANCE

The goal of an analysis of covariance (ANCOVA) is to construct a regression equation that determines the phosphorus level in either West Lake or Railroad Lake when different levels of independent variables are used. These independent variables include: water depth, sediment depth, and dissolved oxygen.

There are three potential regression models for an ANCOVA: the separate, parallel, and common regression models. Under the separate model, the regression equations are different for the two lakes, so West Lake and Railroad Lake differ in both intercept and slopes. The lakes have parallel regression lines when they have different phosphorus intercepts but the same rate of change for each independent variable. Last, the lakes have a common regression line if the regression relationship is the same regardless of location. The ANCOVA was fit using SAS; we first need to determine which model will be used to predict phosphorus levels. Table 2 gives the sums of squared errors (SSE) and degrees of freedom (df) used to choose among the three models.

Model / Separate / Parallel / Common
SSE / 1248700.8 / 1302609.6 / 1543852.5
df / 28 / 31 / 32

Table 2. Sums of squared errors (SSE) and degrees of freedom (df) for the ANCOVA hypothesis tests.
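The three candidate models can be written out explicitly. The notation here is an assumption introduced for illustration and does not appear in the source: $y_{ij}$ is phosphorus at site $j$ of lake $i$ (West or Railroad), $x_{1}$, $x_{2}$, $x_{3}$ are water depth, sediment depth, and dissolved oxygen, and $\varepsilon_{ij}$ is the error term.

```latex
\begin{align*}
\text{Common:}   \quad y_{ij} &= \beta_0 + \beta_1 x_{1ij} + \beta_2 x_{2ij} + \beta_3 x_{3ij} + \varepsilon_{ij}
  && \text{(4 parameters)} \\
\text{Parallel:} \quad y_{ij} &= \beta_{0i} + \beta_1 x_{1ij} + \beta_2 x_{2ij} + \beta_3 x_{3ij} + \varepsilon_{ij}
  && \text{(5 parameters)} \\
\text{Separate:} \quad y_{ij} &= \beta_{0i} + \beta_{1i} x_{1ij} + \beta_{2i} x_{2ij} + \beta_{3i} x_{3ij} + \varepsilon_{ij}
  && \text{(8 parameters)}
\end{align*}
```

With 20 + 16 = 36 sampled sites, the error degrees of freedom are 36 minus the number of parameters, which gives 32, 31, and 28 and agrees with Table 2.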

Since there are three candidate models for the ANCOVA, hypothesis tests were used to determine which model best fits the data. The first hypothesis test compares H0: common line vs. HA: parallel lines. If the test statistic (F-value) is greater than the critical value (CV), then the common regression line is rejected.

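$$F = \frac{(SSE_{\text{Common}} - SSE_{\text{Parallel}})/(df_{\text{Common}} - df_{\text{Parallel}})}{SSE_{\text{Parallel}}/df_{\text{Parallel}}} = \frac{(1543852.5 - 1302609.6)/(32 - 31)}{1302609.6/31} = 5.74119$$

The critical value is $CV = F_{df_1, df_2}$, where $df_1 = df_{\text{Common}} - df_{\text{Parallel}} = 32 - 31 = 1$ and $df_2 = df_{\text{Parallel}} = 31$, so $CV = F_{1,31} = 4.16$.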

F > CV, so we reject H0: common regression line using all three independent variables, in favor of the parallel model.

If we had failed to reject the common regression line, then the ANCOVA would be complete and the relationship between phosphorus and the three independent variables would be the same in West Lake and Railroad Lake, regardless of location.

Since the common regression line was rejected, the next hypothesis test compares H0: parallel lines with HA: separate lines, where the new F-statistic and CV are computed as follows:

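$$F = \frac{(SSE_{\text{Parallel}} - SSE_{\text{Separate}})/(df_{\text{Parallel}} - df_{\text{Separate}})}{SSE_{\text{Separate}}/df_{\text{Separate}}} = \frac{(1302609.6 - 1248700.8)/(31 - 28)}{1248700.8/28} = 0.40294$$

The critical value is $CV = F_{df_1, df_2}$, where $df_1 = df_{\text{Parallel}} - df_{\text{Separate}} = 31 - 28 = 3$ and $df_2 = df_{\text{Separate}} = 28$, so $CV = F_{3,28} = 2.95$.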

F < CV, so we fail to reject H0: parallel regression model.
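Both nested-model F tests follow the same formula, so they can be checked with one small helper. The sketch below uses only the SSE and df values from Table 2 and the critical values quoted in the text; the function name `nested_f` is hypothetical, and the critical values are hard-coded rather than computed from the F distribution.

```python
def nested_f(sse_reduced, df_reduced, sse_full, df_full):
    """F statistic for comparing a reduced model against a fuller model."""
    numerator = (sse_reduced - sse_full) / (df_reduced - df_full)
    denominator = sse_full / df_full
    return numerator / denominator

# SSE and error degrees of freedom from Table 2.
sse = {"separate": 1248700.8, "parallel": 1302609.6, "common": 1543852.5}
df = {"separate": 28, "parallel": 31, "common": 32}

f_common_vs_parallel = nested_f(sse["common"], df["common"],
                                sse["parallel"], df["parallel"])
f_parallel_vs_separate = nested_f(sse["parallel"], df["parallel"],
                                  sse["separate"], df["separate"])

# Critical values quoted in the text for F(1, 31) and F(3, 28).
cv_1_31, cv_3_28 = 4.16, 2.95
print(round(f_common_vs_parallel, 5), f_common_vs_parallel > cv_1_31)
print(round(f_parallel_vs_separate, 5), f_parallel_vs_separate > cv_3_28)
```

The first comparison exceeds its critical value and the second does not, which is the pattern that selects the parallel model.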

Since we failed to reject the parallel model, it is the model best supported by the data. In the parallel model, the slopes for all the independent variables are the same for both West Lake and Railroad Lake, but the intercept is different for each lake. Thus, the lakes have the same relationship between phosphorus and the independent variables; however, Railroad Lake has higher phosphorus levels.

Next, we examine the results from the parallel regression ANCOVA model. Using SAS, the parallel regression ANCOVA produces a p-value (Pr>|t|) for each parameter. If the p-value is significant (less than .05), then the corresponding variable is a useful predictor of the phosphorus level (see Table 3). Dissolved oxygen, water depth, and lake2 are all significant variables. The parameter estimates are needed to predict the phosphorus level.

The parameter for dissolved oxygen has an estimate of -73.5080548. This means that each one-unit increase in dissolved oxygen is associated with a decrease of 73.5080548 units in phosphorus, with the other variables held fixed. Water depth has an estimate of 3.7114035; each one-unit increase in water depth is associated with an increase of 3.7114035 units in phosphorus. The variable lake2 is the last significant coefficient. It is binary: 1 indicates Railroad Lake, while 0 indicates West Lake. Lake2 has an estimate of 521.1362683, which means Railroad Lake has a higher phosphorus level than West Lake for the same values of the other variables.

Parameter / Estimate / Standard Error / t Value / Pr>|t|
Intercept / 441.9217549 / 139.8634181 / 3.16 / 0.0035
lake2 / 521.1362683 / 217.4955036 / 2.40 / 0.0228
wdepth / 3.7114035 / 1.3586026 / 2.73 / 0.0103
sedepth / 1.7733186 / 1.6831966 / 1.05 / 0.3002
do / -73.5080548 / 32.0816166 / -2.29 / 0.0289

Table 3. Parallel regression ANCOVA model results
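Each t value in Table 3 is the estimate divided by its standard error, which gives a quick consistency check on the table. The sketch below is only that check; the dictionary name `table3` is hypothetical, and the intercept is entered as 441.9217549, the value consistent with equation (1) and with the reported t value of 3.16.

```python
# (estimate, standard error) pairs from Table 3; the intercept uses the
# value 441.9217549 that matches equation (1).
table3 = {
    "Intercept": (441.9217549, 139.8634181),
    "lake2": (521.1362683, 217.4955036),
    "wdepth": (3.7114035, 1.3586026),
    "sedepth": (1.7733186, 1.6831966),
    "do": (-73.5080548, 32.0816166),
}

# t value = estimate / standard error for each parameter.
t_values = {name: est / se for name, (est, se) in table3.items()}
for name, t in t_values.items():
    print(f"{name:10s} {t:6.2f}")
```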

From Table 3 with lake2=0, phosphorus levels in West Lake can be estimated using the equation:

West Lake:

Phosphorus = 441.9217 + 3.711403(wdepth) + 1.7733186(sedepth) - 73.50805(do)   (1)

From Table 3 with lake2=1, phosphorus levels in Railroad Lake can be estimated using the equation:

Railroad Lake:

Phosphorus = 963.05802 + 3.711403(wdepth) + 1.7733186(sedepth) - 73.50805(do)   (2)

Notice the slopes for water depth, sediment depth, and dissolved oxygen are all the same and only the intercept changes between the lakes. This is due to the choice of the parallel regression ANCOVA model.

Using the regression equations (1) and (2), we can predict a value of phosphorus by selecting appropriate values for the independent variables. Using the mean of each independent variable in each respective lake (see Table 4), we approximate phosphorus in West Lake and Railroad Lake.

Variable / West Lake / Railroad Lake
wdepth / 82.5 / 94.8
sedepth / 51.0 / 45.1
do / 3.8 / 10.4

Table 4. Mean value of independent variables for each lake.

West Lake using (1) and Table 4:

Phosphorus = 441.9217 + 3.711403(82.5) + 1.7733186(51.0) - 73.50805(3.8)

Phosphorus = 559.221

Railroad Lake using (2) and Table 4:

Phosphorus = 963.05802 + 3.711403(94.8) + 1.7733186(45.1) - 73.50805(10.4)

Phosphorus = 628.645

When the means of water depth, sediment depth, and dissolved oxygen are used, the predicted phosphorus level in West Lake is 559.221 mg/L and the predicted phosphorus level in Railroad Lake is 628.645 mg/L. The two equations, (1) and (2), can be used to predict phosphorus levels for any other reasonable values of interest.
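Equations (1) and (2) differ only in the intercept, so a single function with a lake indicator covers both. The sketch below is illustrative: the function name `predict_phosphorus` and the constant names are hypothetical, and the inputs are the rounded means from Table 4. With those rounded inputs the West Lake prediction reproduces 559.221, while the Railroad Lake prediction comes out near 630.4 rather than the reported 628.645; one possible explanation, not stated in the source, is that the reported value was computed from unrounded means.

```python
# Coefficients of the parallel model as used in equations (1) and (2).
INTERCEPT_WEST = 441.9217
LAKE2_SHIFT = 521.1362683  # added to the intercept when lake2 = 1 (Railroad Lake)
SLOPES = {"wdepth": 3.711403, "sedepth": 1.7733186, "do": -73.50805}

def predict_phosphorus(wdepth, sedepth, do, railroad=False):
    """Predicted phosphorus from the parallel ANCOVA model."""
    intercept = INTERCEPT_WEST + (LAKE2_SHIFT if railroad else 0.0)
    return (intercept
            + SLOPES["wdepth"] * wdepth
            + SLOPES["sedepth"] * sedepth
            + SLOPES["do"] * do)

# Rounded means from Table 4.
west = predict_phosphorus(82.5, 51.0, 3.8)
railroad = predict_phosphorus(94.8, 45.1, 10.4, railroad=True)
print(round(west, 3), round(railroad, 3))
```

Keeping the lake effect as a shift of the intercept mirrors the parallel model: for identical depth and oxygen values, the two lakes always differ by the lake2 estimate.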

  4. DISCRIMINANT ANALYSIS FOR THE TWO LAKES

The goal of a discriminant analysis is to determine whether the collection of all variables differs between West Lake and Railroad Lake during the 2002 summer. The analysis was performed in S-Plus using phosphorus, water depth, sediment depth, and dissolved oxygen. The first two discriminant functions were plotted against each other, and a clear division between West Lake and Railroad Lake is visible (see Figure 7). Thus, the collection of all four variables differs between West Lake and Railroad Lake.


Figure 7. Discriminant functions using the variables water depth, sediment depth, dissolved oxygen, and phosphorus: W = West Lake, R = Railroad Lake.
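The idea of separating two groups with a linear combination of variables can be illustrated with Fisher's linear discriminant. The sketch below is not the S-Plus procedure used for Figure 7: it is restricted to two variables so that the 2x2 system can be solved by hand, it computes a single discriminant direction, and the data values and all function names are made up for illustration.

```python
def mean_vector(rows):
    """Mean of each of the two variables."""
    n = len(rows)
    return [sum(r[k] for r in rows) / n for k in range(2)]

def scatter(rows, mean):
    """2x2 within-group scatter matrix (sums of squares and cross products)."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - mean[0], r[1] - mean[1]]
        for a in range(2):
            for b in range(2):
                s[a][b] += d[a] * d[b]
    return s

def fisher_direction(group_a, group_b):
    """Direction w proportional to inverse(S_within) times (mean_a - mean_b)."""
    ma, mb = mean_vector(group_a), mean_vector(group_b)
    sa, sb = scatter(group_a, ma), scatter(group_b, mb)
    s = [[sa[i][j] + sb[i][j] for j in range(2)] for i in range(2)]
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    diff = [ma[0] - mb[0], ma[1] - mb[1]]
    # Apply the explicit inverse of the 2x2 pooled scatter matrix.
    return [(s[1][1] * diff[0] - s[0][1] * diff[1]) / det,
            (-s[1][0] * diff[0] + s[0][0] * diff[1]) / det]

def score(w, row):
    """Discriminant score: projection of one observation onto w."""
    return w[0] * row[0] + w[1] * row[1]

# Made-up (dissolved oxygen, phosphorus) pairs, for illustration only.
west_toy = [(3.0, 500.0), (4.0, 560.0), (5.0, 520.0), (4.0, 600.0)]
railroad_toy = [(10.0, 650.0), (11.0, 700.0), (9.0, 720.0), (10.0, 690.0)]

w = fisher_direction(west_toy, railroad_toy)
west_scores = [score(w, r) for r in west_toy]
railroad_scores = [score(w, r) for r in railroad_toy]
# True when the two groups do not overlap along the discriminant direction.
print(min(west_scores) > max(railroad_scores))
```

When the scores of the two groups do not overlap, as in this toy example, the variables taken together distinguish the groups, which is the kind of division the discriminant plot is meant to reveal.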

5. REFERENCES

Carlson, E. and Ecker, M.D. (2002). "A Statistical Examination of Water Quality in Two Iowa Lakes". American Journal of Undergraduate Research 1(2), pp. 31-45.

Oelmann, A. and Ecker, M.D. (2002). "A Statistical Examination of Sediment Phosphorus from Silver Lake and Lake Casey in 2002". Technical Report, Department of Mathematics, University of Northern Iowa.