Appendix 1: Methods for Generalized Additive Modeling Approach

We developed generalized additive models (GAMs) (Knapp, 2005; Miro and Ventura, 2013) to explore data relationships and determine the relative importance of factors affecting endemic shrimp occurrence. GAMs do not require constant variance or normally distributed errors and are therefore useful when dependent variables are binary (i.e., presence/absence data). GAMs also relax the assumption that the relationship between the dependent variable (on the logit scale) and the explanatory variables is linear. Nonparametric loess smoothing functions are used in GAMs to describe the relationships between the dependent variable and the continuous predictor variables (Hastie and Tibshirani, 1991).
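To illustrate what a loess smoothing function does, the following is a minimal sketch of a one-dimensional local linear smoother with tricube weights. It is not the fitting routine used in this study: it smooths a continuous response directly, whereas a GAM for presence/absence data applies the smoother on the logit scale inside an iterative fitting procedure. The function names, the `span` default, and the data are hypothetical.

```python
def tricube(u):
    """Tricube kernel: (1 - |u|^3)^3 inside the window, 0 outside."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def loess_1d(x, y, span=0.5):
    """Return locally weighted linear fits evaluated at each x[i].

    span is the fraction of points used in each local fit (hypothetical default).
    """
    n = len(x)
    k = min(n, max(3, int(round(span * n))))  # neighbours per local fit
    fitted = []
    for x0 in x:
        # Window half-width: distance to the k-th nearest point, widened
        # slightly so that point keeps a small positive weight.
        h = sorted(abs(xj - x0) for xj in x)[k - 1] * 1.0001 or 1e-12
        sw = swx = swy = swxx = swxy = 0.0
        for xj, yj in zip(x, y):
            d = xj - x0                  # centre on the target point
            w = tricube(d / h)
            sw += w
            swx += w * d
            swy += w * yj
            swxx += w * d * d
            swxy += w * d * yj
        denom = sw * swxx - swx * swx
        if abs(denom) < 1e-12:
            fitted.append(swy / sw)      # degenerate window: weighted mean
        else:
            slope = (sw * swxy - swx * swy) / denom
            # The intercept of the centred line is the fitted value at x0.
            fitted.append((swy - slope * swx) / sw)
    return fitted


# Made-up data: an exactly linear relationship, which a local linear
# smoother should reproduce.
predictor = [float(v) for v in range(10)]
response = [2.0 * v + 1.0 for v in predictor]
smooth = loess_1d(predictor, response)
```

A smaller `span` follows local structure more closely, while a larger one gives a smoother curve.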

The dependent variables in the GAMs were the presence or absence of H. rubra or M. lohena. The predictor variables examined for potential use in exploratory GAMs represented water properties, pool characteristics, land use, and co-occurring endemic and introduced species (Table A1). Chlorophyll a (n=201), turbidity (n=230), salinity (n=323) and temperature (n=314) measurements were not available for all of the pools included in the dataset, so they were examined in separate GAMs using a subset of pools. None of the water variables was a significant term for explaining H. rubra or M. lohena occurrence, and they were therefore dropped from the full analysis.

Determining the relative importance of factors controlling the distribution of a plant or animal can be confounded by spatial autocorrelation because sites that are closer to each other may be more similar than those at greater distances (Koenig, 1999; Legendre, 1993). Spatial autocorrelation is problematic for statistical tests that require independence among observations. Incorporating a spatial autocorrelation term into regression models is recommended; otherwise, environmental variables may appear to influence distributions when their effects are not actually statistically significant (Legendre, 1993). In the GAMs, spatial autocorrelation was accounted for with a loess-smoothed term that combined the latitude and longitude of each pool (Knapp et al., 2003; Dormann et al., 2007).

Collinearity between predictor variables in multiple regressions may confound their independent effects (Quinn and Keough, 2002). Prior to regression analyses, Pearson correlation matrices were calculated for all pairwise combinations of independent variables with full datasets. Correlation coefficients (r) ranged between -0.32 and 0.71; the variables were therefore assumed to be independent and were all included in the dataset (Knapp et al., 2003).
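The pairwise screening described above can be sketched as follows. This is an illustrative example only: the column values are made up (they are not survey data), and the helper names `pearson_r`, `correlation_matrix`, and `off_diagonal_range` are hypothetical.

```python
import math


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return cov / math.sqrt(var_a * var_b)


def correlation_matrix(columns):
    """Correlation for every ordered pair of named columns, keyed by (name, name)."""
    names = list(columns)
    return {(p, q): pearson_r(columns[p], columns[q]) for p in names for q in names}


def off_diagonal_range(matrix):
    """Smallest and largest r among pairs of distinct variables."""
    values = [r for (p, q), r in matrix.items() if p != q]
    return min(values), max(values)


# Made-up values for three continuous predictors (not data from the study).
columns = {
    "AREA": [1.0, 4.0, 2.5, 8.0, 6.0],
    "SILT": [10.0, 0.0, 25.0, 5.0, 50.0],
    "CANOPY": [0.0, 20.0, 5.0, 60.0, 40.0],
}
matrix = correlation_matrix(columns)
r_min, r_max = off_diagonal_range(matrix)
```

Reporting the off-diagonal range mirrors how the coefficients are summarized in the text; any cutoff for excluding a variable would be a separate analytical choice.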

The value $p_i$ is the probability of finding the shrimp at location $i$, and is defined as

$$p_i = \frac{e^{\theta_i}}{1 + e^{\theta_i}}$$

where the linear predictor (i.e., the logit line) $\theta_i$ is a function of the independent variables. For both shrimp species, the relationship we used for $\theta$ was:

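$$\theta = \beta_0 + \mathrm{lo}(\mathrm{AREA}) + \mathrm{lo}(\mathrm{SILT}) + \mathrm{lo}(\mathrm{CANOPY}) + \mathrm{lo}(\mathrm{PERIMETER}) + \mathrm{lo}(\mathrm{UTME}, \mathrm{UTMN}) + \mathrm{TILAPIA} + \mathrm{POECILIID} + \mathrm{KSAND} + \mathrm{MLAR} + \mathrm{LU\_PROXIMAL} + \mathrm{LU\_UPSLOPE}$$

where $\beta_0$ is the intercept or constant and lo(·), ..., lo(·) are nonparametric loess smoothing functions that characterize the effect of each continuous independent variable on the probability of response. Spatial autocorrelation is incorporated as the term lo(UTME, UTMN), which is a smoothed surface of UTM easting and northing (Augustin et al., 1998). The terms TILAPIA, POECILIID, KSAND, MLAR, LU_PROXIMAL and LU_UPSLOPE are categorical binary variables and do not incorporate loess smoothing functions. For the models used to explain M. lohena distribution, an additional categorical variable, HRUBRA, was added to the model.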
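The logit link between the linear predictor and the occurrence probability can be sketched numerically as follows. All numbers here are hypothetical and are not estimates from the study; the smooth-term contributions are supplied as already-evaluated values rather than fitted, and only a few terms are shown.

```python
import math


def inverse_logit(theta):
    """Convert a linear predictor theta (logit scale) to a probability."""
    if theta >= 0:
        return 1.0 / (1.0 + math.exp(-theta))
    e = math.exp(theta)  # avoids overflow for large negative theta
    return e / (1.0 + e)


def linear_predictor(intercept, smooth_values, binary_effects, indicators):
    """Sum the additive pieces: intercept + smooth terms + binary-variable effects."""
    theta = intercept + sum(smooth_values.values())
    for name, effect in binary_effects.items():
        theta += effect * indicators[name]
    return theta


# Hypothetical values for one pool (not estimates from the study).
smooth_values = {"lo(AREA)": 0.4, "lo(UTME,UTMN)": -0.1}
binary_effects = {"TILAPIA": -1.5, "POECILIID": -0.8}
indicators = {"TILAPIA": 1, "POECILIID": 0}  # 1 = present, 0 = absent

theta = linear_predictor(0.2, smooth_values, binary_effects, indicators)
p = inverse_logit(theta)
```

Because the terms are additive on the logit scale, the presence of a binary factor shifts the linear predictor by a fixed amount, while its effect on the probability itself depends on the values of the other terms.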

The best combination of independent variables was determined by dropping each term from the model in the presence of all other variables. Analysis of deviance and likelihood ratio tests were used to test the significance of each independent variable's effect on the probability of occurrence of each shrimp species (Knapp, 2005). Independent variables with significant (P ≤ 0.01) effects on shrimp occurrence were used to develop a simplified model that could be used to predict shrimp occurrence within anchialine pools.
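The drop-one-term comparison can be sketched as a likelihood ratio test on the change in binomial deviance. This is a simplified illustration, not the study's procedure: it handles only a term with one degree of freedom (such as a binary variable), whereas loess terms carry approximate, generally non-integer degrees of freedom. The function names are hypothetical.

```python
import math


def binomial_deviance(y, p):
    """Deviance of fitted probabilities p for binary observations y (0 or 1)."""
    eps = 1e-12  # keeps log() finite if a fitted probability is exactly 0 or 1
    log_lik = 0.0
    for yi, pi in zip(y, p):
        pi = min(max(pi, eps), 1.0 - eps)
        log_lik += yi * math.log(pi) + (1 - yi) * math.log(1.0 - pi)
    return -2.0 * log_lik


def lr_test_1df(dev_reduced, dev_full):
    """Likelihood ratio statistic and p-value for dropping a 1-df term.

    Uses the identity P(chi2_1 > x) = erfc(sqrt(x / 2)).
    """
    stat = max(dev_reduced - dev_full, 0.0)
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical deviances for a model with and without one binary term.
stat, p_value = lr_test_1df(dev_reduced=412.6, dev_full=401.3)
keep_term = p_value <= 0.01  # significance threshold stated in the text
```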

References

Augustin, N. H., M. A. Mugglestone & S. T. Buckland, 1998. The role of simulation in modelling spatially correlated data. Environmetrics 9: 175–196.

Dormann, C. F., J.M. McPherson, M.B. Araújo, R. Bivand, J. Bolliger, G. Carl, R.G. Davies, A. Hirzel, W. Jetz, W.D. Kissling, I. Kühn, R. Ohlemüller, P.R. Peres-Neto, B. Reineking, B. Schröder, F.M. Schurr & R. Wilson, 2007. Methods to account for spatial autocorrelation in the analysis of species distributional data: a review. Ecography 30: 609-628.

Hastie, T. & R. Tibshirani, 1991. Generalized additive models. Chapman and Hall, New York, New York, USA.

Knapp, R.A., 2005. Effects of nonnative fish and habitat characteristics on lentic herpetofauna in Yosemite National Park, USA. Biological Conservation 121: 265-279.

Knapp, R.A., K.R. Matthews, H.K. Preisler & R. Jellison, 2003. Developing probabilistic models to predict amphibian site occupancy in a patchy landscape. Ecological Applications 13: 1069-1082.

Koenig, W.D., 1999. Spatial autocorrelation of ecological phenomena. Trends in Ecology and Evolution 14: 22-26.

Legendre, P., 1993. Spatial autocorrelation: trouble or new paradigm? Ecology 74: 1659–1673.

Miro, A. & M. Ventura, 2013. Historical use, fishing management and lake characteristics explain the presence of non-native trout in Pyrenean lakes: Implications for conservation. Biological Conservation 167: 17-24.

Quinn, G. & M. Keough, 2009. Experimental Design and Data Analysis for Biologists. New York: Cambridge University Press.

Table A1: Explanatory factors in generalized additive models

Variable / Description
Physical
AREA / Approximate surface area (m2)
SILT / % silt cover on substrate
Fauna
TILAPIA (binary) / Presence/absence of tilapia
POECILIID (binary) / Presence/absence of poeciliids
MLAR (binary) / Presence/absence of the prawn Macrobrachium lar
KSAND (binary) / Presence/absence of K. sandvicensis
Vegetation
CANOPY / % canopy overhang
PERIMETER veg / % of pool periphery surrounded by vegetation (within 0.5 m)
Land use
LU_UPSLOPE (binary) / Yes if urban, resort or residential development within 1 km upslope and inland
LU_PROXIMAL (binary) / Yes if residential or resort development surrounds a pool
Spatial Autocorrelation
UTME, UTMN / UTM easting and northing (X and Y coordinates) from GIS
Water quality (examined separately from other variables)
Temperature / Celsius
Salinity / ppt = parts per thousand
Turbidity / NTU = Nephelometric Turbidity Units
Chlorophyll a / μg/L

Table A2: Summary statistics for all anchialine pool survey results included in the study including mean, standard deviation, median, minimum value and maximum value. Vegetation and substrate variables represent percent cover data.

Pool Characteristic / Pools sampled / Mean / Stdev / Median / Min / Max
Turbidity (NTU) / 230 / 0.39 / 2.5 / 0 / 0.0 / 32.3
Chlorophyll a (μg/L) / 201 / 1.12 / 5.5 / 0.1 / 0.0 / 66.3
Temp (Celsius) / 314 / 23.3 / 2.4 / 23.1 / 19.3 / 29.6
Salinity (ppt) / 323 / 10.9 / 5.8 / 11.7 / 1.3 / 26.6
Area (m2) / 398 / 38.8 / 104.8 / 8.0 / 0.5 / 951.5
Distance to Shore (m) / 398 / 107.4 / 74.62 / 91.0 / 11 / 600
Maximum Depth (m) / 398 / 0.38 / 0.7 / 0.25 / 0 / 10
Tree Canopy (%) / 398 / 12.1 / 25 / 0 / 0 / 100
Emergent Vegetation (%) / 398 / 8.1 / 23.2 / 0 / 0 / 100
Perimeter Vegetation (%) / 398 / 28.7 / 38 / 5 / 0 / 100
Rocky Substrate (%) / 398 / 71 / 40.8 / 100 / 0 / 100
Silt Substrate (%) / 398 / 22.6 / 38.4 / 0 / 0 / 100
Sand Substrate (%) / 398 / 5.1 / 17.7 / 0 / 0 / 100

Figure A1: Salinity variation among anchialine pools and aquifers along the western and southern coastline of the Island of Hawaii. Each histogram shows the frequency of pool salinities within a geographic cluster.

Figure A2: Maps showing distribution and percentage of pools with: (a) vegetation, (b,c) endemic anchialine shrimp species, and (d,e,f) introduced species known to prey on endemic shrimp.
