Electronic Supplementary Material

Species Distribution Model Landscape

For model fitting, we used the maximum and minimum occurrence coordinates to define a bounding box from which the background was selected (cf Fig. 1c). This approach balanced our desire to fully capture the predicted species’ range based on climatic suitability while constraining model fitting to observed environments (VanDerWal et al. 2009, Merow et al. 2013). The fitting landscape was comprised of 6912 cells from 39.02321° to 43.02321N° latitude and -69.00646° to -81.00645W° longitude. The projection landscape included 20,034 cells that covered terrestrial US habitats from 36.52321° to 47.43987N° latitude and -67.00646° to -84.83979W° longitude. Predictor variables (climate metrics, oak host basal area, human population density, and road density) were summarized to the same 5-minute grid for all models.

Correcting for Sampling Bias in Species Distribution Models

Sampling bias is a known confounding factor in species distribution models, especially those that rely on presence-only data (Philips et al 2009, Elith et al. 2011, Merow et al. 2013). Samples were clustered in geographic space and we accounted for this spatial and environmental bias in two ways. First, we constructed a sampling bias model that represented ease of access to sites for sampling, which we call our “bias model” (Merow et al. 2013, Merow et al. 2016). We did not have occurrence data from a larger target group to estimate sampling bias (Phillips et al. 2009), so we used human population and road density to account for anthropogenic correlates of sampling bias (Merow et al. 2013, Merow et al. 2016). Our expectation was that sites with greater accessibility would be sampled with greater frequency (Reddy & Dávalos2003, Phillips et al. 2009). The anthropogenic data were obtained from the 2010 US Census and 2008 TIGER line files (US Census Bureau) and summarized to the same 5-minute scale used for the climate data.

All occurrence records (n=362) were retained for bias modeling to reflect the density of sampling. The bias model was then used as “bias layer” input for the climate-based SDM (Phillips and Dudik 2008). Second, we used the “remove duplicates” option in MaxEnt when fitting the SDM. This option scales the occurrence data to the climate data by keeping a single observation in each 5-minute climate grid cell. Removing duplicates helps to avoid over-estimating the climate suitability of a particular area simply due to intense sampling. Our 362 point records yielded 100 unique occurrence grid cells once occurrence data was aggregated to the scale of the covariate data.

Model Validation

We used 10-fold cross-validation. For each of the 10 model runs (or “folds”), the dataset was randomly divided into 90% training data (used to fit the model) and 10% test data (used to validate the model). The percentage of data used to train and test the model is determined by the number of occurrence points and the selected number of folds, such that the occurrence points are divided equally among each fold (100 occurrence points/10 folds = 10 test points per fold). The output of each of the 10 runs were averaged to create the final model result. In each of the 10 iterations of the species distribution model, our training dataset included 90 occurrences and our test dataset included 10 occurrences.

Choosing Climate Predictors

We began with six candidate climate predictors chosen from available data to represent average temperature and precipitation, variability in temperature and precipitation, and extremes (Ibanez et al. 2009). We ran an initial model with all candidate climate predictors following the methods from the main text and assessed the percent contribution and permutation importance (Phillips et al. 2006) in conjunction with correlation analysis in R 3.3.1 (R Core Team 2016). We removed predictors with low contribution and importance values that were highly correlated with the variables with the highest contribution and importance values. The contribution and importance values of mean annual precipitation, mean diurnal temperature range, mean annual temperature, and the maximum temperature of the warmest month were low (Table S1) and each variable was highly correlated with one or both of the most important predictors in the model (precipitation seasonality and minimum temperature of the coldest month). The final model therefore includes only precipitation seasonality and minimum temperature of the coldest month as predictors along with host plant basal area.

Alternative Model: Restricted Oak Hosts

The oak host species of S. f. ontario remain disputed, so we explored two reasonable alternative sets of host species in our model fitting. A model fit using a layer summarizing the basal area of all known possible oak host species are included in the main text. Here we report the model results for a restricted set of oak host species, specifically the basal area of Quercus alba (white oak) and Quercus stellata (post oak). The spatial pattern of predicted average relative suitability is qualitatively the same for the restricted set of oak host species (Fig. S2 compared to Fig. 4), with the exception of small pockets of low relative suitability apparent in the south-central portion of the prediction area that disappear with the restricted set of oaks hosts. The uncertainty estimates are also very similar (Fig. S3 compared to Fig. 5), with a few higher uncertainty pixels in the south-central prediction area that emerge in the restricted oak host model.

Table S1: Percent contribution and importance of each climate variable to species distribution model for Satyrium favonius ontario

Variable / Percent Contribution / Permutation Importance
Precipitation Seasonality / 48.0 / 50.6
Minimum Temperature of the Coldest Month / 37.0 / 27.3
Mean Annual Precipitation / 8.4 / 0.2
Mean Diurnal Temperature Range / 3.7 / 4.4
Mean Annual Temperature / 1.5 / 17.2
Maximum Temperature of the Warmest Month / 1.4 / 0.3

Fig. S1: Imputed oak host species basal area across the study region. Black points are known occurrence points of S. f. ontario at the 5-minute scale.

Fig. S2: Average predicted relative suitability for the S. f. ontario across the study region for ten replicate model runs with oak species restricted to Quercus alba and Quercus stellata. “Warmer colors” indicate higher suitability and black points are known occurrence points at the 5-minute scale.

Fig. S3: Uncertainty in predicted relative climatic suitability for S. f. ontario across the study region for ten replicate model runs with oak species restricted to Quercus alba and Quercus stellata. “Warmer colors” indicate higher uncertainty and black points are known occurrence points at the 5-minute scale.

Appendix References

Elith J, Phillips SJ, Hastie T, Dudík M, Chee YE, Yates CJ (2011) A statistical

explanation of MaxEnt for ecologists. Divers Distrib 17:43-57

Ibanez I, Silander Jr. JA, Wilson AM, LaFleur N, Tanaka N, Tsuyama I (2009) Multivariate forecasts of potential distributions of invasive plant species. Ecol App 19:359—375

Merow C, Smith MJ, Silander Jr. JA (2013) A practical guide to MaxEnt for modeling

species’ distributions: what it does, and why inputs and settings matter. Ecography

36:1058-1069

Merow C, Allen JM, Aiello-Lammens M, Silander Jr. JA (2016)Improving niche and range estimates with Maxent and point process models by integrating spatially explicit information. Global Ecol Biogeogr 25:1022-1036

Phillips SJ, Anderson RP, Schapire RE (2006) Maximum entropy modeling of species geographic distributions. Ecol Model 190:231-259

Phillips SJ, Dudík M (2008) Modeling of species distributions with MaxEnt: new

extensions and a comprehensive evaluation. Ecography 31:161-175

Phillips SJ, Dudík M, Elith J, Graham CH, Lehmann A, Leathwick J, Ferrier S (2009) Sample selection bias and presence-only distribution models: implications for background and pseudo-absence data. Ecol Appl 19:181-197

R Core Team (2016) R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.

Reddy S,Dávalos LM (2003) Geographical sampling bias and its implications for conservation priorities in Africa.J Biogeogr30:1719–1727

VanDerWal J, Shoo LP, Graham C, Williams SE (2009) Selecting pseudo absence data for presence-only distribution modeling: how far should you stray from what you know? Ecol Model 220:589-594