Detecting the multiple facets of biodiversity

(Supplementary Material)

Marta A. Jarzyna† and Walter Jetz

Affiliations:

Department of Ecology and Evolutionary Biology, Yale University

165 Prospect Street

New Haven, CT, USA

†Corresponding author:

SUPPLEMENTARY MATERIAL

Appendix S1.Simulation exercise to explore the influence of the tree shape on detectability.

We simulated 1,000 phylogenetic trees for five community size groups (i.e., N = 25, 50, 75, 100, and 125 species). For each of the phylogenetic trees, we calculated the Colless’ index of imbalance Ic[1] and “true” phylogenetic diversity (PD). To simulate imperfect detection of species, we pruned (100 times) each tree of 5, 10, 15, and 20% of species and calculated phylogenetic diversity for each pruning step (PDprune). PDprune was considered “observed” biodiversity. We calculated detectability of phylogenetic diversity (qPD) as the ratio of the “observed” (i.e., PDprune) to “true” (i.e., PD) phylogenetic diversity.

References:

1.Mooers, A.O. and S.B. Heard, Inferring Evolutionary Process from Phylogenetic Tree Shape. The Quarterly Review of Biology, 1997. 72(1): p. 31-54.

Appendix S2. Multispecies occupancy models addressing detection – a short primer.

Occupancy modeling is a statistical modeling framework that allows for accounting for imperfect detection of species or individuals. In order to discern a non-detection from an absence at each location, occupancy modeling techniques rely on a repeated sampling protocol [1]. In multispecies occupancy models, the history of detections and non-detections (denoted by 1s and 0s, respectively) collected for all species encountered during repeated samplings at each site (Fig. 1) are used to estimate—among others—true site-specific probabilities of species occurrence and probabilities of species detection[2, 3].

The multispecies hierarchical occupancy model has three levels: one for each species i, each site j, and each repeated sampling k.Observed data (i.e., detections and non-detections), yi,j,k, for species i = 1, 2, . . . , i at site j = 1, 2, . . . , j on sampling event k = 1, 2, . . . , k are modeled as resulting from the imperfect observation of a true occurrence state, zi,j, given a probability of detection, pi,j,k. This observation process is modeled as the Bernoulli random variable yi,j,k~ Bern(pi,j,k · zi,j), where zi,j = 1 if species i was truly present at site j, and zi,j = 0 if species i was absent at site j. The true occurrence state is specified as zi,j, ~ Bern(ψi,j), where ψi,jis the probability of occurrence by species i at site j. Site-specific probabilities of occurrence and detection for each species are traditionally modeled as linear combinations of survey- and site-specific covariates (Fig. 1) using either a frequentist (maximum likelihood estimation) or Bayesian framework. Data augmentation approach [4] is generally used in multispecies occupancy models to estimate taxonomic diversity (and by extension functional and phylogenetic diversity) present in an assemblage but not detected at any of the surveyed sites. See ref [5] for a detailed discussion of multispecies occupancy modeling and ref [6] for the application of multispecies occupancy modeling in abundance-based metrics.

While traditionally the probability of detection of each species is modeled as a function of site- and survey-specific covariates, our work suggests that species’ site-level probability of detection might be related to species’ local functional and phylogenetic distinctness. We thus suggest that future studies of functional and phylogenetic diversity should incorporate species’ local functional or phylogenetic distinctness within the model structure to inform the variation in the probability of detection of individual species and, consequently, in detectability of the different biodiversity attributes.

References:

1. Zipkin, E.F., A. DeWan, and J. Andrew Royle, Impacts of forest fragmentation on species richness: a hierarchical approach to community modelling. Journal of Applied Ecology, 2009. 46(4): p. 815-822.

2. Kéry, M., Towards the modelling of true species distributions. Journal of Biogeography, 2011. 38(4): p. 617-618.

3. Walls, S.C., J.H. Waddle, and R.M. Dorazio, Estimating occupancy dynamics in an anuran assemblage from Louisiana, USA. The Journal of Wildlife Management, 2011. 75(4): p. 751-761.

4. Royle, A.J., Analysis of Capture–Recapture Models with Individual Covariates Using Data Augmentation. Biometrics, 2009, 65(1): p. 267-274.

5.Iknayan, K.J., et al., Detecting diversity: emerging methods to estimate species diversity. Trends in Ecology & Evolution, 2014. 29(2): p. 97-106.

6. Dénes, F.V., L.F. Silveira, and S.R. Beissinger, Estimating abundance of unmarked animal populations: accounting for imperfect detection and other sources of zero inflation. Methods in Ecology and Evolution, 2015. 6(5): p. 543-556.

Figure S2.1.In order to discern a non-detection from an absence at each location, occupancy modeling techniques rely on a repeated sampling protocol (here, surveys 1 – 3). The inputs of multispecies occupancy model include detection histories from these repeated surveys for all species and all sites as well as site- and survey-specific covariates. The outputs include theestimates of site-specific true probability of each species occurrence and site-level estimates of probability of detection of each species.

Appendix S3. Description of data and statistical analysis used to provide an empirical test of the consequences of ignoring imperfect detection to functional and phylogenetic diversities.

Avian Distribution Data

We used two citizen science databases of avian occurrences: Swiss Breeding Bird Atlas (SBBA, and North American Breeding Bird Survey (BBS, The SBBA is the Swiss distribution atlas of breeding birds, where bird occurrences are mapped on 10x10 and 1x1 km squares distributed in a spatially representative fashion across Switzerland. No surveys were carried out above c. 2500 m, where hardly any birds breed. During the breeding season, three and two visits are carried out below and above the treeline, respectively. There are no time limits set for visits, but each one generally lasts between 2 and 4 hours. For our analysis, we used the 2007 data of 1 x 1 kmresolution.

The BBS is a long-term, large-scale avian monitoring program initiated in 1966 to track the status and trends of North American bird populations. The BBS data are collected annually during the height of the avian breeding season along over 4,100 survey routes located across the continental U.S. and Canada. Each survey route is approximately 40km long and contains five segments and 50 stops at approximately 800m intervals. At each stop, observers conduct a 3-minute point count during which every bird observed or heard within an approximately 400m-radius is recorded. The BBS follows a standardized monitoring protocol, allowing for sound comparison of avian diversity patterns through time. While the SBBA follows a repeated sampling protocol and thus lends itself nicely to occupancy modeling, BBS does not—each BBS survey route is visited only once during each breeding season. Therefore, the five segments falling within each BBS route were considered “repeated samplings” in order to account for imperfect detection of species recorded during the BBS. To minimize the spatial bias in our sample (most of the selected routes were located in Mid-Western and Eastern US) and produce a population that more evenly represents all US regions [1], we further conducted spatial subsampling using Bird Conservation Regions (BCRs). Within each BCR, we selected 20 routes, while maximizing the geographic distance between them. If a BCR contained less than 20 BBS routes, we retained all of them. Upon this pre-processing, 447BBS routes were available for further analysis. For our analysis, we used BBS data from 2013.

From both datasets, we removed those observation records that were not identified to a species level and records for species that are generally poorly surveyed; i.e., nocturnal and crepuscular species. Ultimately, we retained a total of 161 species for the SBBA data and 494 species for the BBS data.

Statistical Analysis

To estimate probabilities of species occurrence and account for imperfect detection of species, we developed multispecies hierarchical occupancy models [2, 3]. We chose multispecies occupancy models instead of single species occupancy models because multispecies models tend to improve inferences both at the species [4] and community levels [5]. For SBBA, observed data, yi,j,k, for species i = 1, 2, . . ., 161 at site j = 1, 2, . . ., 267 on sampling event k = 1, 2, 3 were modeled as resulting from the imperfect observation of a true occurrence state, zi,j, given a site-level probability of detection, pi,j,k. For BBS, i = 1, 2,. . ., 494, j = 1, 2, . . ., 447, and k = 1, 2, . . ., 5. The observation process was modeled as the Bernoulli random variable yi,j,k~ Bern(pi,j,k · zi,j), where zi,j = 1 if species i was truly present at site j, and zi,j = 0 if species i was absent at site j. The true occurrence state was specified as zi,j, ~ Bern(ψi,j), where ψi,j was the probability of occurrence by species i at site j.

Probabilities of detection and occurrence were modeled as linear combinations of survey- and site-specific covariates. For SBBA, site-level probability of detection was modeled as a linear function of the following covariates: Julian day of the survey, elevation, and forest cover. Probability of occurrence was modeled as a linear function of mean elevation, percent forest cover(including categories closed and open forest, shrubs, and small woodlands), percent cultivated land (lowland grassland, arable land, orchards, vineyards), percent alpine pastures (meadows and pastures at high-altitude, including some lower-lying areas on steep slopes), and percent developed land (urban habitats). Land cover data were extracted from the 1 ha grid provided by the Federal Statistical Office.For BBS, probability of occurrence was modeled as a linear function of elevation. Elevation was obtained from the National Elevation Dataset ( and calculated for each 1km block intersecting the BBS route and averaged across each route. Probability of detection was modeled as an intercept-only model because measurements of potential detection covariates (e.g., weather conditions) were not available for each segment of BBS routes. .

We estimated model parameters using Bayesian analysis. Bayesian parameter estimation was carried out using program JAGS (Just Another Gibbs Sampler; via R (version 3.1.2; using the package rjags ( Vague priors were used for means and variances of the community-level hyper-parameters. For each dataset, we ran three parallel chains of length 20,000, discarding the first 5,000 iterations as burn-in, and used a thinning rate of 10. Convergence was assessed by visual inspection of traceplots and by using the Gelman-Rubin convergence diagnostic [7]with all diagnostic values <1.1.

Taxonomic diversity and functional diversity were quantified as described in the main text. To calculate functional diversity, we used the data on function-relevant traits for bird species compiled in Wilman et al. [8]. Five trait categories were included: diet, body size, activity time, foraging niche, and broad habitat types. The dietary data contain proportions of different types of food—invertebrates, vertebrates, carrion, fresh fruits, nectar and pollen, seeds, and other plant materials—in species’ diet. The data on foraging niche reflect the proportional use of each of seven niches—in water below surface, in water on surface, terrestrial ground level, understory, mid canopy, upper canopy and aerial. The species were also categorized into two groups based on their broad habitat types (pelagic or not) and into two groups based on their activity time (diurnal and nocturnal/crepuscular), although all nocturnal and crepuscular species were removed from the analysis.

References:

1.Lawler, J.J. and R.J. O'Connor, How well do consistently monitored breeding bird survey routes represent the environment of the conterminous United States? The Condor, 2004. 106(4): p. 801-814.

2.Dorazio, R.M. and J.A. Royle, Estimating size and composition of biological communities by modeling the occurrence of species. Journal of the American Statistical Association, 2005. 100(470): p. 389-398.

3.Dorazio, R.M., et al., Estimating species richness and accumulation by modeling species occurrence and detectability. Ecology, 2006. 87(4): p. 842-854.

4.Zipkin, E.F., A. DeWan, and J. Andrew Royle, Impacts of forest fragmentation on species richness: a hierarchical approach to community modelling. Journal of Applied Ecology, 2009. 46(4): p. 815-822.

5.Russell, R.E., et al., Modeling the effects of environmental disturbance on wildlife communities: avian responses to prescribed fire. Ecological Applications, 2009. 19(5): p. 1253-1263.

6.Daly, C., et al., A knowledge-based approach to the statistical mapping of climate. Climate Research, 2002. 22(2): p. 99-113.

7.Gelman, A. and D.B. Rubin, Inference from Iterative Simulation Using Multiple Sequences. Statistical Science, 1992. 7(4): p. 457-472.

8.Wilman, H., et al., EltonTraits 1.0: Species-level foraging attributes of the world's birds and mammals. Ecology, 2014. 95(7): p. 2027-2027.

Appendix S4. Generating a null model expectation.

To identify whether the observed functional and phylogenetic diversity is expected given the number of species present at the site, we generated a null-model expectation for each site in the Swiss and North American data. For each grid cell assemblage, a null distribution was obtained by randomly reshuffling (100 times) the values of true probabilities of occurrence across the subset of species of the given assemblage, while keeping species richness constant.Results indicated that both Swiss and North American bird communities were functionally and phylogenetically clustered.

Figure S4.1. Null distributionswere obtained by randomly reshuffling (100 times) the values of true probabilities of occurrence across the subset of species of the given assemblage, while keeping species richness constant. The red dotted line is a line of intercept and slope equal to 0 and 1, respectively; points that fall on this line indicate that estimated functional or phylogenetic diversity is exactly as expected given a random process. Points above the line indicate that assemblages are more functionally or phylogenetically clustered than expected by chance; points below the line indicate that assemblages are more functionally or phylogenetically overdispersed than expected by chance. The size of the points indicate the standard error of the random model estimate.

Appendix S5. The relationship between species’ local functional and phylogenetic distinctness and their site-level probabilities of detection in Swiss and North American birds.

Table S5.1. Results of the mixed effects models of species’ local functional and phylogenetic distinctness. Distinctness was log-transformed to ensure Gaussian distribution. Explanatory variables included species’ site-level probability of detection (ProbDET), elevation (ELEV), and the interaction between species’ site-level probability of detection and elevation (ProbDET·ELEV). Explanatory variables were scaled to the mean or 0 and standard deviation of 1. The random effect was put on the site (i.e., a grid cell in Swiss BBA and a route in the BBS). The significance level of the coefficient estimate is indicated by *** and ** symbols.

Assemblage / Distinctness / Coefficient / Estimate / T value
Swiss / Functional / ProbDET / -0.035*** / -16.3
ELEV / -0.010*** / -4.9
ProbDET·ELEV / -0.016*** / -7.2
Phylogenetic / ProbDET / -0.067*** / -27.5
ELEV / -0.015*** / -6.4
ProbDET·ELEV / -0.015*** / -5.9
North American / Functional / ProbDET / -0.020*** / -12.9
ELEV / -0.027*** / -7.0
ProbDET·ELEV / -0.009*** / -5.1
Phylogenetic / ProbDET / -0.040*** / -17.8
ELEV / -0.030*** / -6.3
ProbDET·ELEV / -0.008** / -3.1

1.Mooers, A.O. and S.B. Heard, Inferring Evolutionary Process from Phylogenetic Tree Shape. The Quarterly Review of Biology, 1997. 72(1): p. 31-54.

2.Lawler, J.J. and R.J. O'Connor, How well do consistently monitored breeding bird survey routes represent the environment of the conterminous United States? The Condor, 2004. 106(4): p. 801-814.

3.Dorazio, R.M. and J.A. Royle, Estimating size and composition of biological communities by modeling the occurrence of species. Journal of the American Statistical Association, 2005. 100(470): p. 389-398.

4.Dorazio, R.M., et al., Estimating species richness and accumulation by modeling species occurrence and detectability. Ecology, 2006. 87(4): p. 842-854.

5.Zipkin, E.F., A. DeWan, and J. Andrew Royle, Impacts of forest fragmentation on species richness: a hierarchical approach to community modelling. Journal of Applied Ecology, 2009. 46(4): p. 815-822.

6.Russell, R.E., et al., Modeling the effects of environmental disturbance on wildlife communities: avian responses to prescribed fire. Ecological Applications, 2009. 19(5): p. 1253-1263.

7.Daly, C., et al., A knowledge-based approach to the statistical mapping of climate. Climate Research, 2002. 22(2): p. 99-113.

8.Gelman, A. and D.B. Rubin, Inference from Iterative Simulation Using Multiple Sequences. Statistical Science, 1992. 7(4): p. 457-472.

9.Wilman, H., et al., EltonTraits 1.0: Species-level foraging attributes of the world's birds and mammals. Ecology, 2014. 95(7): p. 2027-2027.

1