File 1: Supplemental Description of Methods

Supplemental description to accompany Swenson et al. 2012. Plant and animal endemism in the eastern Andean slope: Challenges to conservation. BMC Ecology

Part 1. SPECIES DISTRIBUTION MODELING METHODS

Overview for all species

Adapted from:

Young, B.E. 2007. Study area. Pp. 8-12 in B. E. Young (editor), Endemic species distributions on the east slope of the Andes in Peru and Bolivia. NatureServe, Arlington, Virginia, USA

and

Hernandez, P.A. 2007. Distribution modeling methods. Pp. 13-17 in B.E. Young (editor), Endemic species distributions on the east slope of the Andes in Peru and Bolivia. NatureServe, Arlington, Virginia, USA

A. Selection of species

We defined the focal species for this study as those that are endemic to our study area. Because ecological boundaries are rarely as sharp as depicted on an ecoregion map, we maintained some flexibility in our criteria for inclusion. As an initial requirement for inclusion in our study, we chose all species with at least 90% of their known range in the study area that had been formally described by 2006. We then identified all species with ranges entirely within the analysis extent which included a buffer of 100 km around the study area. From the resulting list of species, we eliminated all species whose entire range was restricted to the buffer area and, therefore, did not occur in the study area sensu stricto. For the species occurring in both the buffer area and the study area, we eliminated all that were restricted to habitat types such as puna that did not occur in substantial amounts within the study area. Additionally, for species in humid forests on the northern and eastern boundaries of the study area, we eliminated those for which most known localities lie outside the study area.

In practice, for the three vertebrate groups we developed a Geographic Information System (GIS) algorithm to select species whose distributions met the inclusion criteria. Range maps in GIS format for these groups are available at NatureServe’s website. The algorithm compared these maps with the buffered study area to develop a list of candidate species. We refined this list by examining habitat affinities of species in borderline cases and by consulting taxonomic specialists to add recently described species or to eliminate those with questionable taxonomic status. Selecting endemic plant species was more difficult because of the lack of comprehensive, geospatially explicit distribution data for any of the focal groups (see plant details below). We therefore relied on draft lists of national endemics and input from taxonomic specialists. In some cases, when we were unsure of the distribution of a species, we compiled localities from herbarium records and plotted them on a map of the study area. For species that occurred in both the study area and the buffer zone, we again relied on habitat information to determine whether to include the species.
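The screening rules described above can be illustrated with a minimal sketch. It is not the GIS algorithm used in the study. It assumes that each range map has been rasterized to equal-area grid cells (so cell counts approximate area), and it reads the two inclusion rules as alternatives (at least 90% of the range inside the study area, or the whole range inside the buffered extent), which is one interpretation of the text. The function name, the cell identifiers, and the data structures are hypothetical.

```python
def select_candidates(ranges, study_cells, buffer_cells, min_fraction=0.9):
    """Return species names that pass the range-based screening rules.

    ranges       : dict mapping species name -> set of grid-cell ids in its range
    study_cells  : set of cell ids inside the study area
    buffer_cells : set of cell ids in the 100 km buffer (outside the study area)

    Assumption: cells are equal-area, so counting cells approximates area.
    """
    extent = study_cells | buffer_cells
    selected = []
    for name, cells in sorted(ranges.items()):
        if not cells:
            continue  # no mapped range, nothing to evaluate
        in_study = len(cells & study_cells)
        if in_study == 0:
            continue  # range restricted to the buffer: not in the study area proper
        fraction_in_study = in_study / len(cells)
        entirely_in_extent = cells <= extent
        # Assumed reading: either rule qualifies a species as a candidate.
        if fraction_in_study >= min_fraction or entirely_in_extent:
            selected.append(name)
    return selected
```

A list produced this way would still need the manual refinement described above (habitat affinities, specialist review), which a purely geometric test cannot capture.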

For plants, we did not include those species for which current taxonomists recognize one or more infraspecific categories (e.g. subspecies, varieties), some of which are reported outside the boundaries of our project’s study area. For this reason, we did not include species such as Cavendishia nobilis (Ericaceae) or Justicia kuntzei (Acanthaceae), among others, in our study. We also eliminated species for which the taxonomic status is unclear such that the known localities may refer to more than one biological species (e.g., the mouse opossum Marmosa quichua, family Didelphidae). We also had no choice but to eliminate valid species endemic to the study area for which we know of no discrete locality where the species is confirmed present. For example, the hummingbird Discosura letitiae (Trochilidae) is known from two localities in Bolivia, but the collections were made well before the era of providing precise location information on specimen tags. Without knowing more details about where this species was found, we cannot predict what its distribution might be. The resulting lists of focal species included 115 birds, 55 mammals, 177 amphibians, and 435 plants.

Because of the tremendous diversity of plants in the region combined with the variable knowledge of families and subgroups, we established the following criteria for inclusion in our study (Young 2007):

1. Taxonomic knowledge. Selected groups must be well known at the species level in both Peru and Bolivia. We therefore restricted the list to groups with recent monographs describing the characteristics and distribution of each species. The publication of a monograph generally meant that the specimens of the group housed in major herbaria were fairly accurately identified. The level of knowledge needed to be comparable for species occurring in both countries to avoid biasing results toward either country. For this reason, for example, we did not treat orchids (Orchidaceae) because although they are well known in Bolivia (e.g., Müller et al. 2003), their taxonomy and distribution are less well understood in Peru.

2. General distribution. Selected groups must show examples of endemism in the study area. For obvious reasons, we did not include families that have few or no species endemic to the study area.

3. Available distribution information. The groups we selected were known to have readily available locality data in herbaria or the literature for use in distribution models. Because of this factor we eliminated groups such as the cacti (Cactaceae) for which many species have been described based on vague locality information. In general, groups treated in recent monographs satisfied this condition.

4. Diversity of life forms. Among the candidate groups, we selected a suite that together represents the range of plant life forms in the study area, including herbs, vines, lianas, shrubs, treelets, and trees. The list includes species that root in the ground as well as those that live as epiphytes or hemiepiphytes.

5. Diversity of elevation. The suite of groups that we selected has endemic species that tend to occur across the range of elevations represented in the study area.

6. Diversity of habitats. The groups we chose also have species with habitat affinities that include all major habitats that occur in the study area. Thus the list includes groups that occur in the mountain forest of the Yungas, lowland moist forests, savannas, and dry valleys.

7. Economic uses. We included groups with species of economic value to help make the results more relevant for the general public and because species with economic uses can become threatened due to overexploitation.

Based on these criteria, we developed a plant list of twelve focal families plus three focal genera from two families to include in the study (Table 1). Besides not addressing obvious candidate families such as the Orchidaceae and Cactaceae, as explained above, we did not focus on the following groups: Pteridophyta other than the Cyatheaceae, because a revision of the Bolivian species is not yet complete; Araceae, because too many species have yet to be described; Amaryllidaceae, because they have received relatively little attention from collectors in our study area; and Aristolochiaceae, because its center of distribution is south of the study area. In summary, the list includes 435 species (complete species list in Appendix S4) representing a variety of life forms, with peaks of diversity at different elevational ranges.

Table 1. Descriptive characteristics of the focal groups of endemic vascular plants chosen for species distribution modeling.

Group / Number of spp endemic to project area† / Life forms / Elevation where diversity peaks / Habitats
Acanthaceae / 157 / Herbs, shrubs / mid / Savannahs, Yungas, lowland forest
Anacardiaceae / 5 / Trees, shrubs / low / Lowland forest, dry valleys
Aquifoliaceae / 14 / Trees, shrubs / high / Yungas, lowland forest
Brunelliaceae / 10 / Trees / high / Yungas
Campanulaceae / 45 / Shrubs, vines / high / Yungas
Chrysobalanaceae / 13 / Trees / low / Lowland forest
Cyatheaceae / 5 / Tree ferns / mid / Yungas
Ericaceae / 47 / Shrubs, vines, epiphytes / high / Yungas
Fabaceae – Inga / 16 / Trees, shrubs / low / Lowland forest
Fabaceae – Mimosa / 7 / Shrubs / mid / Dry valleys
Loasaceae / 19 / Herbs, shrubs / mid, high / Yungas
Malpighiaceae / 25 / Trees, vines / mid, high / Dry valleys, Yungas
Marcgraviaceae / 7 / Shrubs, hemiepiphytes / mid / Lowland forest
Onagraceae – Fuchsia / 33 / Shrubs, vines, treelets / high / Yungas
Passifloraceae / 32 / Lianas / high / Yungas

B. Georeferencing locality data

Many of the museum and herbarium specimens lacked geographical coordinates recorded at the time of collection. For these specimens, we derived coordinates from the recorded description of the sampling location using standardized georeferencing methods based on ancillary data. We followed the georeferencing guidelines established by the Mammal Networked Information System (2001), and used gazetteers (e.g. Stephens and Traylor 1983), 1:100,000 topographic maps produced by the Peruvian and Bolivian national cartographic institutes, and 1:250,000 digital databases (Programa Nacional de Informática y Comunicaciones de Naciones Unidas 1998) available for the study area.

C. Modeling methods

Environmental Data

We used environmental GIS layers describing climatic, topographic and vegetation cover conditions within our study area to develop species distribution models. These environmental data were obtained from four freely available data providers and developed further for our predictive distribution modeling (PDM) purposes. Each layer was converted to the study’s geographic projection (a customized Lambert Azimuthal Equal Area), resampled to 1-km resolution (if provided at a finer resolution) and clipped to the study area buffered by 100 km, ensuring that geographic coordinates of the pixel boundaries were identical between layers. Even though a number of environmental datasets were available at a finer resolution, we selected a 1-km pixel for distribution modeling because the spatial precision of the species locality data is low in the majority of cases and is therefore better matched to environmental data depicted at a coarser (i.e. 1 km) pixel resolution. To remove redundant information, we performed a correlation analysis to identify a subset of climatic variables that were not correlated with each other (correlation coefficient < 0.7) and also not correlated with elevation. This analysis was performed separately for the montane region (>800 m elevation) and the lowland region of our study area.

The environmental layers obtained and/or derived from four data providers are described below. We were unable to use the Ecological Systems map in the species modeling as it was being created simultaneously. All preparations of these data layers were performed using ESRI ArcInfo Workstation (9.1) unless indicated differently.

Hole-filled seamless Shuttle Radar Topography Mission (SRTM) 90 m digital elevation data Version 2. We derived three topographic layers from the SRTM dataset. Data tiles covering the PDM study area were obtained from CGIAR (version 3 currently available), merged into a single raster layer, and resampled to a 1 km pixel resolution. We obtained slope data from this elevation layer by calculating the degree of slope (i.e. maximum rate of change in elevation from each pixel to its 8 neighbors) using the ArcInfo Workstation GRID command SLOPE. The third topographic layer, called topographic exposure, expresses the relative position of each pixel on a hillslope (e.g. valley bottom, toe slope, slope, and ridge). It is calculated by determining the difference between the mean elevation within a neighborhood of pixels and the center pixel. The difference is determined over a number of neighborhood windows and averaged in a hierarchical fashion (more weight given to the smallest window) to produce a standardized measure of topographic exposure. We calculated topographic exposure using an ArcInfo Workstation application by Zimmerman (2000) on the digital elevation data using three neighborhood windows of 3x3, 6x6 and 9x9.
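The topographic exposure calculation can be illustrated with a short sketch on a small elevation grid. This is not the ArcInfo application cited above, and several details are assumptions made only for illustration: the sign convention (center minus neighborhood mean, so ridges are positive and valley bottoms negative), the weighting scheme (weights inversely proportional to window size, so the smallest window counts most), the use of odd window sizes so that each window can be centered on a pixel, and the clipping of windows at the grid edge. The final standardization step is not reproduced.

```python
def window_mean(dem, row, col, size):
    """Mean elevation in a size x size window centered on (row, col).

    The window is clipped at the grid edges (an assumption of this sketch).
    """
    half = size // 2
    values = []
    for r in range(row - half, row + half + 1):
        for c in range(col - half, col + half + 1):
            if 0 <= r < len(dem) and 0 <= c < len(dem[0]):
                values.append(dem[r][c])
    return sum(values) / len(values)


def topographic_exposure(dem, sizes=(3, 5, 7), weights=None):
    """Weighted average of (center - neighborhood mean) over several windows.

    dem     : 2-D list of elevations
    sizes   : odd window sizes in pixels (hypothetical defaults)
    weights : one weight per window; defaults to 1/size (hypothetical),
              which gives the smallest window the most influence
    """
    if weights is None:
        weights = [1.0 / s for s in sizes]
    total_weight = sum(weights)
    exposure = []
    for r in range(len(dem)):
        row_out = []
        for c in range(len(dem[0])):
            score = sum(
                w * (dem[r][c] - window_mean(dem, r, c, s))
                for s, w in zip(sizes, weights)
            )
            row_out.append(score / total_weight)
        exposure.append(row_out)
    return exposure
```

On a grid with a single peak, the peak pixel receives a positive value and its neighbors negative values; a perfectly flat grid yields zero everywhere.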

Worldclim bioclimatic database. Worldclim provides 19 summary climatic variables of precipitation and temperature for the 1950-2000 time period (Hijmans 2005). It is inadvisable to use all of these variables because collinearity in PDM predictor layers can have adverse effects on model performance. To identify and remove redundant information in our PDM environmental layer database, we performed a correlation analysis to identify a subset of climatic variables that were not correlated with each other and also not correlated with elevation. This analysis was performed separately for the montane region (> 800 m) and the lowland region of our study area to derive a list of uncorrelated variables for each region for PDM input.
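A correlation screen of this kind can be sketched as follows. The text does not say how the retained subset was chosen among correlated variables, so the greedy, order-dependent selection below is an assumption, as are the function names and the layer names in the usage example. The 0.7 threshold is the one stated in the Environmental Data overview. In practice each list would hold the pixel values of one layer sampled over one region (montane or lowland), and the screen would be run once per region.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences.

    Assumes neither sequence is constant (a constant layer would give
    a zero denominator).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def select_uncorrelated(layers, elevation, threshold=0.7):
    """Greedily keep layers not correlated with elevation or with each other.

    layers    : dict mapping layer name -> list of pixel values; dict order
                is treated as priority order (an assumption of this sketch)
    elevation : list of elevation values for the same pixels
    """
    kept = []
    for name, values in layers.items():
        if abs(pearson(values, elevation)) >= threshold:
            continue  # redundant with elevation
        if all(abs(pearson(values, layers[k])) < threshold for k in kept):
            kept.append(name)
    return kept
```

Because the result depends on the order in which layers are considered, a real analysis would need an explicit rule for which member of a correlated pair to retain.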

Moderate Resolution Imaging Spectroradiometer (MODIS) 500 m Global Vegetation Continuous Fields (Hansen et al. 2003, umd.edu/data/modis/vcf/data.shtml). We used the percent tree cover layer for South America, in geographic projection.

MODIS/Terra Vegetation Indices 16-Day L3 Global 1km (NASA EOS data gateway: http://edcimswww.cr.usgs.gov/pub/imswelcome). We obtained data tiles covering the study area for the years 2001-2003. We chose the Enhanced Vegetation Index (EVI) instead of the more traditional Normalized Difference Vegetation Index (NDVI, also available in this dataset) because EVI has proven to be less prone to saturation in high-biomass humid forested areas (Huete et al. 2002) and is therefore more sensitive to canopy variation than NDVI. The EVI data tiles were projected, merged, and exported to GeoTIFF images using the MODIS Reprojection Tool (3.2a, available at http://edcdaac.usgs.gov/landdaac/tools/modis/index.asp), creating a single image for each 16-day time period. These EVI GeoTIFF images were entered into a standardized principal components analysis (PCA) utilizing a correlation matrix. We used the remote sensing software ENVI (4.2) for this analysis. PCA is a commonly used data reduction technique for multitemporal remotely sensed imagery (Hirosawa et al. 1996). We utilized the first two axes of the PCA for PDM, as they can be interpreted to represent vegetation structure and temporal dynamics, respectively. Persistent cloud cover can complicate this sort of analysis, though using images that summarize data for a number of days helps alleviate the problem. Even 16-day periods can be affected by cloud cover, but areas with continuous cloud cover may have similar vegetative characteristics. We created six additional environmental predictor layers by summarizing each of the three MODIS data layers within moving windows of 2 km and 5 km using the ArcInfo Workstation GRID command FOCALMEAN. A spatial mismatch between the low precision of the species locality data and the high precision of the MODIS satellite data may reduce the utility of the MODIS data products for predicting the distribution of our endemic species. Summarizing each MODIS layer within a spatial moving window was an attempt to compensate for this mismatch. Summarizing vegetation cover data in this way may also be more ecologically relevant because factors influencing habitat selection are not restricted to the site of a species occurrence but also include the conditions of the surrounding landscape (Mazerolle and Villard 1999, Pearce et al. 2001, Johnson et al. 2002).
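The moving-window summary can be illustrated with a minimal focal mean on a small grid. This sketch is not the ArcInfo FOCALMEAN command itself. It assumes a square window given in pixels, clips the window at the grid edges, and skips missing values (represented here as None); how the 2 km and 5 km windows were translated into pixel neighborhoods in the study is not stated, so the window size is left as a parameter.

```python
def focal_mean(grid, size):
    """Replace each cell with the mean of a size x size window around it.

    grid : 2-D list of numbers, with None marking missing data
    size : odd window width in pixels (e.g. 5 for a 5-pixel window)

    Windows are clipped at the grid edges and missing cells are ignored;
    a cell whose whole window is missing stays None.
    """
    half = size // 2
    n_rows, n_cols = len(grid), len(grid[0])
    smoothed = []
    for r in range(n_rows):
        row_out = []
        for c in range(n_cols):
            values = [
                grid[rr][cc]
                for rr in range(max(0, r - half), min(n_rows, r + half + 1))
                for cc in range(max(0, c - half), min(n_cols, c + half + 1))
                if grid[rr][cc] is not None
            ]
            row_out.append(sum(values) / len(values) if values else None)
        smoothed.append(row_out)
    return smoothed
```

Applied to a percent tree cover layer, the result at a locality reflects the surrounding landscape rather than a single pixel, which is the intent described above.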

Maxent distribution modeling

The statistical mechanics approach Maxent was an obvious candidate because previous comparative studies demonstrated that it performs well even with small sample sizes (Hernandez et al. 2006, Elith et al. 2006, Phillips et al. 2006). In addition, the freely available application facilitates modeling many species at one time. To ensure that Maxent was well suited to modeling distributions of Andean species, we compared its performance with that of two promising new methods: Mahalanobis Typicalities (a method adopted from remote sensing analyses) and Random Forests (a model-averaging approach to classification and regression trees); this comparison was published as Hernandez et al. (2008). We tested each method at predicting ranges of eight bird and eight mammal species using locality and environmental data gathered for our study. We found that Maxent performed very well, producing results that were more consistent across species with widely varying conditions (Hernandez et al. 2008). Results of this comparative analysis supported our decision to select Maxent as the inductive species distribution modeling method for our study. We developed inductive species distribution models using Maxent for all species with two or more unique localities. Maxent is based on a statistical mechanics approach called maximum entropy, meant for making predictions from incomplete information. It estimates the most uniform distribution (maximum entropy) across the study area given the constraint that the expected value of each environmental predictor variable under this estimated distribution matches its empirical average (the average value over the set of presence-only occurrence data). Detailed descriptions of Maxent’s methods can be found in Phillips et al. (2004, 2006). The algorithm is implemented in a stand-alone, freely available application (http://www.cs.princeton.edu/~schapire/maxent/).

Maxent’s predictions are ‘cumulative values’: each pixel’s value is the sum, expressed as a percentage, of the probability values of that pixel and all other pixels with equal or lower probability values. The pixel with a value of 100 is the most suitable, while pixels with values closer to 0 are the least suitable within the study area.
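The cumulative value can be made concrete with a small sketch that converts per-pixel probabilities into cumulative percentages. This is an illustration of the definition given above, not code from the Maxent application; the function name is hypothetical, and the quadratic-time loop is chosen for readability rather than for use on full rasters.

```python
def cumulative_output(raw):
    """Convert raw per-pixel values into cumulative percentages.

    raw : list of non-negative raw values, one per pixel (normalized here
          so that they sum to 1)

    Each pixel receives 100 times the summed probability of all pixels
    whose probability is less than or equal to its own.
    """
    total = sum(raw)
    probabilities = [value / total for value in raw]
    return [
        100.0 * sum(q for q in probabilities if q <= p)
        for p in probabilities
    ]
```

For four pixels with relative probabilities 1, 2, 3 and 4, the cumulative values are approximately 10, 30, 60 and 100, so the most suitable pixel always scores 100 and tied pixels share the same value.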

We considered only linear and quadratic features because of the low numbers of localities available for our study species, and otherwise used Maxent’s default settings. Maxent automatically uses all non-presence cells as background (assuming there are fewer than 10,000). The distribution is then calculated over the union of the background pixels and the presence points.