Pyron & Wiens – Electronic Supplementary Material
1. Phylogeny and divergence-time estimation
A detailed description of the primary phylogenetic analysis (maximum likelihood [ML] topology estimation) is given in our previous study [1]. The extensive concordance of the ML topology with previous estimates of amphibian phylogeny is discussed in that paper [1]. Discordance with previous estimates involves a few weakly supported branches, and the majority (64%) of nodes are well supported [1]. The final concatenated alignment consisted of up to 12712 bp for each of 2871 species (2,394 frogs, 436 salamanders, and 176 caecilians), including data from up to 12 genes (3 mitochondrial, 9 nuclear), 43.7% of the 6576 species in our database (5,817 frogs, 583 salamanders, and 41 caecilians). The full DNA-sequence matrix is available in DataDryad doi:10.5061/dryad.vd0m7.
In the present study, we estimated divergence times for this tree topology using a C++ implementation of r8s [2] called "sk8s," recently developed by S.A. Smith (pers. comm.). This implementation has been released under the new title “treePL” [3]. Both r8s and sk8s utilize the same penalized likelihood (PL) algorithm [2]. This algorithm estimates evolutionary rates and divergence dates on a tree given a set of fossil constraints and a smoothing factor determining the amount of among-branch rate heterogeneity. Following standard methodology, we determined the optimal smoothing factor empirically using cross-validation [2], with the root age fixed (see below) due to computational constraints. We tested six values for the smoothing parameter (0.01, 0.1, 1, 10, 100, and 1000), graduated by orders of magnitude across a reasonable range given empirical datasets [2]. The cross-validation analysis yielded an optimal smoothing-parameter value of 1. Subsequent analyses were run with fixed-age constraints on the nodes listed below, given the computational difficulties of dating trees of this size using minimum and maximum ages [4], and the existence of mostly concordant divergence-time estimates for amphibians that can be used as an existing framework for analysis [5]. This strategy incorporates a prior on the root [2, 6] and a number of other nodes, while estimating ages for other clades of interest.
Based on previous recommendations [7, 8], we placed a constraint on the Amniote-Amphibia divergence (the root of the tree) at 330.4 Ma (million years ago). This is based on the oldest known fossils of Lepospondyli,the sister group to either Lissamphibia or Reptiliomorpha [6, 9-11], and the youngest Whatcheriidae, a sister group to Tetrapoda[12]. The fossil age-estimate for this clade is broadly consistent with several recent estimates based on molecular clock analyses [6, 11, 13, 14]. To estimate the ages of internal nodes of interest, we fixed a series of nodes above the family level, using dates from a recent study of the origins of the major lissamphibian groups [5]. While other studies have looked at broad-scale age estimates in amphibians [11, 15-17], they lack wide taxonomic sampling. We confirmed the stratigraphic and paleontological fit of these dates with several recent reviews of lissamphibian origins [11, 18].
We attempted to identify nodes that spanned the temporal and taxonomic breadth of the higher-level structure of the tree, but that were not too close together (i.e. we did not choose to constrain all possible nodes, nor any direct ancestor-descendant pairs). In some cases, the shape of the trees necessitated constraining the stem-group age of the most recent common ancestor (MRCA) of some families in order to enforce the necessary constraint. However, all crown-group ages and most stem-group ages were freely estimated for the nodes of interest (i.e. families). We fixed the following internal nodes using estimated ages from ref. [5] based on the results of a PL analysis of a slow-evolving nuclear gene with multiple fossil calibration points:
i) Cryptobranchoidea
The MRCA of Hynobiidae and Cryptobranchidae: 164.50 Ma.
ii) Sirenoidea
The MRCA of Sirenidae and the non-cryptobranchoid caudates: 199.59 Ma.
iii) Salamandroidea
The MRCA of Salamandridae and Ambystomatidae: 167.22 Ma.
iv) Plethodontoidea
The MRCA of Rhyacotritonidae, Amphiumidae, and Plethodontidae: 133.03 Ma.
v) Leiopelmatoidea
The MRCA of Ascaphidae and Leiopelmatidae: 202.04 Ma.
vi) Pipoidea
The MRCA of Pipidae and Rhinophrynidae: 190.42 Ma.
vii) Discoglossoidea
The MRCA of Discoglossidae, Alytidae, and Bombinatoridae: 160.07 Ma.
viii) Pelobatoidea
The MRCA of Scaphiopodidae, Pelodytidae, Pelobatidae, and Megophryidae: 155.73 Ma.
ix) Ranoidea
The MRCA of Ranidae and Microhylidae et al.: 111.90 Ma.
x) Hyloidea
The MRCA of Bufonidae, Eleutherodactylidae, Hylidae et al.: 73.53 Ma.
xi) Gymnophiona
The MRCA of extant caecilians: 108.65 Ma.
Note that all tree-based analyses use only the single dated chronogram derived from the unique ML tree with estimated branch-lengths [1]. This tree has 64% of nodes with "strong" support (>70% bootstrap support), and alternative topological arrangements could conceivably affect our results. However, it is not computationally feasible at present to run these analyses (e.g. GeoSSE, QuaSSE) over a large sample of trees that are of the size that we use here. Furthermore a large sample of time-calibrated trees is not available, and generating one would be extremely difficult. For example, widely used Bayesian methods (e.g. [19]) for simultaneous estimation of divergence times and phylogeny would be too computationally intensive for the dataset used here (2,871 taxa). Furthermore, generating multiple time-calibrated trees from sk8s would also be challenging, since it would require re-estimating the maximum likelihood topology many times (bootstrap trees from RAxML include only topologies and molecular branch length estimates are necessary for divergence dating with penalized likelihood).
The insensitivity of our results to variation in topology and branch lengths is suggested by another analysis that utilized this tree and found strong concordance between estimates of phylogeny-related parameters (e.g. phylogenetic diversity) generated using our topology and alternative trees [20]. Also, a large-scale phylogenetic study similar to ours found that incorporating phylogenetic uncertainty into estimates of diversification rates did not show a strong effect of topological variance on estimated rates or on geographic patterns in their distribution [21].
Despite the limitations imposed by the large size of our tree, our estimates of topology and divergence times are broadly concordant with a large body of literature dealing with higher-level amphibian phylogeny [5, 17, 22, 23]. The stability of these estimates suggests that topological uncertainty (mostly towards the terminals) is unlikely to substantially affect the estimates presented. Notably, our estimates of divergence times are particularly concordant with another multi-locus study with extensive taxon sampling that used Bayesian method of divergence-time estimation [23], as well as others incorporating fossils directly [11].
2. Distributional and climatic data
Species range maps were used to obtain data on species richness and climatic niches. Range maps (polygons stored as ESRI shapefiles) were downloaded from the 2008 update of the IUCN Global Amphibian Assessment (GAA; accessed in January, 2009, for the 6119 species of amphibian covered by that database. These were matched against our taxonomic database, resulting in range maps for 93% (6117; omitting two taxa that are of ambiguous taxonomic status in recent classifications) of the 6576 species in our database, 5421 frogs, 546 salamanders, and 150 caecilians, ~87% of the ~7000 total currently described species listed by AmphibiaWeb [24].When necessary, disjunct range segments were dissolved into multi-part features to extract the centroid. Polygons representing introduced ranges or vagrants (as defined in the IUCN metadata) were removed from the dataset.
The ranges were then projected using the Cylindrical Equal Area projection, similar to previous studies [25]. We then extracted the mean, variance, and range for the selected climatic and environmental data using zonal statistics summaries (i.e. means of climatic variables within polygons). Zonal statistics were processed using Geospatial Modelling Environment v0.3.4b ( To avoid the omission of range-restricted species, we resampled the AET data at 2.5 min (~5 km) spatial resolution using nearest-neighbor interpolation to match the NPP and BIOCLIM data, so that smaller ranges would intersect fully with a point on the data layer. For some species still omitted at that resolution, we obtained data by extracting values for the polygon centroid.
A small number of species occur outside of the limits of the scale of the data projection, such as endemics on small islands, and had no data values for some of the variables. All 6117 species had data for the 19 BIOCLIM variables, whereas 333 (5%) were missing entries for AET and 30 (0.5%) lacked information for NPP. A total of 356 species (6%) were thus missing data for 2 of the 21 variables, 223 of which were represented in the phylogeny (8% of the tips), for a total of 0.2% missing data cells among the species in the tree. Rather than remove these species due to this small amount of missing data, we imputed the absent values using the maximum-likelihood Expectation-Maximization (EM) method [26], which has been shown to perform well under empirical conditions, without inflating variance or Type I error[27]. Thus, we have 100% data coverage for 6117 species (~90% of ~7000 total species), for 21 climatic and environmental variables (AET, NPP, and 19 BIOCLIM variables). Of the 2871 species in the phylogeny, 2794 had ecological data (missing species lacked range maps entirely in the GAA assessment). Thus, all phylogenetic analyses of ecological traits used a pruned tree, representing only those 2794 species with climatic data.
This analysis has some potential drawbacks. Primarily, we are integrating across range maps without considering variation in presence or absence within the bounded range. For example, a species that occurs in several valleys between mountains may have a polygon that crosses the mountain tops, and thus our data would include climatic data from higher elevations where the species does not occur, potentially biasing estimates for that species. However, there are two aspects of this dataset that should help to alleviate this problem. First, the range maps were individually drawn by taxonomic experts, and should thus include a minimum of extralimital areas. Second, the analyses performed using these data (e.g. QuaSSE) incorporate uncertainty in trait measurements (see below), and can thus partially account for these errors.
We then performed Principal Components Analysis (PCA) on the summary values for each species to reduce the 21 environmental variables (AET, NPP, and the 19 climatic variables), many of which are expected to be strongly correlated [28]. Given that traditional PCA does not account for the phylogenetic non-independence of species, we used the phylogenetically corrected PCA method [29], calculated with the dated chronogram described below. Code for performing these calculations in R is available from RAP. We extracted the first PC axis (PC1) for use in further analyses (see below). The first PC axis explains 33% of the variation in these variables (table S1). The variables with the strongest negative loadings (<-0.90) are BIO1 (Mean Annual Temperature; -0.96), BIO6 (Minimum Temperature of Coldest Month; -0.97), BIO9 (Mean Temperature of Driest Quarter; -0.91), and BIO11 (Mean Temperature of Coldest Quarter; -0.95). Positive variable loadings were weaker, with BIO7 (Temperature Annual Range; 0.54) the only factor loading greater than 0.5 (table S2).
In future analyses, including more PC axes might be beneficial. However, we only used PC1 for several reasons. The first is computational complexity. Each QuaSSE analysis required several weeks to complete, and this would have been difficult for multiple variables. Ideally, an analysis of multiple variables would need to be conducted simultaneously, so that estimated rates reflected response curves from all variables. This is theoretically possible using QuaSSE [30], but has not been implemented in any currently available packages [31]. More importantly, PC1 strongly reflects temperature variables that have previously been shown to be important correlates of amphibian diversity [32]. Most importantly, the QuaSSE results are strongly significant (see below). Thus, including additional PC axes will not overturn our results; PC1 shows a significant relationship with speciation and extinction rates, regardless of whether or not other PC axes are significant (or in which direction). The other major axes (PC2–4) are also correlated with latitude (Pearson’s correlation coefficient: rP = 0.25, -0.54, 0.75, all p0.00001; respectively), and will thus presumably support the same pattern of temperate to tropical imbalances in speciation and extinction.
3. Biogeographic regions
To allow reconstruction of the timing of colonization and length of occupancy of the various temperate and tropical ecoregions, we first assigned all 6576 species in our database (including the 2871 species in the tree) to one or more ecoregions in a global set of 12 biogeographic provinces, using the GAA range maps. The 12 ecoregions follow from commonly used definitions in herpetology and biogeography [33–36]. While some ambiguity about the limits of these ecoregions may exist, they correspond closely with both geography and species distributions, and represent major areas of amphibian diversity and endemism [33]. They are:
Tropical South America: Tropical regions of South America, ranging from the Colombia-Panama border to the latitudinal level of Buenos Aires, excluding the high-elevation southern Altiplano (see below under Temperate South America). We defined the boundary between Middle and South America as the Panama/Colombia border.
Tropical Middle America: Including Central America, Baja California Sur, and tropical regions of Mexico, including Sonora on the Pacific coast, and Tamaulipas on the Gulf coast. Temperate South America: Southern South America, including Patagonia, the Pampas of Argentina and Uruguay, and the high-elevation Altiplano extending into Bolivia and southeastern Peru.
West Indies: The Caribbean Islands, including the Antilles and the Bahamas. Does not include coastal islands such as Trinidad, Bonaire, Curacao, Aruba, or Cozumel.
Nearctic: Temperate North America, including the continental United States, Canada, the central Mexican Plateau, and the Mexican state of Baja California.
Afrotropical: Sub-Saharan Africa and the southern Arabian Peninsula (i.e. Yemen and southern Oman).
Western Palearctic: Europe (i.e. west of the Caspian Sea), North Africa (Mauritania to Egypt), the northern portion of the Arabian Peninsula (excluding Yemen and southern Oman), and northwestern Iran.
Eastern Palearctic: Temperate Asia east of the Caspian Sea, excluding the tropical provinces of southern China (Yunnan, Guangxi, southern Sichuan, Guangdong, Hainan, Hong Kong, Macau, and Fujian) and including the major islands of Japan (but not the southern Ryukyu Islands). Includes Afghanistan and southeastern Iran.
Madagascar: Including Madagascar and adjacent islands (i.e. Mauritius, the Seychelles, and the Comoros).
Australasia: includes Australia, New Zealand, New Guinea, and islands to the east (e.g. Solomon Islands, Fiji, Vanuatu, New Caledonia). Includes the Maluku Islands. Separated from Southeast Asia by Weber's Line [37].
Southeast Asia: Tropical East Asia, from Myanmar to the Lesser Sunda islands, including southeast China (Yunnan, Guangxi, Guangdong, Hainan, Hong Kong, Macau, and Fujian provinces), Taiwan, the southern Ryukyu Islands, and the Philippines. Separated from Oceania by Weber's Line.
South Asia: The Indian subcontinent, from Pakistan to Bangladesh, including Sri Lanka, Nepal, and Bhutan.
Note that species-accumulation curves are still nearly vertical in many of these areas (e.g. the Amazon basin, New Guinea; [33, 38]), and new species are still being discovered even in relatively well-known areas such as the eastern United States [39]. While absolute diversity is likely underestimated in every ecoregion, the proportional differences (i.e. the latitudinal gradient) should be reflected in the existing species counts. Undescribed and cryptic diversity could also conceivably affect our analyses of speciation, extinction, and dispersal rates (see below). However, two aspects of our analyses alleviate these concerns. First, our rate estimates are based on phylogenetic branch-length information, not species counts alone, and should thus be relatively robust to missing species. Second, our results show higher speciation rates in tropical clades even though these clades are more poorly sampled (see below) and these differences in rates would only be magnified by including a large number of undescribed tropical species (given the relatively safe assumption that most new species will belong to existing families).
4. Mid-domain effect
As a first test of potential explanations for the latitudinal gradient in diversity, we tested for the mid-domain effect (MDE), a putatively non-biological explanation for richness gradients. The MDE is the tendency for multiple overlapping ranges to result in a latitudinal gradient, due solely to random shuffling of ranges within a bounded geometric domain [40, 41]. Note that we are not interested in addressing the MDE as a neutral model for explaining species assemblages in a macroecological context.Instead, we simply confirm that the latitudinal diversity gradient in amphibians is statistically significant and presumably has a biological origin (i.e. related to rates of speciation, extinction, and dispersal) rather than being explained solely by random range shuffling. However, we think that the latitudinal gradient might still have a biological origin, even if we were unable to reject the MDE hypothesis.