Appendix A

Simulation procedures to assess the performance of the beta-diversity and phylogenetic community composition approaches

The first set of simulations was based on a community matrix of 100 sites and 50 species, whereas the second set of simulations was based on a grassland plant community data set (Kembel and Cahill 2011) containing 76 species distributed across 27 communities in grasslands in Alberta, Canada. The data set also included five environmental variables: habitat type (mixedgrass or fescue grassland), slope, aspect, slope position, and moisture regime. All variables were continuous, with the exception of habitat type, which was coded as either 1 (fescue) or 2 (mixedgrass).

1 – Generate an environmental vector E (100 communities x 1) containing uniformly distributed random values between 0 and 100.

2 – Generate a phylogenetically structured P vector (50 species x 1) using a Brownian evolved “trait”. The trait was generated using the function fastBM in the R package phytools (Revell 2012). P was then transformed to vary between -1 and 101.

3 – Generate a vector h (50 species x 1) containing uniformly distributed random values between 0 and 30. These values represent the height (expected maximum abundance) of any given species at its optimum.

4 – Generate a vector of species tolerances (50 species x 1) containing normally distributed random values with mean tolerance µtol and standard deviation 10. We considered two simulation scenarios in which µtol was set to either 5 or 10, the latter providing a greater tolerance and hence a weaker expected signal between species distributions, phylogeny and environment.

5 – Generate a unimodal response for the jth species at the ith site as follows:

The values in Lij were then transformed into Poisson deviates to generate a species abundance matrix. Although these values represent abundances, in this paper we considered only the case of presence-absence data, and the values were therefore transformed accordingly.
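Steps 1 to 5 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used for the paper: the response equation is not reproduced in this text, so the sketch assumes a Gaussian (bell-shaped) response curve with optimum P, height h, and the tolerance acting as its standard deviation; it assumes that P was rescaled linearly to the interval from -1 to 101; and it replaces the Brownian trait from fastBM with a stand-in vector that carries no phylogenetic signal. All function names are hypothetical.

```python
import math
import random


def rescale(values, low, high):
    """Linearly map values onto the interval [low, high] (assumed transformation)."""
    vmin, vmax = min(values), max(values)
    return [low + (v - vmin) * (high - low) / (vmax - vmin) for v in values]


def poisson_deviate(lam, rng):
    """Draw one Poisson deviate with mean lam (Knuth's multiplication method)."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_presence_absence(P, n_sites=100, mu_tol=5.0, seed=0):
    """Return (E, Y): the gradient and a sites x species 0/1 matrix."""
    rng = random.Random(seed)
    n_species = len(P)
    E = [rng.uniform(0, 100) for _ in range(n_sites)]        # step 1: environment
    optima = rescale(P, -1.0, 101.0)                          # step 2: species optima
    h = [rng.uniform(0, 30) for _ in range(n_species)]        # step 3: heights
    tol = [rng.gauss(mu_tol, 10) for _ in range(n_species)]   # step 4: tolerances
    Y = []
    for i in range(n_sites):
        row = []
        for j in range(n_species):
            # Step 5. ASSUMPTION: Gaussian response curve; the original
            # equation is not available in this text.
            variance = max(tol[j] ** 2, 1e-12)  # guard against a zero tolerance
            expected = h[j] * math.exp(-((E[i] - optima[j]) ** 2) / (2 * variance))
            abundance = poisson_deviate(expected, rng)
            row.append(1 if abundance > 0 else 0)  # presence-absence
        Y.append(row)
    return E, Y


# Stand-in trait WITHOUT phylogenetic signal (the appendix used fastBM in phytools).
trait_rng = random.Random(1)
P = [trait_rng.gauss(0, 1) for _ in range(50)]
E, Y = simulate_presence_absence(P, mu_tol=5.0)
print(len(Y), "sites x", len(Y[0]), "species")
```

Because the tolerance enters the assumed curve only through its square, negative draws from the normal distribution behave like their absolute values in this sketch.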

We also considered the situation of two gradients. In this case, we generated two independent phylogenetically structured traits (Pi1 and Pi2) and two independent environmental gradients (Ei1 and Ei2), and L was generated as follows:

Note that the same tolerance was used for each species in both gradients. To assess the type I error and statistical power of the two frameworks, we considered four scenarios:

1 – Both phylogeny and environment were unimportant in structuring L. Instead of using a phylogenetically structured trait, we used a P vector containing normally distributed values without any phylogenetic signal. After L was generated, another E vector, also containing uniformly distributed random values between 0 and 100, was used in the gradient analyses instead.

2 – Only environment, but not phylogeny, was important. We generated L from a P vector containing normally distributed values without any phylogenetic signal, and the original E used to generate L was used in the gradient analyses.

3 – Only phylogeny, but not environment, structured species distributions. L was generated using the original P vector, but once L was generated, E was replaced by another randomly generated vector, which was then used in the gradient analyses instead of the original one.

4 – Both phylogeny and environment structured species distributions. L was generated using the original P and E, which were in turn also used in the gradient analyses.

For each scenario, we generated 1000 sample matrices with 1 or 2 gradients and two values of µtol (5 or 10), giving a total of 16 000 simulations (4 scenarios x 2 gradient settings x 2 tolerance values x 1000 matrices), each involving the three test procedures (row, column and row/column). Because calculating the phylogenetic beta-diversity matrices (100 x 100) was computationally intensive, especially as they must be recalculated at each permutation involving column randomizations, we limited the number of permutations to 99.
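The bookkeeping behind these estimates can be illustrated with a short Python sketch. It rests on assumptions that the appendix does not state: the observed statistic is counted as one member of the reference distribution (so that 99 permutations give a smallest attainable P-value of 0.01), the test is one-tailed, and the significance level is 0.05. The function names are hypothetical.

```python
def permutation_p_value(observed, permuted):
    """One-tailed permutation P-value, counting the observed statistic itself."""
    n_as_extreme = sum(1 for s in permuted if s >= observed)
    return (n_as_extreme + 1) / (len(permuted) + 1)


def rejection_rate(p_values, alpha=0.05):
    """Proportion of simulated data sets in which the null hypothesis is rejected.

    Under a true null hypothesis (scenario 1) this estimates the type I error;
    under a false null hypothesis it estimates statistical power.
    ASSUMPTION: alpha = 0.05 is illustrative and not taken from the appendix.
    """
    return sum(1 for p in p_values if p <= alpha) / len(p_values)


# Size of the simulation design described above.
n_scenarios, n_gradient_settings, n_tolerances, n_matrices = 4, 2, 2, 1000
n_simulations = n_scenarios * n_gradient_settings * n_tolerances * n_matrices
print(n_simulations)
```

With only 99 permutations, P-values move in steps of 0.01, which is coarse but sufficient to count rejections at conventional significance levels.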

Using the phylogenetic community composition approach (see results for the empirical data set), both the row- and column-based permutation tests showed a significant link between the grassland species distributions, phylogeny, and environmental affinities. Moreover, because both the row- and column-based tests were significant, we knew that both phylogeny and environment were related to the grassland species distributions. Because the phylogeny was related to species distributions, the second set of simulations required conditioning species distributions on phylogenetic information capable of mimicking the scenarios used in the first set of simulations:

1 – Both phylogeny and environment were unimportant in structuring L. We created a vector containing normally distributed values without any phylogenetic signal and built a “phylogeny” (dendrogram) based on this vector, which was then applied in both frameworks; a row-permuted version of matrix E was used instead of the original matrix of environmental variables.

2 – Only environment, but not phylogeny, was important. A random phylogeny was created in the same way as in the first case, but the original matrix E was used in the gradient analyses.

3 – Only phylogeny, but not environment, structured species distributions. We created a vector containing normally distributed values with a phylogenetic signal and built a “phylogeny” (dendrogram) based on this vector, which was then used in the gradient analyses; a row-permuted version of E was used instead of the original one.

4 – Both phylogeny and environment structured species distributions. We used a “phylogeny” created in the same way as in scenario 3, and E was not manipulated.

Two types of phylogenetically structured vectors were used in this latter scenario: one containing a weak signal based on a “trait” evolved under a Brownian motion model, and another containing a strongly phylogenetically conserved trait. The conserved trait was generated by manipulating the phylogenetic tree according to Pagel’s (1999) delta transformation. Setting delta to 0.01 made branch lengths much shorter than the original values, and using this tree allowed us to generate highly phylogenetically conserved traits. We conducted 1000 simulations for each scenario. Although the PCC approach is computationally much faster, we restricted the number of permutations to 99 so that type I error and power estimates (the number of rejections over 1000) were comparable across both frameworks.
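The effect of a small delta can be illustrated with a toy example in Python. The appendix does not spell out the formula, so this sketch assumes one common formulation of Pagel's delta, in which the distance from the root to each node is raised to the power delta and branch lengths are recomputed from the transformed distances. The tree, its branch lengths, and the function name are hypothetical.

```python
def delta_transform(edges, delta):
    """Rescale branch lengths under the assumed delta transformation.

    Each edge is (distance from the root to the start of the branch,
    branch length). The new length is the difference between the
    transformed distances at the end and at the start of the branch.
    """
    return [(start + length) ** delta - start ** delta for start, length in edges]


# Hypothetical ultrametric tree ((A:40,B:40):60,C:100) with total depth 100.
edges = [
    (0.0, 60.0),   # internal branch leading to the ancestor of A and B
    (60.0, 40.0),  # tip branch to A
    (60.0, 40.0),  # tip branch to B
    (0.0, 100.0),  # tip branch to C
]

transformed = delta_transform(edges, 0.01)
for (start, length), new_length in zip(edges, transformed):
    print(f"start={start:5.1f}  original={length:5.1f}  transformed={new_length:.4f}")
```

In this toy tree every branch becomes much shorter, and the tip branches to A and B shrink far more than the deep internal branch. A Brownian trait simulated on the transformed tree therefore accumulates almost all of its variance before A and B diverge, which is one way to read the strong conservatism described above.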
