Ecological interactions are evolutionarily conserved across the entire tree of life

José M. Gómez1, Miguel Verdú2 and Francisco Perfectti3

1Dpto de Ecología, University of Granada, Granada, SPAIN

2Centro de Investigaciones sobre Desertificación (CSIC-UV-GV), Valencia, SPAIN

3Dpto de Genética, University of Granada, Granada, SPAIN

Ecological interactions are crucial to understand both the ecology and evolution of organisms1-2. Since the phenotypic traits regulating species interactions are largely a legacy of their ancestors, it is widely assumed that ecological interactions are phylogenetically conserved, with closely related species interacting with similar partners2. However, the existing empirical evidence is inadequate to appropriately evaluate the hypothesis of phylogenetic conservatism in ecological interactions because it is both ecologically and taxonomically biased. In fact, most studies on the evolution of ecological interactions have focused on specialized organisms, such as some parasites or insect herbivores3-7, belonging to a limited subset of the overall tree of life. Here we study the evolution of host use in a wide and diverse group of interactions comprising both specialist and generalist acellular, unicellular and multicellular organisms. We show that generalized interactions, as previously found for specialized ones, can also be evolutionarily conserved. Significant phylogenetic conservatism of interaction patterns was equally likely to occur in symbiotic and non-symbiotic interactions, as well as in mutualistic and antagonistic interactions. Host use differentiation among species was higher in phylogenetically-conserved clades, irrespective of their generalization degree and taxonomic position within the tree of life. Our findings strongly suggest a shared pattern in the organization of biological systems through evolutionary time, mediated by marked conservatism of ecological interactions among taxa.

Shared ancestry may produce ecological similarity, with closely related species showing similar ecological niches8. The idea, rather than being new, may be traced back to Darwin’s famous statement that struggle for existence is most severe among related species because they have similar phenotypes and niche requirements9. Interspecific interactions comprise a substantial part of the niche of most species10. Conventional wisdom suggests that two closely related species should be more likely to interact with similar organisms than would species that are remotely-related, because the phenotypic traits that regulate the interactions are often phylogenetically conserved3-7. It is thereby widely assumed that, as with other niche components, ecological interactions are evolutionarily conserved2,7.

We explored this idea by compiling information from 116 clades belonging to 7 Kingdoms (Euryarchaeota, Bacteria, Excavata, Chromalveolata, Fungi, Plantae and Animalia) from the three cellular Domains (Archaea, Eubacteria and Eucarya) and RNA and DNA viruses (Supplementary Appendix 1). We chose these systems because (1) they contain all types of ecological interactions, from antagonism (e.g., endophytic herbivory, folivory, parasitism) to mutualism (e.g., pollination, mycorrhiza, seed dispersal, N-fixing); (2) by exploring organisms from disparate portions of the tree of life, our data set avoids taxonomic and systematic biases; (3) they comprise a wide range of generalization/specialization degree; (4) reliable records of interacting organisms (hereafter referred to as hosts for the sake of simplicity) are available for those clades; and (5) phylogenetic trees are available (Online Supplementary Appendix 1). We have used genus as our target clade level because it is the taxonomic level at which interaction-mediated speciation mostly manifests2,7. Similarly, the taxonomic resolution of the hosts was determined by the information available for each studied system (see Online Full Methods). However, to check whether this may represent a source of bias in our study, we included this information as a covariate in all subsequent analyses.

The host range of the studied clades, calculated as the average number of organisms interacting with each species of that clade, varied from 1 (extreme specialization) to 11.2. That is, we worked with systems ranging from those in which all species interacted with only a single host to those in which the species interacted on average with more than 10 hosts. Nevertheless, within most systems there were species interacting with many hosts (up to 50) coexisting with species interacting with very few hosts (Supplementary Appendix 1). We categorized the studied clades as specialists, having a host range < 1.5 hosts/species, and generalists, those having host ranges ≥ 1.5 hosts/species. 58% (N = 67) of the studied clades were specialist and 42% (N = 49) were generalist. Specialization depended significantly on taxonomic affiliation (Table S1). As many as 95% of the viruses were specialist (host range: 1.30 ± 0.29 hosts/species, N = 21 clades), whereas only 53% of the eukaryotes (host range: 2.55 ± 0.25; N = 76 clades) and 48% of the prokaryotes (host range: 1.87 ± 0.30, N = 19 clades, pooling Archaea and Eubacteria) were specialist. No other system characteristic (interaction intimacy, interaction sign, number of species per genus, number of species actually studied per genus or sampling effort) affected specialization degree (Table S1), suggesting that our distribution of host range across genera was not biased by the sampling intensity of the original data set.
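
The host-range calculation and the 1.5 hosts/species cut-off described above can be illustrated with a minimal sketch. The species names, host names and the helper functions `host_range` and `classify_clade` are hypothetical and serve only as an illustration; they are not the data or code of the study.

```python
from statistics import mean

# Hypothetical toy records: species -> set of recorded hosts (not study data).
host_records = {
    "sp_a": {"host_1"},
    "sp_b": {"host_1", "host_2"},
    "sp_c": {"host_3"},
}

SPECIALIST_THRESHOLD = 1.5  # hosts per species, the cut-off given in the text


def host_range(records):
    """Average number of hosts per species in a clade."""
    return mean(len(hosts) for hosts in records.values())


def classify_clade(records, threshold=SPECIALIST_THRESHOLD):
    """Label a clade 'specialist' below the threshold, 'generalist' otherwise."""
    return "specialist" if host_range(records) < threshold else "generalist"


print(host_range(host_records), classify_clade(host_records))
```

In this toy clade the three species use 1, 2 and 1 hosts, so the host range is 4/3 and the clade falls in the specialist category.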

Tracking the evolutionary history of specialized interactions is not difficult, and has been performed for different kinds of interaction, from mutualism (e.g., pollination, dispersal) to antagonism (e.g., herbivory, parasitism). It can be explored by directly mapping host shifts across the phylogeny of the focal clade. Because in extremely specialized clades the host range is very narrow, it is easy to draw the identity of the host for each species in the phylogeny and to quantify host shifts and host conservatism. Specialized interactions are conserved when there is nonindependence in host use among species within a clade due to their phylogenetic relatedness. This can be tested by estimating the degree of phylogenetic signal, which is the tendency for related species to resemble each other in interaction patterns more than species randomly drawn from the phylogenetic tree do11. However, the use of this approach becomes increasingly difficult, to the point of becoming unfeasible, as the diversity of organisms interacting with the focal clade increases. As stated above, 42% of the studied clades were generalist, showing host ranges of 1.5 or more hosts per species.

Generalist species interact with many other species, and therefore form networks of interacting organisms. Network analysis has been successfully and extensively used during the last decade to analyze complex ecological interactions12-13. Based on the pattern of shared interactions, species can be grouped in compartments or modules. Species are tightly linked if they share a high proportion of interactions, that is, if they are ecologically similar. Groups of species interacting with similar organisms would form modules within the general network14-15. Significant modularity emerges in a network when distinct groups of species closely share links with each other more than with species in other modules16. Using a network approach, we explore the evolution of ecological interactions by tracking the changes in module affiliation across the phylogenies (Fig. 1). Several recent studies have quantified the effect of phylogenetic structure on the dynamics of ecological communities17-18. However, rather than following the standard community perspective to build networks, we use clade-oriented networks (i.e., groups of phylogenetically-related species sharing a common ancestor but not necessarily co-occurring in the same locality). For extremely specialized clades, those showing high host specificity, the modules contain the group of species interacting with the same host (Fig. 1a). Exploring phylogenetic conservatism in module ascription is analogous in these types of systems to exploring phylogenetic conservatism in host use using the standard methodology. In clades in which species interact with more than one host, establishing modules of species sharing similar hosts allows for the exploration of phylogenetic conservatism even when it is not possible to group species according to their exact equivalence in host use (Fig. 1b). Under this perspective, ecological interactions are conserved if a phylogenetic signal occurs in module affiliation. In short, this method allows the exploration of evolutionary conservatism of ecological interactions in all types of systems, irrespective of their degree of generalization or host specificity.

For each clade we built a network as described above (Fig. 1), including as nodes only the species for which host use is accurately known. In these networks, species were linked when they shared at least one host. As interacting organisms, we used the most accurate taxonomic resolution used in the original studies (Supplementary Information). We then used simulated annealing to establish significant modules within each network14-15. A modular network implies that the clade is made up of species that can be grouped according to their affinity in host use. All but six clades (Echium, Chlamydia, Iochroma, Mycobacterium, Onthophagus, Terfezia) were significantly modular, indicating that species can be grouped within a given clade based on their similarity in host use, and this modularity is evident in generalized as well as in specialized clades (Table S2). It is easy to identify the hosts associated with each module in specialized clades (Fig. S1). In contrast, in generalized clades modules are defined by the shared use of several hosts (Fig. S1). To overcome this problem, we quantified between-module differences in host composition by using a multivariate ANOVA based on species-composition dissimilarity (ADONIS19), a standard approach in community ecology. We found that the modules of all studied clades, both specialist and generalist, differed significantly in the identity and composition of the host assemblage (p<0.01 for all systems, Table S3). More interestingly, modules within a network did not differ among themselves in the number of hosts (p>0.05 for most systems, after Bonferroni correction; Table S3), meaning that the emergence of modules in generalized clades was not due to differences in host range across species. Only eleven out of the 116 studied systems (Arceuthobium, Asclepias, Curculio, Encarsia, Ficus, Geosmithia, Gonioctena, Morbillivirus, Mycobacterium, Rickettsia, and Tetratrichomonas) showed significant differences in host range among modules (Table S3). Altogether, these results show that modules describe distinct and discrete interactive niches both in specialized and generalized clades.
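
The linking rule used to build each clade network (two species are joined when they share at least one host) can be sketched as follows. The toy records and the function name `shared_host_network` are hypothetical; the sketch covers only network construction, not the simulated-annealing module detection.

```python
from itertools import combinations


def shared_host_network(records):
    """Return the set of undirected species-species links (as frozensets)
    joining every pair of species that shares at least one host."""
    links = set()
    for sp1, sp2 in combinations(sorted(records), 2):
        if records[sp1] & records[sp2]:  # non-empty intersection of host sets
            links.add(frozenset((sp1, sp2)))
    return links


# Hypothetical toy records: species -> set of hosts (not study data).
toy = {
    "sp_a": {"h1"},
    "sp_b": {"h1", "h2"},
    "sp_c": {"h2"},
    "sp_d": {"h3"},
}
links = shared_host_network(toy)
```

Here sp_a and sp_b are linked through h1, sp_b and sp_c through h2, and sp_d remains isolated because no other species uses h3.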

The number of modules in a given network may be considered a measure of the diversity of interactive niches occupied by the genus. The number of modules per network ranged between 2 and 20 (5.52 ± 0.36). It was affected by the interaction intimacy (symbiotic vs. non-symbiotic systems), with symbiotic genera having more modules (6.2 ± 0.5, n = 75 clades) than non-symbiotic ones (4.2 ± 0.4, n = 40 clades). The number of modules was negatively related to host range, even after controlling for number of species in the clade (Table S4). This means that the number of distinct interactive niches is higher in specialized genera than in generalized ones. In addition, Domain also affected the number of modules per system (Table S4), with viruses and prokaryotes having more modules per clade than eukaryotes (Table S5). This may reflect a trend towards greater diversification of ecological niches in microorganisms.

Simulated annealing produces a modularity index “M” that estimates how clearly delimited the modules are15 (see Methods). This index depends on the number of between-module links relative to the number of within-module links, and decreases when the fraction of between-module links increases in the total network. In the context of ecological interactions, this means that M is negatively related to the proportion of species belonging to different modules but sharing hosts. Consequently, it can be used as an estimate of the between-module differentiation in host use. Low values of M would indicate no differentiation because many hosts are shared between different modules and high values of M would indicate differentiation because the modules do not use common hosts. The extreme situation is exemplified by those genera where most modules are completely isolated without any links with the remaining modules (high differentiation in host use; see Figure S1). Modularity ranged in our data set between 0 and 0.833 (Appendix S1). M was higher in specialist clades (0.390 ± 0.026) than in generalist ones (0.232 ± 0.030). There was indeed a significant negative relationship between modularity and host range across clades (Table S4). This means that between-module differentiation in host use decreases with the generalization of the clades. In fact, in community networks modularity is expected to increase with host specificity16. Similarly, M was higher in symbiotic (0.363 ± 0.030) than in non-symbiotic clades (0.239 ± 0.034; Table S3), probably because symbionts tend to be more specialized (host range = 1.77 ± 0.17) than non-symbionts (host range = 2.95 ± 0.40) (F = 9.63, df = 1,108, p = 0.002), and share fewer hosts between modules. This association between interaction intimacy, generalization degree and network metrics has been previously shown for 19 ant-plant mutualistic networks20.
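
The behaviour of M described above can be illustrated with a small sketch. The text does not spell out the formula, so the code assumes the widely used definition M = Σ_s [l_s/L − (d_s/2L)²], where l_s is the number of links inside module s, d_s the summed degree of its nodes and L the total number of links. The function name, the toy network and the module assignment are hypothetical, and the partition is given by hand rather than found by simulated annealing.

```python
def modularity(links, module_of):
    """Modularity of a given partition, ASSUMING the common definition
    M = sum_s (l_s / L) - (d_s / (2 L))**2 (not stated in the source)."""
    total = len(links)
    if total == 0:
        return 0.0
    within = {}  # module -> number of links with both ends in the module
    degree = {}  # module -> summed degree of the nodes in the module
    for a, b in links:
        ma, mb = module_of[a], module_of[b]
        degree[ma] = degree.get(ma, 0) + 1
        degree[mb] = degree.get(mb, 0) + 1
        if ma == mb:
            within[ma] = within.get(ma, 0) + 1
    return sum(
        within.get(m, 0) / total - (d / (2 * total)) ** 2
        for m, d in degree.items()
    )


# Hypothetical toy network: two triangles of species, one module each.
module_of = {"a": 1, "b": 1, "c": 1, "d": 2, "e": 2, "f": 2}
isolated = [("a", "b"), ("b", "c"), ("a", "c"),
            ("d", "e"), ("e", "f"), ("d", "f")]
bridged = isolated + [("c", "d")]  # one between-module link (a shared host)

m_isolated = modularity(isolated, module_of)
m_bridged = modularity(bridged, module_of)
```

With completely isolated modules the toy network gives M = 0.5; adding a single between-module link lowers it to 6/7 − 1/2 ≈ 0.357, in line with the statement that M decreases as the fraction of between-module links grows.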

To explore how evolutionarily conserved ecological interactions are, we statistically tested whether phylogenetically-related species were more prone to belong to the same module than expected randomly (i.e., we tested for phylogenetic signal of ecological interactions)21. We found that over 83% of the specialist clades showed significant phylogenetic signal for host use (Appendix S1). More importantly, 52% of the generalist clades also showed significant phylogenetic signal (Appendix S1). In fact, no effect of host range was found on the probability of having significant phylogenetic conservatism in ecological interaction (Fig. 2, Table S6). Similarly, the occurrence of conservatism in ecological interactions did not depend on the sign of the interaction (Table S6), since 69% of antagonistic systems and 59% of mutualistic systems had significant phylogenetic signals. That is, parasites and predators have the same probability of having phylogenetically-conserved ecological interactions as pollinators, seed dispersers or mycorrhizae. Phylogenetic conservatism in antagonistic interactions is expected under most macro-coevolutionary models, such as the escape-and-radiation model, parallel cladogenesis, sequential evolution or diversifying coevolution2,7. In fact, phylogenetic conservatism has been frequently found in parasites and herbivorous insects7.
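
One common way to test whether related species share a module more often than expected by chance is a tip-randomization test on parsimony steps. The sketch below assumes that approach: it counts the minimum number of module changes on the tree with Fitch parsimony and compares it with the counts obtained after shuffling module labels among tips. The tree, the module labels and the function names are hypothetical, and this is not presented as the exact procedure used in the study.

```python
import random


def fitch_steps(tree, state_of):
    """Minimum number of state changes (Fitch parsimony) for a discrete
    character on a rooted binary tree given as nested 2-tuples of tip names."""
    def visit(node):
        if isinstance(node, str):  # tip: its own state, no steps
            return {state_of[node]}, 0
        left, right = node
        lset, lsteps = visit(left)
        rset, rsteps = visit(right)
        common = lset & rset
        if common:  # children agree on some state: no extra step
            return common, lsteps + rsteps
        return lset | rset, lsteps + rsteps + 1
    return visit(tree)[1]


def signal_p_value(tree, state_of, n_perm=999, seed=0):
    """Observed step count and the share of permutations (observed included)
    needing as few or fewer steps than the observed data."""
    rng = random.Random(seed)
    observed = fitch_steps(tree, state_of)
    tips = sorted(state_of)
    states = [state_of[t] for t in tips]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(states)  # reassign module labels to tips at random
        if fitch_steps(tree, dict(zip(tips, states))) <= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical toy clade: eight species, module membership follows the tree.
tree = ((("a", "b"), ("c", "d")), (("e", "f"), ("g", "h")))
modules = {"a": 1, "b": 1, "c": 1, "d": 1, "e": 2, "f": 2, "g": 2, "h": 2}
observed, p_value = signal_p_value(tree, modules)
```

In this toy example module membership changes only once on the tree, so few random reassignments of labels do as well and the resulting p-value is small, which is what a significant phylogenetic signal in module affiliation looks like under this assumed test.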

There was a slight tendency of symbiotic systems to have more conserved interactions (71% of the symbiotic systems had a significant phylogenetic signal) than non-symbiotic systems (57% had significant phylogenetic signal), although this difference was not statistically significant (Table S6). Ecological interactions were similarly conserved in symbiotic and non-symbiotic clades. Similarly, there was a tendency for phylogenetic conservatism to be more frequent in viruses (85%) and prokaryotes (80%) than in eukaryotes (59%), although this difference was also nonsignificant (Table S6). Finally, as previously shown for some other systems5,17, the occurrence of a phylogenetic signal in our data set was significantly related to sample size (number of species studied per clade) (Table S6). In fact, if we remove from our data set those clades with fewer than 20 species, we find that 87% of specialist clades (N = 32 systems) and 68% of generalist clades (N = 33 systems) had significant phylogenetic signals. Most previous studies have stressed the ubiquity of phylogenetic conservatism in host use in specialist (mostly symbiotic) systems, from RNA viruses to herbivorous insects3-7, 22-24. In agreement with this traditional view, several studies have pointed out that co-cladogenesis and phylogenetic conservatism in ecological interaction disappear when generalist species are included in the analyses7, 24. Our study indicates, however, that ecological interactions are also conserved in generalist (both symbiotic and non-symbiotic) clades. In other words, evolutionary conservatism in ecological interactions is a recurrent phenomenon across the entire tree of life (Fig. 3).

We found that clades having significant phylogenetic signal in ecological interactions also had higher values of modularity, and this occurred in all kinds of organism (virus, prokaryote and eukaryote) and interaction (Fig. 2e-h). These findings indicate that clades showing higher evolutionary conservatism in their ecological interactions also have stronger differentiation in host use among modules. This means that species belonging to the same module share few hosts with species from other modules in conserved systems, whereas in non-conserved systems species belonging to different modules tend to share some hosts. This probably occurs because the use of a specific host assemblage (ascription to a specific module) requires particular adaptations. In clades where modules are conserved, species retain ancestral traits that influence their ecological interactions7,23, constraining the present and future capacity to use alternative hosts from other modules1. In contrast, in non-conserved systems most traits involved in host use are likely to represent new adaptations. In this scenario, the species could possess adaptations for using alternative and disparate hosts. It is remarkable that this relationship was also found for generalist clades (Fig. 2f), despite the fact that modularity is negatively related to host range. That is, although generalist species usually share some hosts among different modules, among-module differentiation in the composition of their host assemblages is higher in evolutionarily conserved interactions. Conservatism in ecological interactions is associated with high host-use differentiation both in generalist and specialist organisms.

Our study has demonstrated that phylogenetic conservatism in ecological interactions is a general pattern occurring in many taxa belonging to very separate branches of the entire tree of life, from viruses to animals, and in most types of interaction, from specialized symbiotic antagonisms to generalized non-symbiotic mutualisms. These findings have major implications not only for the retrospective investigation of the evolution of interactions but also for the ability to predict the potential formation of future interactions, something crucial in many disparate areas such as Conservation Biology, Forestry, Agricultural and Animal Sciences, Epidemiology and other biological disciplines. The understanding of the dynamics of biological invasions, the success of crops in new areas, the cross-species transmission of pathogens, the emergence of novel epidemics, etc., will benefit from considering the pervasiveness of phylogenetic conservatism in ecological interactions. In conclusion, our study suggests that the same rules seem to drive the evolution of most ecological interactions and are strongly contributing to the organization of biodiversity on Earth.

Methods summary

We constructed bipartite networks (N = 116 systems) of species belonging to the same genus and their known hosts. Species were then connected through the co-occurrence of interactions. We subsequently converted the bipartite networks into unipartite networks according to shared interactions. Modularity level and number of modules per network were determined using an algorithm based on simulated annealing14-15. This algorithm identifies modules, groups of species having most of their links within their own module, with an accuracy of 90%15. The modules were validated statistically by permutational multivariate analyses of variance using distance matrices (ADONIS), which tested whether element similarity (i.e. similarity between species as a function of host use and similarity between host taxa as a function of their interacting organisms) was significantly higher within than between modules. Phylogenetic conservatism in host use was determined in each system by estimating the significance of the phylogenetic signal following Maddison & Slatkin25. The character “host use” was the module to which the species was ascribed by the annealing algorithm. We mapped the evolution of host use onto published phylogenetic trees.