Appendix 1. Inferring invasion scenarios using approximate Bayesian computation

Approximate Bayesian computation (ABC), as implemented in DIYABC v.1.0.4.46 (Cornuet et al. 2008), was used to explore the putative invasion scenarios followed by the yellow-legged hornet in France and Korea. We defined and compared 15 competing scenarios for each invasive population (Fig. S1, Table S2, Appendix 2). These scenarios differ in the source population from which the invasive population originated. DIYABC made it possible to test single populations (Fig. S1a and S1b) or admixtures between populations (Fig. S1c) as possible sources of invasion. Furthermore, a putative source could be either sampled (Fig. S1a) or unsampled (Fig. S1b). Briefly, 10⁶ datasets were simulated under each competing scenario, with parameters drawn from prior distributions. We chose prior parameter distributions according to historical and biological knowledge of V. velutina (Appendix 3). As proposed by Bertorelle et al. (2010), we also made several successive simulation runs to choose prior distributions for divergence times and effective population sizes that were broadly compatible with the observed pairwise FST and allelic richness. The relative order of divergence of the four sampled native populations follows these pairwise FST values. Observed and simulated datasets were then summarized using the summary statistics classically used in ABC (Cornuet et al. 2008; Guillemaud et al. 2010; Lombaert et al. 2010) (Appendix 4). Normalized Euclidean distances between observed and simulated sets of summary statistics were then computed, and the 10,000 simulated datasets with the smallest distances were used to estimate the posterior probabilities of the competing scenarios by polychotomous logistic regression (Beaumont et al. 2002; Cornuet et al. 2008).
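The distance-and-selection step described above can be illustrated with a minimal Python sketch. It is not the DIYABC implementation: it assumes that each summary statistic is normalized by its standard deviation across simulations, and it replaces the logistic regression step with the simpler direct estimate (the proportion of each scenario among the retained simulations). The function name and arguments are hypothetical.

```python
import numpy as np


def abc_scenario_probabilities(observed, simulated, scenario_ids, n_keep):
    """Direct (rejection) estimate of scenario posterior probabilities.

    observed     : 1-D array of observed summary statistics
    simulated    : 2-D array, one row of summary statistics per simulation
    scenario_ids : 1-D array giving the scenario of each simulated row
    n_keep       : number of closest simulations to retain
    """
    observed = np.asarray(observed, dtype=float)
    simulated = np.asarray(simulated, dtype=float)
    scenario_ids = np.asarray(scenario_ids)

    # Assumption: normalize each statistic by its standard deviation
    # across all simulated datasets.
    scale = simulated.std(axis=0)
    scale[scale == 0] = 1.0  # guard against constant statistics

    # Normalized Euclidean distance to the observed dataset.
    distances = np.sqrt((((simulated - observed) / scale) ** 2).sum(axis=1))

    # Retain the n_keep simulations closest to the observed data.
    closest = np.argsort(distances)[:n_keep]
    kept = scenario_ids[closest]

    # Direct estimate: share of each scenario among retained simulations.
    return {s.item(): float(np.mean(kept == s)) for s in np.unique(scenario_ids)}
```

With the settings reported in the text, `n_keep` would be 10,000 and `simulated` would hold 10⁶ rows per scenario; the logistic regression estimate would then be computed on the same retained simulations.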

To compute type I and type II errors, we simulated 100 pseudo-observed datasets (PODs) under each scenario. These PODs, which are simulated datasets for which the true scenario is known, were analyzed by ABC as though they were observed data. In this case, for tractability, we used only 100,000 simulated datasets per scenario during the ABC procedure, instead of the 10⁶ datasets mentioned above.
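As an illustration of how such error rates can be tabulated once every POD has been assigned a best scenario, the following Python sketch may help. It assumes two hypothetical arrays, one with the scenario under which each POD was simulated and one with the scenario selected by ABC for that POD; the function name is also hypothetical.

```python
import numpy as np


def error_rates(true_scenarios, chosen_scenarios, focal):
    """Type I error and type II error components for a focal scenario."""
    true_scenarios = np.asarray(true_scenarios)
    chosen_scenarios = np.asarray(chosen_scenarios)

    # Type I error: the focal scenario is true but is not chosen.
    focal_is_true = true_scenarios == focal
    type_1 = float(np.mean(chosen_scenarios[focal_is_true] != focal))

    # Type II error components: the focal scenario is chosen although
    # another scenario s is the true one (one value per alternative).
    type_2 = {}
    for s in np.unique(true_scenarios):
        if s == focal:
            continue
        s_is_true = true_scenarios == s
        type_2[s.item()] = float(np.mean(chosen_scenarios[s_is_true] == focal))
    return type_1, type_2
```

With 100 PODs per scenario, each returned proportion is estimated from 100 pseudo-observed datasets.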

In cases where the same scenario was selected for both France and Korea, we performed an additional analysis to establish whether France and Korea could be one another’s sources. This analysis resulted in the comparison of four different scenarios (Appendix 2).

Appendix 2. Description of the competing invasion scenarios considered in the ABC analysis

In the DIYABC analysis, gene genealogies were built according to rules from coalescent theory and based upon the population tree. Divergence order between the four sampled populations was based upon the observed FST. For each invasive population (Pinv, France or Korea), we compared 15 scenarios split into three different classes:

Sampled origin scenario (SOS): we considered that the invasive population (Pinv) derived from one of the four sampled source populations: Zhejiang and Jiangsu provinces (ZHE/JIA); Yunnan province (YUN); Vietnam (VIE); or Indonesia (IND). The effective population sizes of the four source populations were modeled independently. Because these samples may include only a subset of the genetic variability of the native area, we considered the possibility that the real source of the invasive population had not actually been sampled. Consequently, and following the procedure described by Lombaert et al. (2011), we considered that the true source population could be an unsampled phantom population (named Pf(1) when related to ZHE/JIA), which diverged from the sampled native population tf generations before sampling. Pinv was founded ti generations before sampling, and Nb was the effective number of founders. After db generations of bottleneck, Pinv reached its stable effective size, Ninv.

One of the four possible SOS scenarios is shown in Figure S1a, for the case of a ZHE/JIA origin.

Unsampled origin scenario (UOS): we also considered the possibility that the true source of the invasive population was an unsampled population, Pg, genetically distant from the four sampled native populations. We thus considered the possibility that Pg diverged directly from an ancestral population tg generations ago (Fig. S1b). This resulted in one possible scenario for each invasive population.

Admixture origin scenario (AOS): we supposed that the invasive population (Pinv) was derived from an admixture of two of the five source populations described above (ZHE/JIA, YUN, VIE, IND and Pg) (Fig. S1c). This resulted in ten possible scenarios for each invasive population.
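The three classes above add up to 15 scenarios per invasive population: 4 SOS, 1 UOS and 10 AOS (all pairs among five populations). The short Python sketch below simply enumerates them; the function and constant names are hypothetical, and the resulting order happens to match the scenario numbering used in Table S5.

```python
from itertools import combinations

SAMPLED = ["ZHE/JIA", "YUN", "VIE", "IND"]  # sampled native populations
UNSAMPLED = "Pg"                            # unsampled native population


def enumerate_scenarios():
    """Return the competing scenarios as (class, source populations) tuples."""
    # Sampled origin scenarios: one per sampled native population.
    scenarios = [("SOS", (pop,)) for pop in SAMPLED]
    # Unsampled origin scenario: a single scenario with Pg as the source.
    scenarios.append(("UOS", (UNSAMPLED,)))
    # Admixture origin scenarios: every pair among the five populations.
    scenarios += [("AOS", pair) for pair in combinations(SAMPLED + [UNSAMPLED], 2)]
    return scenarios


if __name__ == "__main__":
    for number, (kind, sources) in enumerate(enumerate_scenarios(), start=1):
        print(f"Scenario {number}: {kind} from {' + '.join(sources)}")
```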

In addition, we compared scenarios in which Korea and France i) could have originated independently from the same unsampled source that was derived from sampled population X, ii) could have originated from two unsampled populations that were derived from the same sampled population X, or iii) could have derived from each other (France from Korea or Korea from France) and iv) could ultimately have derived from sampled population X. For all tested scenarios, we assumed that populations did not exchange migrants after divergence.

Appendix 3. Prior setting for parameter distributions in the ABC analysis

Effective population sizes of sampled and unsampled sources and of invasive populations (Ninv) were drawn from a log-uniform distribution bounded by 200 and 20,000 diploid individuals (LogUnif[200-20000]). The divergence time tf between the hypothetical source populations and the sampled native populations was drawn from a uniform distribution between 30 and 500 generations (Unif[30-500]). Historical data suggest that the introductions into France and Korea occurred before 2005. Because introduced insects are difficult to detect at low density and the real introduction date ti is uncertain, ti was drawn between 10 and 20 generations (Unif[10-20]). After introduction, the size of each invasive population, Nb, was drawn from LogUnif[2-1000], with Nb < effective size of the source population, during tb generations (Unif[1-5]). Parameter tb was bounded between 1 and 5 generations because historical data suggest that the introductions into France and Korea occurred before 2005, and sampling was conducted between 2006 and 2010. Finally, all five native populations (including the unsampled one) diverged from an ancestral native population tj (j = 1…4) or tg generations ago (Unif[100-500]). We considered a generation time of one year because hornet species have annual life cycles (Matsuura & Yamane, 1990).
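To make these prior settings concrete, the following Python sketch draws one set of demographic parameters. It is only an illustration under stated assumptions: times are drawn as integer numbers of generations, the condition Nb < source effective size is enforced by redrawing Nb, and the function names and dictionary keys are hypothetical rather than taken from DIYABC.

```python
import numpy as np


def log_uniform(rng, low, high):
    """Draw one value whose logarithm is uniform between log(low) and log(high)."""
    return float(np.exp(rng.uniform(np.log(low), np.log(high))))


def draw_demographic_priors(rng):
    """Draw one set of demographic parameters from the priors listed above."""
    n_source = log_uniform(rng, 200, 20000)  # source effective size
    n_inv = log_uniform(rng, 200, 20000)     # stable invasive effective size

    # Assumption: the condition Nb < source size is enforced by redrawing.
    while True:
        n_b = log_uniform(rng, 2, 1000)
        if n_b < n_source:
            break

    return {
        "N_source": n_source,
        "Ninv": n_inv,
        "Nb": n_b,
        # Assumption: times are integer generations, bounds included.
        "tf": int(rng.integers(30, 500, endpoint=True)),
        "ti": int(rng.integers(10, 20, endpoint=True)),
        "tb": int(rng.integers(1, 5, endpoint=True)),
        "tg": int(rng.integers(100, 500, endpoint=True)),
    }
```

Calling `draw_demographic_priors(np.random.default_rng(1))` returns one parameter set; in an ABC run, one such draw would be made for every simulated dataset.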

Regarding parameters for microsatellite markers, each locus was assumed to follow a generalized stepwise mutation model (GSM) with a possible range of 40 contiguous allelic states, except for locus R1-180, for which the allele range was 80. The mean mutation rate (μ) was drawn from a uniform distribution bounded between 10⁻⁴ and 10⁻³, and the mutation rate for each locus was drawn independently from a Gamma distribution (mean = μ and shape = 2). The mean parameter of the geometric distribution (p) of the length in repeat number of mutation events was drawn from a uniform distribution bounded between 0.1 and 0.3. The mean single nucleotide insertion/deletion mutation rate (µ_SNI) was drawn from a uniform distribution bounded by 10⁻⁸ and 10⁻⁴, and the individual locus rate was drawn from a Gamma distribution (mean = µ_SNI and shape = 2).
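The hierarchical structure of these mutation priors (a mean rate drawn first, then one rate per locus around that mean) can be sketched in Python as follows. A Gamma distribution with mean m and shape k has scale m / k, which is how the "mean and shape" parameterization is translated here. The function name and dictionary keys are hypothetical, and any truncation of individual locus rates is ignored.

```python
import numpy as np


def draw_mutation_priors(rng, n_loci):
    """Draw mean and per-locus mutation parameters for n_loci microsatellites."""
    shape = 2.0

    # Mean mutation rate, then one rate per locus: Gamma(mean = mu, shape = 2).
    mean_mu = rng.uniform(1e-4, 1e-3)
    locus_mu = rng.gamma(shape=shape, scale=mean_mu / shape, size=n_loci)

    # Mean parameter p of the geometric distribution of mutation step lengths.
    mean_p = rng.uniform(0.1, 0.3)

    # Single nucleotide insertion/deletion rate, drawn in the same way.
    mean_mu_sni = rng.uniform(1e-8, 1e-4)
    locus_mu_sni = rng.gamma(shape=shape, scale=mean_mu_sni / shape, size=n_loci)

    return {
        "mean_mu": mean_mu,
        "locus_mu": locus_mu,
        "mean_p": mean_p,
        "mean_mu_sni": mean_mu_sni,
        "locus_mu_sni": locus_mu_sni,
    }
```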

Appendix 4. Summary statistics used in the ABC analysis

For each population, we used the mean number of alleles per locus, mean expected heterozygosity (Nei, 1987), mean allelic size variance and mean ratio of the number of alleles over the range of allele sizes (Garza & Williamson, 2001). For each pair of samples, we used the pairwise FST values (Weir & Cockerham, 1984). For each pair of samples, including the invasive population under analysis, we used the shared allele distance (SAD) (Chakraborty & Jin, 1993), the Goldstein distance ((δμ)²) (Goldstein et al. 1995), the mean individual assignment likelihoods of individuals from population i being assigned to population j, and the maximum likelihood estimate of admixture proportion (Pascual et al. 2007).
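The four single-population statistics can be illustrated for one locus with the Python sketch below. It is a simplified illustration, not the DIYABC code: allele sizes are assumed to be given in repeat units, expected heterozygosity uses the usual unbiased estimator n/(n-1) × (1 - Σp²), and the allele size range is counted as max - min + 1. In practice each statistic would then be averaged over loci. The function name is hypothetical.

```python
import numpy as np


def locus_summary(allele_sizes):
    """Single-locus summary statistics from the allele sizes of all gene copies.

    allele_sizes : 1-D sequence with one allele size (in repeat units) per
                   gene copy, i.e. two entries per diploid individual.
    """
    sizes = np.asarray(allele_sizes, dtype=float)
    n = sizes.size
    alleles, counts = np.unique(sizes, return_counts=True)
    freqs = counts / n

    n_alleles = int(alleles.size)
    # Unbiased expected heterozygosity.
    exp_het = n / (n - 1) * (1.0 - np.sum(freqs ** 2))
    # Variance of allele sizes across gene copies.
    size_var = sizes.var(ddof=1)
    # Assumption: range counted as the number of possible allelic states
    # between the smallest and largest observed alleles.
    allele_range = alleles.max() - alleles.min() + 1
    m_ratio = n_alleles / allele_range

    return {
        "n_alleles": n_alleles,
        "exp_het": float(exp_het),
        "size_var": float(size_var),
        "m_ratio": float(m_ratio),
    }
```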

Appendix 5. Inference of invasion scenarios

In the first ABC analysis, we considered that each invasive population (French or Korean) could originate from an unsampled subset derived from one of the four sampled native populations (Zhejiang/Jiangsu, Yunnan, Vietnam or Indonesia), from an unsampled population directly derived from the ancestral native population, or from admixture events between two source populations. Both the French and Korean invasive populations probably originated from an unsampled subset derived from the Zhejiang/Jiangsu population, with a posterior probability of 0.66 (95% CI = [0.57-0.76]) for France and 0.90 (95% CI = [0.85-0.94]) for Korea. Confidence in scenario choice was also evaluated by estimating type I and type II errors on the basis of 100 PODs per scenario. For scenario 1, type I errors (the probability that scenario 1 is not chosen when it is true) were large for both Korea and France (0.64 and 0.59, respectively), but type II errors (the probability that scenario 1 is chosen when it is not true) were low (0.02 for both invasive populations), suggesting strong support for the selected scenario (type II errors are given in Table S5). Using Bayes' theorem and following the approach of Fagundes et al. (2007), we also computed the probability that the selected scenario was true given the posterior probability computed with ABC (in the case of France, the posterior probability is 0.66 and this probability is therefore written P(source=Zhejiang/Jiangsu|P(source=Zhejiang/Jiangsu)=0.66)). The computation of this probability takes into account the computed posterior probability, the type II errors and the shape of the densities P(P(source=Zhejiang/Jiangsu)=0.66|source=i), where i is one of the 15 possible sources (corresponding to the 15 scenarios), obtained by simulations. In the case of France, and because the priors of the scenarios were the same, this probability was computed as P(P(source=Zhejiang/Jiangsu)=0.66|source=Zhejiang/Jiangsu) / Σi P(P(source=Zhejiang/Jiangsu)=0.66|source=i), where the sum runs over the 15 scenarios. These probabilities, calculated for each scenario according to Fagundes et al. (2007), are given in Table S5. The values obtained confirmed that scenario 1 best explains the invasions of both France and Korea.
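The Bayes' theorem step can be sketched in Python under explicit assumptions. Suppose that, for each scenario i, the PODs simulated under i have been analyzed by ABC and the posterior probability obtained for the focal scenario has been stored. With equal scenario priors, the probability that the focal scenario is true is the density of these values at the observed posterior probability under the focal scenario, divided by the sum of the corresponding densities over all scenarios. The Gaussian kernel density estimate, its bandwidth and all names below are illustrative choices, not those of Fagundes et al. (2007).

```python
import numpy as np


def kernel_density_at(samples, x, bandwidth):
    """Gaussian kernel density estimate of `samples`, evaluated at point x."""
    samples = np.asarray(samples, dtype=float)
    z = (x - samples) / bandwidth
    return float(np.mean(np.exp(-0.5 * z ** 2)) / (bandwidth * np.sqrt(2.0 * np.pi)))


def probability_scenario_correct(pods_posteriors, focal, observed_posterior,
                                 bandwidth=0.05):
    """Probability that the focal scenario is true, assuming equal priors.

    pods_posteriors    : dict mapping each scenario to the posterior
                         probabilities of the focal scenario obtained for
                         the PODs simulated under that scenario
    focal              : key of the selected scenario
    observed_posterior : posterior probability of the focal scenario for
                         the real data (0.66 for France in the text)
    """
    densities = {
        scenario: kernel_density_at(values, observed_posterior, bandwidth)
        for scenario, values in pods_posteriors.items()
    }
    # Equal priors cancel out, leaving a ratio of densities.
    return densities[focal] / sum(densities.values())
```

With only 100 PODs per scenario, the result can be sensitive to the bandwidth, so it may be worth checking a few values.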

Because the same scenario was selected for both invasive populations, we conducted an additional analysis to further describe their origin. We found that France and Korea were not derived from each other but instead originated from two unsampled subsets that were independently derived from Zhejiang/Jiangsu (data not shown).

Fig. S1 a) Statistical parsimony network obtained from TCS (Clement et al. 2000), based on a 95% connection limit. Circles are sized according to the total observed frequency of the haplotype. A reference scale is provided in b). Letters in circles denote the name of each of the eleven haplotypes found in this study. Lines indicate a single mutational step between two haplotypes. Small, unlabelled black circles represent hypothesized unsampled haplotypes. The populations are distinguished by colors. Right: Haplotypes from France, Zhejiang/Jiangsu, Yunnan and one haplotype from Vietnam embedded within the same haplotype network. Left: Haplotypes from Indonesia and one haplotype from Vietnam were diverse and segregate into two networks that could not be connected with 95% confidence in TCS.

Fig. S2 Graphical output from STRUCTURE (Pritchard et al. 2000) for each value of K from 2 to 10 (modified in DISTRUCT (Rosenberg 2002)). Each vertical line represents an individual, and the color composition displays the probability of belonging to each of the 2-10 clusters defined by STRUCTURE. Analysis was performed with only the 15 markers developed from V. velutina. This analysis confirms the results obtained with 22 markers (see Fig. 4).

Fig. S3: Graphic representation of three of the fifteen competing invasion scenarios considered in the ABC analysis (abbreviations are given in table 1). The divergence order of the four sampled native populations (ZHE/JIA, YUN, VIE and IND) is the same in all scenarios and is based on FST data. Time 0 is the sampling date. Divergence times of native populations from an ancestral unsampled population are noted t1, t2, t3, t4 and tg. The invasive population Pinv was founded ti generations ago. After db generations (bottleneck duration), Pinv reached a larger stable effective population size tb generations ago. Times in generations and the effective size of each population are simulated from prior distributions. Circles represent all populations considered in the analysis. Dashed circles indicate that the population was not sampled. Time is expressed in number of generations and is not represented to scale.

a) Sampled origin scenario (SOS) (see supporting information, appendix 1): one of four possible SOS scenarios, in which a phantom population (Pf(1)), derived from one of the sampled populations (ZHE/JIA) tf generations ago, was the source population for the invasive population (Pinv). b) Unsampled origin scenario (UOS), in which the source of the invasive population (Pinv) was an unsampled population (Pg) genetically distant from the four sampled native populations. c) Admixture origin scenario (AOS): one of ten hybrid scenarios, in which the invasive population (Pinv) was derived from admixture between subsets (Pf(1) and Pf(2)) of the ZHE/JIA and YUN source populations, respectively. When admixture occurs, the admixture rates ra and 1-ra are the genetic contributions of each native population (simulated from prior distributions).

Panels: a) Sampled origin scenario (SOS); b) Unsampled origin scenario (UOS); c) Admixture origin scenario (AOS).

Fig. S4 Posterior distribution of main demographic parameters

Table S4. List of acronyms and abbreviations for DIYABC analysis.

Invasion scenarios / Population names
Pa / Ancestral population
ZHE/JIA / Zhejiang/Jiangsu population
YUN / Yunnan population
VIE / Vietnam population
IND / Indonesia population
Pg / Unsampled population from native range
Pf(1) / Unsampled subset of ZHE/JIA population
Pf(2) / Unsampled subset of YUN population
Pinv / Invasive population (France, FRA or Korea, KOR)
Population sizes
Nb / Effective number of founders at the origin of the invasive population
Ninv / Effective size of the invasive population after stabilization
Times
t1 / Split time between ancestral population and ZHE/JIA population
t2 / Split time between ancestral population and YUN
t3 / Split time between ancestral population and VIE
t4 / Split time between ancestral population and IND
tg / Split time between ancestral population and Pg
tf / Split time between source population (ZHE/JIA, YUN, VIE or IND) and unsampled subset
ti / Introduction date of invasive populations
tb / End of bottleneck
db / Bottleneck duration
0 / Sampling time
Admixture rates
ra / Admixture contribution of first population for Admixture origin scenarios
Sexual admixture model / Population names
Ps / Source population (ZHE/JIA)
Pinv / Invasive population (FRA)
Pfm / Population of introduced females
Pm / Population of mating males
Population sizes
Ns / ZHE/JIA effective population size
Ninv / FRA effective population size after stabilization
Nfm / Effective number of introduced foundresses
Nm / Effective number of males that mated with introduced foundresses
Times
0 / Sampling time
t1 / Time of introduction to France

Table S5. Posterior probabilities of competing scenarios of invasion and components of the type II error of the selected scenario in the first ABC analysis. The compared scenarios are detailed in Fig. 2. Type II error components correspond to the proportions of cases in which the selected scenario is chosen when it is not the true one. The probabilities and confidence intervals (CI) shown in bold are those of the selected scenario.

Scenario considered / Source / France: posterior probability [95% CI] (logistic regression method) / France: probability of correct scenario (Fagundes et al. (2007) approach) / France: type II error / Korea: posterior probability [95% CI] (logistic regression method) / Korea: probability of correct scenario (Fagundes et al. (2007) approach) / Korea: type II error
Scenario 1 / Zhejiang/Jiangsu (ZHE/JIA) / 0.6624 [0.5682, 0.7565] / 0.5912 / 0.02 / 0.8954 [0.8518, 0.9390] / 1 / 0.02
Scenario 2 / Yunnan (YUN) / 0.0022 [0.0007, 0.0037] / 0.0013 / 0 / 0.0001 [0.0000, 0.0002] / 0.0020 / 0
Scenario 3 / Vietnam (VIE) / 0.0025 [0.0000, 0.0057] / 0.0017 / 0 / 0.0000 [0.0000, 0.0000] / 0.0033 / 0
Scenario 4 / Indonesia (IND) / 0.0000 [0.0000, 0.0000] / 0.0007 / 0 / 0.0000 [0.0000, 0.0000] / 0.0025 / 0
Scenario 5 / Unsampled (Pg) / 0.0118 [0.0050, 0.0187] / 0.0113 / 0.01 / 0.0011 [0.0003, 0.0020] / 0.0052 / 0
Scenario 6 / ZHE/JIA + YUN / 0.0799 [0.0432, 0.1166] / 0.0700 / 0.03 / 0.0588 [0.0270, 0.0906] / 0.0488 / 0.06
Scenario 7 / ZHE/JIA + VIE / 0.0967 [0.0427, 0.1506] / 0.0550 / 0.05 / 0.0068 [0.0019, 0.0117] / 0.0077 / 0.05
Scenario 8 / ZHE/JIA + IND / 0.0030 [0.0009, 0.0051] / 0.0043 / 0.05 / 0.0186 [0.0064, 0.0309] / 0.0141 / 0.05
Scenario 9 / ZHE/JIA + Pg / 0.1233 [0.0734, 0.1733] / 0.1115 / 0.11 / 0.0188 [0.0091, 0.0285] / 0.0139 / 0.11
Scenario 10 / YUN + VIE / 0.0017 [0.0003, 0.0031] / 0.0025 / 0.01 / 0.0000 [0.0000, 0.0000] / 0.0053 / 0
Scenario 11 / YUN + IND / 0.0000 [0.0000, 0.0001] / 0.0018 / 0 / 0.0000 [0.0000, 0.0000] / 0.0047 / 0
Scenario 12 / YUN + Pg / 0.0055 [0.0023, 0.0087] / 0.0069 / 0 / 0.0002 [0.0001, 0.0004] / 0.0072 / 0.01
Scenario 13 / VIE + IND / 0.0000 [0.0000, 0.0001] / 0.0031 / 0 / 0.0000 [0.0000, 0.0000] / 0.0063 / 0.02
Scenario 14 / VIE + Pg / 0.0106 [0.0028, 0.0184] / 0.0080 / 0.01 / 0.0000 [0.0000, 0.0000] / 0.0048 / 0
Scenario 15 / IND + Pg / 0.0003 [0.0000, 0.0006] / 0.0047 / 0.01 / 0.0001 [0.0000, 0.0002] / 0.0095 / 0