Supplementary methods

Heterozygote deficiencies in parasite component populations:

An evaluation of interrelated hypotheses in the raccoon tick, Ixodes texanus

G Dharmarajan, JC Beasley and OE Rhodes Jr.

Department of Forestry and Natural Resources,

Purdue University,

West Lafayette, Indiana, USA

Large allele dropout

The most commonly used method for detecting large allele dropout is that implemented in the software microchecker (Van Oosterhout et al., 2004). However, large allele dropout can also be recognized by a statistically significant negative slope when regressing FIS of the kth allele at the jth locus in the ith IP (fijk) against allele size (see de Meeûs et al., 2004). Thus, at each locus we estimated fijk separately for each IP and used a GLM (normally distributed errors and identity link; spss ver. 16) to explore the effect of allele size on fijk while controlling for host-level factors. The initial model had the form fijk ~ poly(Allele size, 2) + Host sex + Host age + Host sex:Host age + Constant, where poly(Allele size, 2) was the quadratic function of allele size. Apart from the quadratic function, we also tested for a monotonic relationship between fijk and allele size. As recommended by de Meeûs et al. (2004), we weighted each observation by the product Nij pijk (1 − pijk), where Nij is the number of individuals genotyped and pijk is the frequency of the kth allele (at the jth locus in the ith IP). Thus, more weight was given to larger and more polymorphic samples. Model parsimony was assessed using AICc. We used the same procedure to test for signatures of large allele dropout at the CP scale.
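The weighting and model comparison described above can be illustrated with a minimal sketch. It is not the spss analysis itself: it omits the host sex and host age terms, treats the monotonic relationship as a linear term, and computes AICc only up to a constant shared by models fitted to the same data. The function names and all input values are hypothetical.

```python
import numpy as np

def dropout_weights(n_genotyped, allele_freq):
    """Weight N * p * (1 - p): larger for bigger, more polymorphic samples."""
    p = np.asarray(allele_freq, dtype=float)
    return n_genotyped * p * (1.0 - p)

def weighted_poly_aicc(allele_size, f, weights, degree):
    """Weighted least-squares polynomial fit of f on (centred) allele size.

    Returns the coefficients (intercept first) and AICc, assuming Gaussian
    errors with variance inversely proportional to the weights.
    """
    x = np.asarray(allele_size, dtype=float)
    y = np.asarray(f, dtype=float)
    w = np.asarray(weights, dtype=float)
    xc = x - x.mean()                                # centre for numerical stability
    X = np.vander(xc, degree + 1, increasing=True)   # columns: 1, x, x^2, ...
    sw = np.sqrt(w)
    coefs, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    resid = y - X @ coefs
    n = y.size
    k = degree + 2                                   # coefficients plus error variance
    wrss = float(np.sum(w * resid ** 2))
    aic = n * np.log(wrss / n) + 2 * k
    aicc = aic + 2 * k * (k + 1) / (n - k - 1)       # small-sample correction
    return coefs, aicc

# Hypothetical values for one locus in one IP (not data from this study)
sizes = np.array([100, 104, 108, 112, 116, 120, 124, 128])
freqs = np.array([0.05, 0.10, 0.20, 0.15, 0.20, 0.15, 0.10, 0.05])
f_k = 0.3 - 0.01 * (sizes - sizes.mean()) + np.array(
    [0.01, -0.01, 0.02, -0.02, 0.01, -0.01, 0.02, -0.02])
w = dropout_weights(20, freqs)

linear_coefs, linear_aicc = weighted_poly_aicc(sizes, f_k, w, degree=1)
quad_coefs, quad_aicc = weighted_poly_aicc(sizes, f_k, w, degree=2)
```

In this sketch a negative linear coefficient (linear_coefs[1]) together with a lower AICc for the supported model would correspond to the signature of large allele dropout; significance testing of the slope is left out.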

Cryptic structure

We relied primarily on the program baps ver. 5.3 (Corander et al., 2008) for detection of cryptic population structure. While numerous spatial genetic clustering algorithms are currently available to test for cryptic microgeographic structure, Latch et al. (2006) demonstrated that baps ver. 3.1 (Corander and Marttinen, 2006) and structure ver. 2.1 (Pritchard et al., 2000) are capable of accurately partitioning individuals even at low levels of genetic differentiation (FST ≥ 0.05). Although both programs identify cryptic sub-structure by minimizing Hardy-Weinberg and linkage disequilibrium within each of k clusters, they use algorithms with different advantages and disadvantages. While the stochastic-greedy algorithm used by baps is fast, it tends to overestimate k because it infers a few small spurious clusters. On the other hand, the Markov Chain Monte-Carlo (MCMC) algorithm used by structure identifies the number of clusters more accurately but is slower to run (Latch et al., 2006).

Given the possibility that relying on baps alone may lead to the spurious identification of cryptic population structure, we confirmed the baps results using the program structure ver. 2.2. Briefly, we first calculated the maximum k identified by baps with a posterior probability ≥ 0.05 (designated k0.05) to facilitate structure analyses (see below). We then used structure ver. 2.2 to calculate the overall likelihood of the genotypic data assuming each CP was composed of 1–k distinct clusters. To this end, we used a model that assumed admixture and correlated allele frequencies between clusters. We constrained the maximum k for each structure run to k0.05+2 (the additional values were evaluated to facilitate Δk computation; see below). Because baps likely overestimates the true number of clusters (Latch et al., 2006), we felt the loss of information due to constraining the maximal k in structure would be minimal. We performed 5 iterations per k, first allowing the Markov chain to reach stationarity with a burn-in of 150 000 MCMC simulations, followed by 500 000 MCMC simulations to find optimal clusters. Provided k > 1 (as determined by the loge likelihood of the data given k), we chose the most parsimonious number of clusters using the Δk method proposed by Evanno et al. (2005).
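As an illustration of the Δk step, the following minimal sketch computes Δk from the loge likelihoods of replicate structure runs. It assumes one common reading of Evanno et al. (2005), in which the absolute second-order difference of the mean loge likelihood is divided by the standard deviation across replicate runs at that k. The function name and the likelihood values are hypothetical.

```python
import numpy as np

def delta_k(lnp_by_k):
    """Compute Delta k for every k that has both neighbours (k - 1 and k + 1).

    lnp_by_k maps k -> list of ln P(X | k) values from replicate runs.
    Delta k = |mean L(k+1) - 2 * mean L(k) + mean L(k-1)| / sd L(k).
    """
    ks = sorted(lnp_by_k)
    mean = {k: float(np.mean(lnp_by_k[k])) for k in ks}
    sd = {k: float(np.std(lnp_by_k[k], ddof=1)) for k in ks}  # sample SD
    result = {}
    for k in ks:
        has_neighbours = (k - 1) in mean and (k + 1) in mean
        if has_neighbours and sd[k] > 0:
            second_diff = mean[k + 1] - 2.0 * mean[k] + mean[k - 1]
            result[k] = abs(second_diff) / sd[k]
    return result

# Hypothetical ln P(X | k) values for 5 replicate runs at k = 1..4
runs = {
    1: [-1000, -1001, -999, -1000, -1000],
    2: [-900, -901, -899, -900, -900],
    3: [-895, -897, -893, -896, -894],
    4: [-893, -899, -890, -895, -888],
}
dk = delta_k(runs)
best_k = max(dk, key=dk.get)
```

Because Δk is undefined at the smallest and largest k evaluated, values of k beyond the range of interest have to be run, which is consistent with evaluating additional values above k0.05.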

Kin structure

We evaluated the levels of Type I (α) and Type II (β) errors in the classification of half-sib groups by three clustering algorithms implemented in structure (Pritchard et al., 2000), baps (Corander et al., 2008) and pedigree (Herbinger, 2005). To evaluate the error rates we first generated 20 random populations using the software kingroup (Konovalov et al., 2004). Each population consisted of 7 half-sib groups (group sizes were 5, 4, 3, 3, 2, 2, 2 individuals; 21 individuals in total per population), which were generated using global allele frequencies at the 11 I. texanus microsatellites. We evaluated levels of FIS at each of the 11 loci and obtained approximate 95% confidence intervals by jackknifing across populations using the program fstat (Goudet, 1995; Fig. S2A). Each of the 20 random populations was then analysed separately using the three clustering algorithms. In the case of structure, we calculated the overall likelihood of the genotypic data assuming each random population was composed of k = 1–10 distinct clusters, using a model that assumed admixture and correlated allele frequencies between clusters. In the case of baps, for each random population we carried out 5 runs in which the maximum k was constrained to 10, and the k with the highest posterior probability (across the 5 runs) was considered the most likely number of clusters identified by baps. Finally, in the case of pedigree, for each random population we carried out 5 runs using 500 000 iterations/run and the following run parameters: Full sib constraint = 0; Temperature = 10; Weight = 1 (see Table S1 for run parameter details). Once the best partition had been generated by each of the three algorithms for each of the 20 random populations, the level of α error was calculated as the proportion of unrelated individual pairs incorrectly grouped together in a single cluster (i.e. unrelated individuals wrongly classified as half-sibs). Similarly, the power (1 − β) was calculated as the proportion of true half-sib pairs correctly grouped together in a single cluster (i.e. half-sibs correctly classified as related). The levels of α error and power for the clustering algorithms implemented in structure, baps and pedigree are given in Fig. S2B.
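The pairwise definitions of α error and power given above can be sketched as follows. This is a minimal illustration, not the code used in the study; the function name and the example partition are hypothetical, and only the half-sib group sizes (5, 4, 3, 3, 2, 2, 2) are taken from the text.

```python
from itertools import combinations

def pairwise_error_rates(true_groups, inferred_clusters):
    """Return (alpha, power) for one partition of one simulated population.

    true_groups[i] is the known half-sib group of individual i and
    inferred_clusters[i] is the cluster assigned to it by the algorithm.
    alpha = proportion of unrelated pairs placed in the same cluster;
    power = proportion of true half-sib pairs placed in the same cluster.
    """
    unrelated_total = unrelated_joined = 0
    related_total = related_joined = 0
    for i, j in combinations(range(len(true_groups)), 2):
        same_cluster = inferred_clusters[i] == inferred_clusters[j]
        if true_groups[i] == true_groups[j]:
            related_total += 1
            related_joined += same_cluster
        else:
            unrelated_total += 1
            unrelated_joined += same_cluster
    return unrelated_joined / unrelated_total, related_joined / related_total

# Known structure of one simulated population: 7 half-sib groups, 21 ticks
sizes = [5, 4, 3, 3, 2, 2, 2]
truth = [g for g, n in enumerate(sizes) for _ in range(n)]

# Hypothetical inferred partition that merges the two largest groups
merged = [0 if g in (0, 1) else g for g in truth]
alpha, power = pairwise_error_rates(truth, merged)
```

With these group sizes there are 25 true half-sib pairs and 185 unrelated pairs among the 210 possible pairs, so merging the groups of 5 and 4 wrongly joins 20 unrelated pairs while leaving every half-sib pair intact.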

Testing assumptions of MC simulation

The post hoc Monte-Carlo (MC) simulation was performed to evaluate whether the levels of kin structure and life-history characteristics of I. texanus could adequately explain deviations from HWE at the IP scale. The MC simulation approach, principally based on the subdivided breeding group model proposed by Criscione and Blouin (2005) and modified to take into account the levels of kin structure observed in I. texanus, rested on three major assumptions. First, we assumed that there was pangamy at the scale of the raccoon den, an assumption that seemed justified in the absence of empirical data indicative of (positive/negative) assortative mating. Second, we assumed that mating took place prior to dispersal in I. texanus. This assumption also seems reasonable since mating takes place prior to blood-feeding in nidicolous ticks (Sonenshine, 1993) and dispersal can only take place while ticks are feeding on the host. Finally, we assumed that IP scale allele frequencies are an adequate estimate of the allele frequencies of ticks that will mate to produce the subsequent generation of ticks. While we feel these assumptions are realistic based on I. texanus biology, there was no direct way of evaluating whether they were accurate. However, it was clear that these assumptions could only affect the results of the MC simulation through the genetic patterns of the kin groups generated. Thus, to test whether these assumptions were likely to affect the results of our MC simulation, for each observed kin group (with > 1 tick) we generated 100 kin groups of the same size (following the MC simulation approach outlined in the main text). Within the observed and simulated kin groups we calculated the average pair-wise relatedness (Queller and Goodnight, 1989) and FIS as implemented in spagedi (Hardy and Vekemans, 2002) and genepop'007 (Rousset, 2008), respectively. We created a frequency distribution of observed average relatedness and FIS values and compared this distribution with the distribution of simulated values. This test was carried out only in the five CPs that showed significant levels of kin structure (see main text), because in CPs with non-significant levels of kin structure the MC simulation assumed all ticks were unrelated (i.e. kin group size = 1; number of kin groups = number of ticks sampled). The frequency distributions of average relatedness and FIS values in observed and MC simulated kin groups showed strong concordance at all loci and CPs examined (see Figs S5 and S6, respectively).

Supplementary references

Corander J, Waldmann P, Marttinen P, Sillanpaa MJ (2004). BAPS 2: enhanced possibilities for the analysis of genetic population structure. Bioinformatics 20(15): 2363-2369.

Corander J, Marttinen P, Sirén J, Tang J (2008). Enhanced Bayesian modelling in BAPS software for learning genetic structures of populations. BMC Bioinformatics 9: 539.

Criscione CD, Blouin MS (2005). Effective sizes of macroparasite populations: a conceptual model. Trends Parasitol 21(5): 212-217.

de Meeûs T, Humair PF, Grunau C, Delaye C, Renaud F (2004). Non-Mendelian transmission of alleles at microsatellite loci: an example in Ixodes ricinus, the vector of Lyme disease. Int J Parasitol 34(8): 943-950.

Evanno G, Regnaut S, Goudet J (2005). Detecting the number of clusters of individuals using the software STRUCTURE: a simulation study. Mol Ecol 14(8): 2611-2620.

Goudet J (1995). FSTAT (Version 1.2): A computer program to calculate F-statistics. J Hered 86(6): 485-486.

Hardy OJ, Vekemans X (2002). SPAGEDi: a versatile computer program to analyse spatial genetic structure at the individual or population levels. Mol Ecol Notes 2(4): 618-620.

Herbinger CM. (2005). PEDIGREE. Available from

Konovalov DA, Manning C, Henshaw MT (2004). KINGROUP: a program for pedigree relationship reconstruction and kin group assignments using genetic markers. Mol Ecol Notes 4(4): 779-782.

Latch EK, Dharmarajan G, Glaubitz JC, Rhodes OE (2006). Relative performance of Bayesian clustering software for inferring population substructure and individual assignment at low levels of population differentiation. Conserv Genet 7(2): 295-302.

Pritchard JK, Stephens M, Donnelly P (2000). Inference of population structure using multilocus genotype data. Genetics 155(2): 945-959.

Queller DC, Goodnight KF (1989). Estimating relatedness using genetic markers. Evolution 43(2): 258-275.

Rousset F (2008). GENEPOP'007: a complete re-implementation of the GENEPOP software for Windows and Linux. Mol Ecol Resour 8(1): 103-106.

Sonenshine DE (1993). Biology of ticks. Vol. 2. Oxford University Press: New York.

Van Oosterhout C, Hutchinson WF, Wills DPM, Shipley P (2004). MICRO-CHECKER: software for identifying and correcting genotyping errors in microsatellite data. Mol Ecol Notes 4(3): 535-538.

Dharmarajan et al.; Supplementary Methods