Electronic Supplementary Material
Cultural Macroevolution on Neighbor Graphs: Vertical and Horizontal Transmission among Western North American Indian Societies
Mary C. Towner, Mark N. Grote, Jay Venti, Monique Borgerhoff Mulder
Gibbs Sampler
For a given trait, the parameters θm, λm, βm; m=1,…4 (known as “driving values”) are chosen to be in a region of high likelihood support under model m. We accomplished this by carrying out preliminary runs of the Gibbs sampler, using the MCMC maximum-likelihood method described by Geyer (1991, 1996) to approximate the likelihood surface for model m=1,…4. We then chose θm, λm, βm close to the approximate maximum-likelihood estimates for model m. Although the realizations x1,… xR and mixture distribution described below can be generated by any family of distributions having positive support on the entire set of binary arrays, we chose to concentrate our simulation efforts on the focal models 1–4. For each trait and model, the initial state of the Gibbs sampler has all observations equal to the same trait value, as suggested by Geyer (1991). We also implement stochastic symmetry swaps, under which (at randomly chosen scans) the positive and negative signs of trait values are reversed, to enhance mixing of the sampler.
We use an importance sampling technique (Geyer 1994, 1996) which combines information from realizations at parameter values θm, λm, βm; m=1,…4, to approximate likelihoods for models 1-4 for each trait. We sample from models 1-4 in equal proportions; thus the importance sampling distribution (up to a constant of proportionality) is
hmix(x) = Σm hm(x) eηm (ESM 1)
where hm(x) = exp{ θm S(x) + λm T(x) + βm U(x)} and eηm = 1 / 4 z(θm, λm, βm) (see Geyer 1996:253-254). We use the “reverse logistic regression” method of Geyer (1994) to estimate ηm; m=1,…4. Finally, the log-likelihood ratio for the observation x at parameter values θ, λ, β is approximated as
log (L[θ, λ, β; x] / Lmix) ≈ log (h(x) / ĥmix(x)) – log (R−1 Σr [h(xr) / ĥmix(xr)]) (ESM 2)
where Lmix is the likelihood under the importance sampling distribution (eq. ESM 1), h(x) = exp{ θ S(x) + λ T(x) + β U(x)}, ĥmix(x) is obtained by substituting the reverse-logistic estimates of ηm; m=1,…4 in equation (ESM 1), R is the number of realizations from the Gibbs sampler and x1,… xR are the realizations themselves. We evaluate the right-hand side of equation (ESM 2) on a three-dimensional grid of parameter values (θ, λ, β), and fit models 1-4 to the observation x by maximizing equation (ESM 2) on the grid (or on an appropriate lower-dimensional subset, such as the two-dimensional grid having θ=0 for model 2).
Model Comparison
Likelihoods under models 1–4 are maximized with respect to an importance sampling distribution shared in common, via equation (ESM 2); this facilitates AIC (and, respectively, BIC) comparisons among models as follows. Maximizing the right-hand side of equation (ESM 2) over the parameter set for model m produces the stochastic approximation log LR*m ≈ log(L*m / Lmix). For an alternative model m′, the difference in AIC is approximated as
−2(log LR*m − Km − log LR*m′ + K m′) ≈ −2(log[L*m / Lmix] − Km – log[L*m′ / Lmix] + K m′)
= −2(log L*m − Km) + 2(log L* m′ − K m′)
= AIC m - AIC m′ (ESM 3)
A similar calculation produces the approximate difference in BIC. AIC (and, respectively, BIC) values for models 1–4 can be ordered from smallest to largest by examining the differences approximated by equation (ESM 3).
Exact Calculation
Evaluating the autologistic likelihood (eq. 1 of the main text) exactly involves the enumeration of 2n binary arrays, which requires long computing times even for small samples; therefore we chose a subsample from our original 172 societies for exact model fitting. We reasoned that the subsample should contain relatively few language groups and should be geographically limited, as compared with the original sample. We focused on the Northwest Coast Group identified by Jorgenson (1980), eliminating societies from this group that were language isolates, were geographically distant from the bulk of the group, or had missing values in a subset of traits under consideration; this produced a subsample of n=24. The three chosen traits—digstick, brideservice, and brideprice—have levels of variation in the subsample similar to those typical of traits in the original sample. We defined linguistic and spatial neighbors in the subsample in the same way as for the original sample (see “Neighbor Graphs” in the main text). The average number of linguistic neighbors in the subsample is 7.1 (range 2–11), and the average number of spatial neighbors is 4.3 (range 1–8). Our intention here is not to make inferences about cultural evolution in the subsample, but to investigate the accuracy of results obtained by the MCMC method by comparing them with exact results.
We enumerated the 224 binary arrays that form the summands of z(θ, λ, β) using the binary representations of the integers 0, 1, …, 224 – 1. A programming loop through these integers produces, in turn, each possible binary array of length n=24. Additional programming steps embedded in the loop calculate the contributions to z(θ, λ, β) from each array, on a grid of (θ, λ, β) values. Direct maximization of the likelihood L(θ, λ, β; x), as well as calculations leading to AIC and BIC weights, are straightforward once z(θ, λ, β) has been evaluated. We obtained maximum-likelihood parameter estimates and AIC weights for digstick, brideservice, and brideprice in the subsample using the exact scheme and then obtained analogous results in independent runs of the MCMC method, using the implementation details described in the main text and above. ESM Table 1 shows that the MCMC method produces estimates and model weights very close to the exact values. The computing time for a grid of 70,000 parameter points was approximately 5.5 days using the exact method (on a Dell Precision Workstation 650), as compared with a few hours for the MCMC method (on a Dell laptop). For a given grid size, each addition of a society to the sample doubles the computing time when the exact method is used.
Simulation
We designed a simulation study in which models 1–4 were fitted to datasets generated under controlled levels of horizontal transmission. Charles Nunn graciously modified the simulation program described in Nunn et al. (2006) to produce a single binary trait for each society and generated 200 simulated datasets for our use. The program is based on an explicit spatio-temporal evolutionary model unrelated to the autologistic model. Each society in a simulated dataset has a binary trait value, a position on a square lattice, and a known phylogenetic lineage. The parameters held constant over all simulations were number of societies (100), number of discrete generations (800), per-generation probability of extinction (0.1), per-generation probability of diversification (propagation of a society along with its trait to an adjacent, unoccupied position on the lattice, 0.9), and per-generation probability of trait evolution (a random switch to the other trait value, 0.01). The per-generation probability of horizontal trait transmission (donation of a society’s trait value to an adjacent society already in existence) was systematically varied over the simulations, with 50 simulations at each of the values 0.0, 0.001, 0.01, 0.1.
To turn a simulated dataset into a neighbor-graph dataset, we converted the known phylogeny into a phylogenetic neighbor graph (resulting in a coarsening of information about historical relationships between societies). In the phylogenetic neighbor graph, tips of the tree are collected into mutually exclusive sets of closely related societies, such that the average number of phylogenetic neighbors across the sample is approximately the same as the average number of spatial neighbors (3.6, derived from the geometry of the 10-by-10 lattice). This calibration procedure is analogous to the one used for the WNAI sample, except that here the spatial neighbor graph of the square lattice is fixed, and the phylogenetic neighbor graph is calibrated to it.
To achieve the calibration, for each simulated dataset we progressively moved a phylogeny horizon from the tips of the unrooted tree inward to the center (see ESM Figure 2). As the horizon crosses each internal node, cultures are segregated into clades which branch at a distance from the center greater than or equal to the distance from the center to the horizon. The phylogenetic neighbor graph treats each member of a clade as equally related to all other members (thus the clades are converted to cliques). As the horizon moves inward, the number of cliques decreases; at the center there would be only one clique. At some internal node, the average neighbor number of the resulting graph most closely approximates 3.6. This is the graph we chose for analysis of the simulated datset.
We developed a semi-automated batch processing program to fit autologistic models to the simulated datasets using the MCMC method described in the main text and above. Some numerical compromises were necessary to keep overall computing times reasonable: for each simulated dataset the approximate log-likelihood ratio (eq. ESM 2) is based on R=42,000 realizations, with thinning and burn-in as for the WNAI analysis.
Results of the simulation study are summarized in ESM Figures 3 and 4. ESM Figure 3 is a scatter plot of approximate maximum-likelihood estimates of the spatial (θ; horizontal axis) and phylogenetic (λ; vertical axis) association parameters from model 4, for each simulated dataset. Plotting characters are shaded according to the level of horizontal transmission in effect. Simulations with lower horizontal transmission rates tend to have larger estimates of λ and smaller estimates of θ. The converse is true for simulations with higher horizontal transmission rates. Calibration of the spatial and phylogenetic neighbor graphs produces θ and λ estimates roughly on the same scale. The bar graphs of ESM Figure 4 depict model weights averaged over 50 simulated datasets, for each level of horizontal transmission (ht). From top to bottom, it is evident that as levels of horizontal transmission increase, average support for model 3 (spatial neighbors only) increases while support for model 2 (phylogenetic neighbors only) decreases. We understand the relatively high support for model 4 (both spatial and phylogenetic neighbors) in the ht=0 simulations to be a consequence of an assumption built into the model of Nunn et al. (2006): parent societies propagate only into adjacent, unoccupied positions of the lattice. Thus spatial association carries information about trait similarity in the simulation model even when there is no horizontal transmission.
ESM TABLE 1. Comparison of parameter estimates and model weights obtained using the exact and MCMC methods for the subsample of n=24.
θ, λ, and β are maximum likelihood estimates under model 4, including both the spatial and linguistic neighbor graphs. M4, M3, and M2 are AIC weights for the respective models (the AIC weight for model 1 can be obtained by subtraction: see “Model Comparison” in the main text).
Trait / Method / θ / λ / β / M4 / M3 / M2digstick / exact / 0.26 / −0.02 / −0.07 / 0.22 / 0.58 / 0.12
MCMC / 0.26 / −0.02 / −0.08 / 0.22 / 0.58 / 0.12
brideservice / exact / 0.38 / −0.04 / −0.08 / 0.28 / 0.68 / 0.03
MCMC / 0.38 / −0.04 / −0.08 / 0.28 / 0.68 / 0.03
brideprice / exact / 0.31 / −0.11 / 0.05 / 0.29 / 0.43 / 0.08
MCMC / 0.31 / −0.11 / 0.05 / 0.29 / 0.43 / 0.08
Towner et al. ESM-15
ESM TABLE 2. Forty-four cultural traits within six domains.
WNAI traits were selected on these criteria: breadth of trait type across the domains, few missing cases, and low skew (such that the trait exhibited enough variation for statistical analyses to be meaningful). All WNAI traits are already coded categorically; we combined categories as appropriate in order to create meaningful binary traits. For example, our binary variable agriculture is based on V. 187 in the WNAI, which has seven outcomes: absent (n=81) and six other categories describing the nature (food, nonfood) and extent (incipient, % of diet) of horticulture and agriculture. We collapsed the latter into one category, for which the answer to the question “Is there agricultural or horticultural production (including nonfoods)?” would be “yes.”
Towner et al. ESM-15
Description (modified from Jorgensen 1980) / Binary Variable / yes (1) / no (−1) / n / WNAIDomain: Subsistence and Settlement
Is there agricultural or horticultural production (including nonfoods)? / agriculture / 90 / 81 / 171 / V187
At least 1-10% of diet contributed by local agriculture? / agrodiet / 37 / 135 / 172 / V193
At least 26-50% of diet contributed by aquatic animals? / aquaticdiet / 79 / 93 / 172 / V199
At least 26-50% of diet contributed by non-aquatic hunting? / huntdiet / 92 / 80 / 172 / V204
At least 26-50% of diet contributed by local gathering? / gatherdiet / 115 / 57 / 172 / V211
At least semisedentary settlements occupied throughout the year? / fixedsettle / 90 / 81 / 171 / V284
At least 1-5 persons per square mile? / popdens / 53 / 118 / 171 / V288
Domain: Marriage and Residence
At least 11-25% incidence of polygyny? / polygyny / 54 / 118 / 172 / V294
Is there a marked tendency toward exogamous marriages? / exogamy / 69 / 99 / 168 / V301
Are there unequal gift exchanges, which tend to approach bride-price, at marriage? / brideprice / 48 / 122 / 170 / V302
Is there continued exchange of goods and services between relatives of the bride and groom after marriage? / affinalexch / 56 / 107 / 163 / V303