Fay et al. Cumulative Cultural Evolution of a Language Game

Supplementary Materials 2: Procedure used to Identify Lexical Term Polarity

The determination of which lexical terms had a positive impact on route reproduction accuracy required three steps. First, the route reproduction accuracy score was non-linearly scaled to fit a unit-normal distribution N(0, 1.0). Second, the performance of each lexical term was assessed using the log Bayes’ Factor (lBF), which measures how much better the distribution N(μ, 1.0) (predicting higher route reproduction accuracy) predicted the performance of pairs using the lexical term than did the distribution N(0, 1.0). Any lexical term with lBF > 0.2 was regarded as a positive term, as pairs using it achieved better-than-average route reproduction accuracy. In the same manner, an lBF of over 0.2, computed with a negative mean shift, was used to identify negative lexical terms. The Bayes’ Factor incorporates, in a single measure, both the number of cases in which a lexical term co-occurs with a positive or negative impact and the size of that impact. The low threshold for identifying positive or negative lexical terms ensured that forms plausibly - but not certainly - making even a small contribution to route reproduction accuracy were included in their respective classes.

The scores for the MapTask are transformed non-linearly so that the distribution of the transformed score Q is unit-normal, centred on 0. This is achieved by mapping each score to its quantile. The inverse of the unit-normal cumulative distribution function then maps the quantile to Q.
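For reference, a minimal sketch of this transform in R, using a hypothetical vector of raw scores; mid-ranks keep every quantile strictly inside (0, 1):

# Hypothetical raw route reproduction accuracy scores
scores <- c(0.62, 0.81, 0.45, 0.90, 0.73, 0.55)

# Map each score to its quantile, then through the inverse unit-normal CDF
quantiles <- rank(scores) / (length(scores) + 1)
Q <- qnorm(quantiles)  # Q is approximately unit-normal, centred on 0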

Which lexical terms contribute positively (or negatively) to the route reproduction accuracy scores? One option is to compute the average route accuracy score for instructions that used the lexical term minus the average for those that did not. The greater this difference, the more advantage is provided by the term. The difficulty with this approach is that if a lexical term occurs (by chance) only once in the whole data set, but happens to occur in a trial that does well, it will be over-rated as an aid to scoring well. To avoid this over-fitting problem, we use Bayes’ Theorem. We consider two hypotheses, A(t) and B, about the likely route reproduction accuracy scores resulting from the presence or absence of term t. A(t) says that the probability of an Instruction-Giver and -Follower pair achieving a particular Q score q is N(q | μ, 1.0) if they use the lexical term t, and N(q | 0, 1.0) if not. In contrast, hypothesis B says that, regardless of the use of lexical term t, the probability of scoring q is N(q | 0, 1.0).

We then use Bayes’ Factor to determine the relative likelihood of A(t) and B given the observed data, in which the total number of Instruction-Giver and -Follower pairs was N and the number of pairs using lexical term t was N(t). We assume the two hypotheses are equally likely a priori, and write S(t) for the set of pairs that used the lexical term t, and q(i) for the Q score achieved by pair i. Because pairs that did not use t are predicted identically under A(t) and B, those pairs cancel, and the Bayes Factor reduces to

BF(t) = P(data | A(t)) / P(data | B) = ∏_{i ∈ S(t)} N(q(i) | μ, 1.0) / N(q(i) | 0, 1.0),

with lBF defined as the logarithm of BF(t).

The value BF(t) expresses how much more likely it is that Instruction-Giver and -Follower pairs using the lexical term t scored better (by a mean shift of μ) than under the default hypothesis that use of the lexical term made no difference. So long as the shift μ is not substantially larger than the mean difference the lexical term could plausibly make, lBF will be a good measure of the extent to which we can be confident that lexical term t is making a positive impact on the route reproduction accuracy score.
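For reference, a minimal sketch of the lBF computation in R; Q.t is a hypothetical vector of Q scores for the pairs in S(t), and only these pairs enter the ratio because pairs not using t are predicted identically by A(t) and B:

# Hypothetical Q scores for the pairs that used lexical term t
Q.t <- c(0.8, 1.3, -0.2, 0.5)
mu <- 0.10  # mean shift under hypothesis A(t)

# log Bayes Factor: summed log-likelihood difference over pairs in S(t)
lBF <- sum(dnorm(Q.t, mean = mu, sd = 1, log = TRUE) -
           dnorm(Q.t, mean = 0, sd = 1, log = TRUE))
# equivalently: mu * sum(Q.t) - length(Q.t) * mu^2 / 2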

The SM2 Figure shows the actual relative Q-frequency of scores attained by Instruction-Givers who used the lexical term follow. Examples include “…follow the contour of the lake till you reach the finish” and “follow the slope of the left side of the palm tree until you get to the top”. The two normal distributions A(t) and B are shown in green and black respectively. Clearly, the distribution for follow matches A(t) more closely than B. When μ is chosen as 0.10, the lBF of follow is 65. Terms whose lBF is greater than 0.2 are identified as showing some evidence (albeit sometimes weak evidence) of a positive effect. The same method is applied - but with negative μ - to find lexical terms with a deleterious effect on route reproduction accuracy.

SM2 Figure. Relative Q-frequency of scores attained by pairs which used the lexical term follow. The two normal distributions A(t) and B are shown in green and black respectively.

Supplementary Materials 3: Route Reproduction Accuracy

The route reproduction accuracy scores (logit transformed) were entered into a linear mixed effects model, with a maximal random effects structure as specified below:

lmer (LogitPerform ~ Condition*Generation + (1 + Generation | Chain) +
(1 + Condition*Generation | Map), data= Perform, REML=FALSE)
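For reference, a minimal sketch of the logit transformation in R, assuming a hypothetical vector of accuracy proportions strictly between 0 and 1 (the source does not specify how exact 0s or 1s were handled):

# Hypothetical route reproduction accuracy proportions in (0, 1)
p <- c(0.95, 0.80, 0.99, 0.87)

# Logit transform; base R's qlogis(p) is equivalent
LogitPerform <- log(p / (1 - p))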

The model output is tabled below, using the sjt.glmer() function of sjPlot (Lüdecke, 2015):

Route Reproduction Accuracy (Logit)
B / CI / p
Fixed Parts
(Intercept) / 2.89 / 2.54–3.24 / <.001
Condition / -0.25 / -0.40–-0.09 / .005
Generation / 0.22 / 0.11–0.33 / <.001
Condition:Generation / -0.01 / -0.08–0.05 / .747
Random Parts
σ² / 0.323
τ₀₀, Chain / 0.026
τ₀₀, MapCode / 0.146
ρ₀₁ / -1.000
N Chain / 51
N MapCode / 8
ICC Chain / 0.052
ICC MapCode / 0.295
Observations / 357
R² / Ω₀² / .545 / .542
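For reference, a minimal sketch of how such a table can be produced with sjPlot 1.x, assuming the fitted model object m1 from the lmer() call above (sjPlot 1.x also offered sjt.lmer() for lmer fits):

library(lme4)
library(sjPlot)

# Refit the model reported above (Perform is the authors' data frame)
m1 <- lmer(LogitPerform ~ Condition * Generation + (1 + Generation | Chain) +
           (1 + Condition * Generation | Map), data = Perform, REML = FALSE)

# Write an HTML table summarizing the fixed and random parts
sjt.glmer(m1)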

Supplementary Materials 4: Route Reproduction Accuracy

The raw, untransformed route reproduction accuracy scores were analyzed using non-parametric tests. First, we compared the route reproduction accuracy scores across the Social Coordination and Observation conditions (collapsed over Generations). Route reproduction accuracy scores were significantly higher in the Social Coordination condition (Wilcoxon rank-sum test W= 472, n Social Coordination= 25, n Observation= 26, p= 0.005 two-tailed). Next, we compared the route reproduction accuracy scores between Generation 1-2 and Generation 7-8 (collapsed across Conditions). Route reproduction accuracy scores were significantly higher at Generation 7-8 (Wilcoxon test V= 52, p < 0.001 two-tailed).
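For reference, a minimal sketch of these tests in R, with hypothetical per-chain accuracy scores; the condition comparison is unpaired, while the generation comparison is paired within chains:

# Hypothetical mean accuracy per chain in each condition (unpaired)
social <- c(0.84, 0.91, 0.78, 0.88, 0.80)
observ <- c(0.72, 0.80, 0.69, 0.75, 0.71)
wilcox.test(social, observ)               # rank-sum test, reports W

# Hypothetical per-chain accuracy at Generations 1-2 vs 7-8 (paired)
gen12 <- c(0.70, 0.75, 0.68, 0.72, 0.66)
gen78 <- c(0.85, 0.88, 0.90, 0.86, 0.83)
wilcox.test(gen12, gen78, paired = TRUE)  # signed-rank test, reports V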

These findings confirm those returned by the linear mixed effects modeling using the logit transformed route reproduction accuracy scores.

Supplementary Materials 5: Cultural Inheritance of Route Reproduction Accuracy

The route reproduction accuracy scores (logit transformed) were entered into a linear mixed effects model, with a maximal random effects structure as specified below:

lmer (LogitPerformNplus1 ~ Condition * LogitPerformN + (1 | Chain) +
(1 + Condition | Map), data= PerformNplus1, REML=FALSE)

The model output is tabled below, using the sjt.glmer() function of sjPlot (Lüdecke, 2015):

Generation N-plus-1 Route Reproduction Accuracy (Logit)
B / CI / p
Fixed Parts
(Intercept) / 2.18 / 1.83–2.53 / <.001
Condition / -0.19 / -0.35–-0.04 / .022
Generation N / 0.19 / 0.09–0.28 / <.001
Random Parts
σ² / 0.418
τ₀₀, Chain / 0.000
τ₀₀, MapCode / 0.134
ρ₀₁
N Chain / 51
N MapCode / 8
ICC Chain / 0.000
ICC MapCode / 0.242
Observations / 306
R² / Ω₀² / .300 / .299

Supplementary Materials 6: Instruction-Giver Total Words

The total number of words used to communicate the routes was higher for Instruction-Givers in the Social Coordination condition compared to the Observation condition (see SM6 Figure below).

SM6 Figure. Total number of words used to communicate the routes over Generations by Instruction-Givers and Instruction-Followers in the Social Coordination condition, and by Instruction-Givers in the Observation condition. The blue straight line is the linear model fit and the light grey shaded area is the 95% confidence interval.
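The line-plus-band styling in the caption matches ggplot2's geom_smooth(method = "lm") defaults; a minimal sketch under that assumption, with a hypothetical data frame:

library(ggplot2)

# Hypothetical per-generation Instruction-Giver word counts
df <- data.frame(Generation = rep(1:8, times = 3),
                 Words = 200 + 5 * rep(1:8, times = 3) + rnorm(24, sd = 30))

ggplot(df, aes(x = Generation, y = Words)) +
  geom_point() +
  geom_smooth(method = "lm")  # blue line: linear fit; grey band: 95% CI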

The total number of Instruction-Giver words was entered into a linear mixed effects model, with a maximal random effects structure as specified below:

lmer (Tokens ~ Condition * Generation + (1 + Generation | Chain) +
(1 + Condition * Generation | Map), data= DirectorTOTALWords, REML=FALSE)

The model output is tabled below, using the sjt.glmer() function of sjPlot (Lüdecke, 2015). The tabled results were confirmed by model comparison.

Total Instruction-Giver Words
B / CI / p
Fixed Parts
(Intercept) / 225.15 / 210.70–239.61 / <.001
Condition / -50.36 / -75.96–-24.77 / <.001
Generation / 5.68 / -0.68–12.04 / .091
Condition:Generation / -2.71 / -13.04–7.62 / .611
Random Parts
σ² / 4389.342
τ₀₀, Chain / 1484.133
τ₀₀, Map / 102.996
ρ₀₁ / -0.145
N Chain / 51
N Map / 8
ICC Chain / 0.248
ICC Map / 0.017
Observations / 357
R² / Ω₀² / .574 / .535

Supplementary Materials 7: Positive and Negative Token Density

The density of positively- and negatively-biased route description tokens (positively- and negatively-biased words, including repetitions, as a percentage of total Instruction-Giver words) was entered into a linear mixed effects model. The model with the maximal random effects structure failed to converge. The model converged after the random effects structure was simplified by removing token polarity from the random effects structure for Map:

lmer (Percentage ~ Condition * GenerationC * Polarity + (1 + Generation * Polarity | Chain) +
(1 + Condition * Generation | Map), data= PosNegPolarityDensity, REML=FALSE)
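For reference, a minimal sketch of the density measure as defined above, using hypothetical counts for a single Instruction-Giver:

# Hypothetical counts for one Instruction-Giver's route description
positive.tokens <- 48   # positively-biased words, including repetitions
negative.tokens <- 19   # negatively-biased words, including repetitions
total.words <- 226      # total Instruction-Giver words

PositivePercentage <- 100 * positive.tokens / total.words  # ~21.2
NegativePercentage <- 100 * negative.tokens / total.words  # ~8.4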

The model output is tabled below, using the sjt.glmer() function of sjPlot (Lüdecke, 2015):

Positive and Negative Token Density
B / CI / p
Fixed Parts
(Intercept) / 18.37 / 17.62–19.12 / <.001
Condition / -0.80 / -1.90–0.30 / .161
Generation / 0.24 / -0.01–0.49 / .065
Polarity / -19.28 / -20.80–-17.75 / <.001
Condition:Generation / -0.16 / -0.66–0.34 / .532
Condition:Polarity / 3.50 / 0.45–6.55 / .028
Generation:Polarity / -1.83 / -2.47–-1.20 / <.001
Condition:Generation:Polarity / -0.18 / -1.45–1.08 / .776
Random Parts
σ² / 30.576
τ₀₀, Chain / 1.802
τ₀₀, Map / 0.548
ρ₀₁ / 0.005
N Chain / 51
N Map / 8
ICC Chain / 0.055
ICC Map / 0.017
Observations / 714
R² / Ω₀² / .806 / .806

Supplementary Materials 8: Positive and Negative Token Density interactions

Token Polarity by Generation interaction. To understand the Token Polarity by Generation interaction effect, the change over generations in the density of positively-biased tokens and negatively-biased tokens was analyzed in separate linear mixed effects models (collapsed across Conditions). Each model included a maximal random effects structure as specified below:

lmer (Percentage ~ Generation + (1 + Generation | Chain) + (1 + Generation | Map),
data= PositiveDensity, REML=FALSE)

The model output is tabled below, using the sjt.glmer() function of sjPlot (Lüdecke, 2015). The tabled results were confirmed by model comparison.

Positive Token Density / Negative Token Density
B / CI / p / B / CI / p
Fixed Parts
(Intercept) / 27.95 / 24.83–31.07 / <.001 / 8.80 / 6.74–10.86 / <.001
Generation / 1.02 / 0.58–1.45 / <.001 / -0.55 / -0.79–-0.31 / .002
Random Parts
σ² / 31.691 / 7.301
τ₀₀, Chain / 14.377 / 1.277
τ₀₀, Map / 17.261 / 8.498
ρ₀₁ / 0.112 / -0.734
N Chain / 51 / 51
N Map / 8 / 8
ICC Chain / 0.227 / 0.075
ICC Map / 0.273 / 0.498
Observations / 357 / 357
R² / Ω₀² / .675 / .661 / .687 / .682

The Token Polarity by Generation interaction is explained by the increase in Positive terms over Generations and the decrease in Negative terms over Generations, in both the Social Coordination and Observation conditions.

Condition by Token Polarity interaction. To understand the Condition by Token Polarity interaction effect, we compared the Conditions (Social Coordination, Observation) for Positive Token Density and for Negative Token Density in separate independent samples t-tests (collapsed across Generations). Positive token density was higher in the Social Coordination condition (M= 29.30, SD= 8.48) compared to the Observation condition (M= 26.74, SD= 8.69), t(49)= 2.111, p= 0.039. Negative token density was lower in the Social Coordination condition (M= 8.34, SD= 4.34) compared to the Observation condition (M= 9.18, SD= 4.53), t(49)= -2.131, p= 0.038.
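For reference, a minimal sketch of one of these comparisons in R, with hypothetical per-chain densities; var.equal = TRUE gives the pooled degrees of freedom reported above (25 + 26 - 2 = 49):

# Hypothetical mean positive token density for each chain
pos.social <- rnorm(25, mean = 29.30, sd = 8.48)  # Social Coordination chains
pos.observ <- rnorm(26, mean = 26.74, sd = 8.69)  # Observation chains

# Independent samples t-test with pooled variance, df = 49
t.test(pos.social, pos.observ, var.equal = TRUE)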

Supplementary Materials 9: Cultural Inheritance of Positively-Biased Route Description Tokens

The densities of positively-biased and negatively-biased route description tokens at Generation N-plus-1 were entered into a linear mixed effects model, with a maximal random effects structure as specified below:

lmer(DensityNplus1 ~ DensityN * Condition * Polarity + (1 + Polarity | Chain) +
(1 + Condition * Polarity | Map), data= PosNegPolarityDensityNplus1, REML=FALSE)

The model output is tabled below, using the sjt.glmer() function of sjPlot (Lüdecke, 2015):

Cultural Inheritance: Positively- and Negatively-Biased Route Description Tokens
B / CI / p
Fixed Parts
(Intercept) / 14.94 / 13.53–16.34 / <.001
Token Density at Generation N / 0.14 / 0.06–0.21 / <.001
Condition / -0.42 / -2.89–2.05 / .740
Polarity (Positive, Negative) / -13.72 / -19.35–-8.09 / <.001
Density:Condition / 0.03 / -0.11–0.18 / .646
Density:Polarity / -0.20 / -0.34–-0.06 / .005
Condition:Polarity / 0.06 / -5.09–5.22 / .981
Density:Condition:Polarity / 0.18 / -0.10–0.46 / .209
Random Parts
σ² / 22.083
τ₀₀, Chain / 1.045
τ₀₀, Map / 0.913
ρ₀₁ / -1.000
N Chain / 51
N Map / 8
ICC Chain / 0.043
ICC Map / 0.038
Observations / 612
R² / Ω₀² / .859 / .859

Supplementary Materials 10: Cultural Inheritance of Positively-Biased Route Description Tokens: Density by Polarity Interaction

To understand the Token Density by Polarity interaction effect, the influence of Generation N positively-biased token density and of negatively-biased token density was analyzed in separate linear mixed effects models (collapsed across Generations). Each model included a maximal random effects structure as specified below:

lmer (DensityNplus1 ~ DensityNpositive + (1 | Chain), data= DensityPositive, REML=FALSE)

The model output is tabled below, using the sjt.glmer() function of sjPlot (Lüdecke, 2015). The tabled results were confirmed by model comparison.

Generation N-Plus-1 Positive Token Density / Generation N-Plus-1 Negative Token Density
B / CI / p / B / CI / p
Fixed Parts
(Intercept) / 20.65 / 17.52–23.79 / <.001 / 9.01 / 7.95–10.06 / <.001
GenerationN Token Density / 0.28 / 0.17–0.39 / <.001 / -0.07 / -0.17–0.04 / .200
Random Parts
σ² / 64.948 / 16.749
τ₀₀, Chain / 1.546 / 0.257
N Chain / 51 / 51
ICC Chain / 0.023 / 0.015
Observations / 306 / 306
R² / Ω₀² / .132 / .129 / .070 / .033

Supplementary Materials 11: Social Interaction

The ratio of Instruction-Giver to Instruction-Follower packets sent was entered as a predictor of route reproduction accuracy (logit transformed) in a linear mixed effects model, with a maximal random effects structure as specified below:

lmer (LogitPerform ~ RatioGiver.to.Follower + (1 | Chain) + (1 | Map), data= TurnRatio, REML=FALSE)

The model output is tabled below, using the sjt.glmer() function of sjPlot (Lüdecke, 2015):

Route Reproduction Accuracy (Logit)
B / CI / p
Fixed Parts
(Intercept) / 2.64 / 2.38–2.91 / <.001
Ratio of Instruction-Giver to Follower Interaction / 0.44 / 0.12–0.76 / .008
Random Parts
σ² / 0.485
τ₀₀, Chain / 0.018
τ₀₀, Map / 0.122
N Chain / 25
N Map / 8
ICC Chain / 0.029
ICC Map / 0.195
Observations / 175
R² / Ω₀² / .316 / .303

Supplementary Materials 12: Social Interaction

There was no statistical evidence of a change in the ratio of Instruction-Giver to Instruction-Follower social interaction over generations in the Social Coordination condition (see SM12 Figure below).

SM12 Figure. Instruction-Giver to -Follower social interaction (ratio of Instruction-Giver packets sent to Instruction-Follower packets sent) over Generations in the Social Coordination condition. The blue straight line is the linear model fit and the light grey shaded area is the 95% confidence interval.

The ratio of Instruction-Giver to Instruction-Follower packets sent was entered into a linear mixed effects model, with a maximal random effects structure as specified below:

lmer(RatioGiver.to.Follower ~ Generation + (1 | Chain) + (1 | Map), data= TurnRatio,REML=FALSE)

The model output is tabled below, using the sjt.glmer() function of sjPlot (Lüdecke, 2015). The tabled results were confirmed by model comparison.

Ratio of Instruction-Giver to -Follower Packets Sent
B / CI / p
Fixed Parts
(Intercept) / 0.59 / 0.54–0.64 / <.001
Generation / 0.02 / -0.01–0.04 / .146
Random Parts
σ² / 0.106
τ₀₀, Chain / 0.000
τ₀₀, Map / 0.000
N Chain / 25
N Map / 8
ICC Chain / 0.000
ICC Map / 0.000
Observations / 175
R² / Ω₀² / .012 / .012

Supplementary Materials 13: Packet Size

The mean Instruction-Giver Packet Size (in words) was entered into a linear mixed effects model. The model with the maximal random effects structure failed to converge. The model converged after the random effects structure was simplified by removing Generation from the random effects structure for Map:

lmer(MeanPacketSize ~ Condition * Generation + (1 + Generation | Chain) +
(1 + Condition | Map), data= TurnLength, REML=FALSE)

The model output is tabled below, using the sjt.glmer() function of sjPlot (Lüdecke, 2015):

Mean Instruction-Giver Packet Size
B / CI / p
Fixed Parts
(Intercept) / 19.51 / 18.42–20.61 / <.001
Condition / 11.40 / 9.21–13.60 / <.001
Generation / 2.77 / 2.18–3.36 / <.001
Condition:Generation / 5.19 / 4.01–6.37 / <.001
Random Parts
σ² / 68.957
τ₀₀, Chain / 6.189
τ₀₀, MapCode / 0.000
ρ₀₁ / 1.000
N Chain / 51
N MapCode / 8
ICC Chain / 0.082
ICC MapCode / 0.000
Observations / 357
R² / Ω₀² / .640 / .638

Supplementary Materials 14: Packet Size

The mean Instruction-Giver packet size data was analyzed in separate linear mixed effects models for the Social Coordination condition and for the Observation condition. A maximal random effects structure was specified:

lmer(MeanPacketSize ~ Condition * Generation + (1 + Generation | Chain) +
(1 + Condition | Map), data= MapTask, REML=FALSE)

The linear mixed effects model for the Observation condition, after adding a quadratic term for Generation, is given below:

lmer (MeanPacketSize ~ Generation + Generation2C + (1 + Generation | Chain) +
(1 + Generation2 | Chain) + (1 + Generation | Map) + (1 + Generation2C | Map),
data= MapTask, REML=FALSE)
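For reference, the centred quadratic predictor named Generation2C in the formula above could be constructed along the following lines; this is a sketch, and the exact centring the authors used is an assumption:

# Hypothetical generation index for the 26 Observation-condition chains
Generation <- rep(1:7, times = 26)

# Centre the linear term, then square it to obtain the quadratic term
GenerationC <- Generation - mean(Generation)
Generation2C <- GenerationC^2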

The model output is tabled below, using the sjt.glmer() function of sjPlot (Lüdecke, 2015):

Social Coordination Condition / Observation Condition / Observation Condition with Quadratic term
B / CI / p / B / CI / p / B / CI / p
Fixed Parts
(Intercept) / 13.70 / 12.57–14.83 / <.001 / 25.10 / 23.19–27.02 / <.001 / 25.10 / 23.29–26.92 / <.001
Generation / 0.13 / -0.37–0.62 / .619 / 5.31 / 4.21–6.42 / <.001 / -4.58 / -7.22–-1.94 / <.001
Generation² / 1.24 / 0.89–1.58 / <.001
Random Parts
σ² / 42.772 / 88.114 / 59.119
τ₀₀, Chain / 2.225 / 12.130 / 0.000
τ₀₀, MapCode / 0.000 / 0.000 / 13.305
ρ₀₁ / -1.000 / 1.000
N Chain / 25 / 26 / 26
N MapCode / 8 / 8 / 8
ICC Chain / 0.049 / 0.121 / 0.000
ICC MapCode / 0.000 / 0.000 / 0.183
Observations / 175 / 182 / 182
R² / Ω₀² / .184 / .097 / .665 / .662 / .780 / .779

Supplementary Materials 15: Packet Size at Generation N-plus-1

The mean Instruction-Giver packet size (in words) at Generation N-plus-1 was analyzed in a linear mixed effects model for the Observation condition. A maximal random effects structure was specified:

lmer (MeanPacketSizeN+1 ~ MeanPacketSizeN + (1 | Chain) + (1 | Map),
data= TurnLengthTradition, REML=FALSE)

The model output is tabled below, using the sjt.glmer() function of sjPlot (Lüdecke, 2015):

Instruction-Giver Packet Size at Generation N-plus-1
B / CI / p
Fixed Parts
(Intercept) / 5.84 / 1.31–10.37 / .012
GenerationN / 1.01 / 0.81–1.20 / <.001
Random Parts
σ² / 146.895
τ₀₀, Chain / 0.000
τ₀₀, MapCode / 0.000
N Chain / 26
N MapCode / 8
ICC Chain / 0.000
ICC MapCode / 0.000
Observations / 156
R² / Ω₀² / .397 / .397

Supplementary Materials 16: Instruction-Giver Packet Size and Route Reproduction Accuracy

The route reproduction accuracy scores (logit transformed) were analyzed in a linear mixed effects model for the Observation condition. A maximal random effects structure was specified:

lmer (LogitPerform ~ MeanPacketSize + (1 | Chain) + (1 | Map),
data= TurnLength, REML=FALSE)

The model output is tabled below, using the sjt.glmer() function of sjPlot (Lüdecke, 2015):

Route Reproduction Accuracy
B / CI / p
Fixed Parts
(Intercept) / 2.40 / 2.15–2.65 / <.001
Mean Instruction-Giver Packet Size / 0.02 / 0.02–0.03 / <.001
Random Parts
σ² / 0.377
τ₀₀, Chain / 0.013
τ₀₀, MapCode / 0.107
N Chain / 26
N MapCode / 8
ICC Chain / 0.025
ICC MapCode / 0.216
Observations / 182
R² / Ω₀² / .439 / .435

Supplementary Materials 17: Instruction-Giver Packet Size at Generation N and Route Reproduction Accuracy at Generation N-plus-1

The mean route reproduction accuracy scores at Generation N-plus-1 (logit transformed) were analyzed in a linear mixed effects model for the Observation condition. A maximal random effects structure was specified:

lmer (LogitPerformNPlus1 ~ MeanPacketSize + (1 | Chain) + (1 | Map),
data= TurnLengthTradition, REML=FALSE)

The model output is tabled below, using the sjt.glmer() function of sjPlot (Lüdecke, 2015):

Route Reproduction Accuracy at Generation N-plus-1
B / CI / p
Fixed Parts
(Intercept) / 2.51 / 2.27–2.76 / <.001
Mean Instruction-Giver Packet Size / 0.03 / 0.02–0.04 / <.001
Random Parts
σ² / 0.334
τ₀₀, Chain / 0.025
τ₀₀, MapCode / 0.096
N Chain / 26
N MapCode / 8
ICC Chain / 0.054
ICC MapCode / 0.212
Observations / 156
R² / Ω₀² / .430 / .418

Supplementary Materials 18: Predicting Route Reproduction Accuracy from Token Polarity and Instruction-Giver Packet Size

The route reproduction accuracy scores at Generation N-plus-1 (logit transformed) were analyzed in a linear mixed effects model for the Observation condition. First, we examined the influence of Token Polarity on route reproduction accuracy. A maximal random effects structure was specified:

lmer (LogitPerform ~ PostiveToken * NegativeToken + (1 | Chain) + (1 | Map),
data= PosNegPacketPerform, REML=FALSE)

Next, we added Instruction-Giver packet size to the model:

lmer (LogitPerform ~ PostiveToken * NegativeToken * MeanPacketSize + (1 | Chain) +
(1 | Map), data= PosNegPacketPerform, REML=FALSE)

The model output is tabled below, using the sjt.glmer() function of sjPlot (Lüdecke, 2015):

Route Reproduction Accuracy at Generation N-plus-1 (Token Polarity model) / Route Reproduction Accuracy at Generation N-plus-1 (Token Polarity and Packet Size model)
B / CI / p / B / CI / p
Fixed Parts
(Intercept) / 2.34 / 2.23–2.46 / <.001 / 2.35 / 2.23–2.48 / <.001
Positive Token Density / 0.00 / -0.01–0.01 / .925 / -0.00 / -0.01–0.01 / .901
Negative Token Density / -0.11 / -0.13–-0.08 / <.001 / -0.10 / -0.12–-0.07 / <.001
Positive*Negative Token Density / -0.00 / -0.00–-0.00 / .027 / -0.00 / -0.00–0.00 / .203
Instruction-Giver Packet Size / 0.02 / 0.01–0.02 / <.001
Positive Token Density*Instruction-Giver Packet Size / 0.00 / -0.00–0.00 / .629
Negative Token Density*Instruction-Giver Packet Size / -0.00 / -0.00–0.00 / .471
Positive*Negative Token Density*Instruction-Giver Packet Size / 0.00 / -0.00–0.00 / .228
Random Parts
σ² / 0.385 / 0.312
τ₀₀, Chain / 0.000 / 0.004
τ₀₀, MapCode / 0.004 / 0.011
N Chain / 26 / 26
N MapCode / 8 / 8
ICC Chain / 0.000 / 0.013
ICC MapCode / 0.011 / 0.033
Observations / 182 / 182
R² / Ω₀² / .389 / .389 / .518 / .517

References

Lüdecke, D. (2015). sjPlot: Data visualization for statistics in social science. R package version 1.4.