Appendix 1: R code for simulations

Code for the simulation of neutral transmission

Iter = #sets the number of simulation iterations#

N = #sets the population size#

mu = #sets the probability of mutation in each iteration#

b <- c(1:N)

V <- matrix(0, N,(Iter-1))

B <- cbind(b, V) # N x Iter matrix that stores the simulation results, rows correspond to entities while columns correspond to simulation time (generation number)

inov = N+1 #initializes the value of the first innovation#

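for (m in 2:Iter) { # loop over generations
  for (i in 1:N) { # loop over entities
    stoh <- runif(1, 0, 1) # random draw that decides between innovation and copying
    if (stoh < mu) {
      B[i, m] <- inov # innovation: the entity receives a new variant
      inov <- inov + 1
    } else {
      h <- sample.int(N, 1, replace = TRUE) # random model from the previous generation
      B[i, m] <- B[h, (m - 1)]
    }
  }
}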
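For experimentation, the neutral loop above can be wrapped in a function so that the parameters are passed explicitly. The following is a minimal sketch, not necessarily the code used to produce the reported results: the function name `simulate_neutral`, the seed, and the parameter values in the usage lines are illustrative assumptions.

```r
# Hypothetical wrapper around the neutral-transmission loop (illustrative only).
simulate_neutral <- function(Iter, N, mu) {
  B <- matrix(0, N, Iter)   # rows: entities, columns: generations
  B[, 1] <- 1:N             # generation 1: every entity carries a unique variant
  inov <- N + 1             # label of the next innovation
  for (m in 2:Iter) {
    for (i in 1:N) {
      if (runif(1) < mu) {
        B[i, m] <- inov     # innovation: brand-new variant
        inov <- inov + 1
      } else {
        h <- sample.int(N, 1)   # random model from the previous generation
        B[i, m] <- B[h, m - 1]
      }
    }
  }
  B
}

set.seed(1)  # arbitrary seed, only for reproducibility
B <- simulate_neutral(Iter = 50, N = 100, mu = 0.01)  # assumed example values
richness <- apply(B, 2, function(g) length(unique(g)))  # number of variants per generation
```

Plotting `richness` against the generation number gives a quick view of how many variants survive over time under drift and innovation.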

Code for the simulation of conformist transmission

Iter = #sets the number of simulation iterations#

N = #sets the population size#

mu = #sets the probability of mutation in each iteration#

conf = #sets the degree of conformism#

b <- c(1:N)

V <- matrix(0, N,(Iter-1))

B <- cbind(b, V) # N x Iter matrix that stores the simulation results, rows correspond to entities while columns correspond to simulation time (generation number)

inov = N+1 #initializes the value of the first innovation#

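for (m in 2:Iter) { # loop over generations
  vt <- table(B[, m-1]) # variant frequencies in the previous generation
  mod <- as.numeric(names(vt[vt == max(vt)])) # most frequent variant(s)
  for (i in 1:N) { # loop over entities
    stoh <- runif(1, 0, 1)
    if (stoh < mu) {
      B[i, m] <- inov # innovation: the entity receives a new variant
      inov <- inov + 1
    } else if (stoh < (mu + conf)) {
      B[i, m] <- mod[sample(1:length(mod), 1)] # conformist copying; ties are broken at random
    } else {
      h <- sample.int(N, 1, replace = TRUE) # unbiased copying from a random model
      B[i, m] <- B[h, (m - 1)]
    }
  }
}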
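The way the modal variant is selected can be illustrated on a toy generation. This is a small sketch with made-up values (`prev`, `rare`, and `pick` are names introduced here for illustration); it shows that `mod` may contain several tied variants, and why the code draws an index rather than sampling from `mod` directly.

```r
prev <- c(3, 3, 5, 5, 7)   # toy previous generation (made-up values)
vt <- table(prev)           # variant frequencies

# All variants tied for the highest frequency (here 3 and 5).
mod <- as.numeric(names(vt[vt == max(vt)]))

# The same pattern with min() returns the least frequent variant(s) (here 7).
rare <- as.numeric(names(vt[vt == min(vt)]))

# Index-based draw: safe even when only one variant qualifies.
pick <- mod[sample(1:length(mod), 1)]

# By contrast, sample(rare, 1) with rare equal to the single number 7
# would draw from 1:7 instead of returning 7.
```

Drawing an index keeps the behaviour correct when a single variant qualifies, because R's `sample()` treats a single number x >= 1 as the range 1:x.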

Code for the simulation of anti-conformist transmission

Iter = #sets the number of simulation iterations#

N = #sets the population size#

mu = #sets the probability of mutation in each iteration#

anticonf = #sets the degree of anti-conformism#

b <- c(1:N)

V <- matrix(0, N,(Iter-1))

B <- cbind(b, V) # N x Iter matrix that stores the simulation results, rows correspond to entities while columns correspond to simulation time (generation number)

inov = N+1 #initializes the value of the first innovation#

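for (m in 2:Iter) { # loop over generations
  vt <- table(B[, m-1]) # variant frequencies in the previous generation
  mod <- as.numeric(names(vt[vt == min(vt)])) # least frequent variant(s)
  for (i in 1:N) { # loop over entities
    stoh <- runif(1, 0, 1)
    if (stoh < mu) {
      B[i, m] <- inov # innovation: the entity receives a new variant
      inov <- inov + 1
    } else if (stoh < (mu + anticonf)) {
      B[i, m] <- mod[sample(1:length(mod), 1)] # anti-conformist copying; ties are broken at random
    } else {
      h <- sample.int(N, 1, replace = TRUE) # unbiased copying from a random model
      B[i, m] <- B[h, (m - 1)]
    }
  }
}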

Appendix 2: mathematical proofs

Proposition 1:

If the population size N is increased while the richness k remains constant (e.g., by adding items carrying one of the variants already present in the population), the θ estimate based on Eq. 2 (see main text) will decrease. The problem is restricted to assemblages where k > 1 and, by implication, θ > 0, which means that there is some variation in the population.

Proof:

We can view the addition of p items as a successive addition of one item p times. So if it can be proven that the addition of one item to the assemblage will reduce the previous value of theta, then by induction it follows that after successively adding p items, the resulting theta of the assemblage with the population size of N+p will be lower than the theta for the population size of N.

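For assemblage size N, richness k, and θ1 we have:

$$k = \sum_{i=0}^{N-1} \frac{\theta_1}{\theta_1 + i}$$

If N increases by 1, and k remains the same, we then have:

$$k = \sum_{i=0}^{N} \frac{\theta_2}{\theta_2 + i}$$

This can be rewritten as:

$$k = \sum_{i=0}^{N-1} \frac{\theta_2}{\theta_2 + i} + \frac{\theta_2}{\theta_2 + N}$$

From the above it follows that $k > \sum_{i=0}^{N-1} \frac{\theta_2}{\theta_2 + i}$ for each θ2 > 0, which implies:

$$\sum_{i=0}^{N-1} \frac{\theta_1}{\theta_1 + i} > \sum_{i=0}^{N-1} \frac{\theta_2}{\theta_2 + i}$$

The next step is to prove that the above inequality implies that:

$$\theta_1 > \theta_2$$

Let us assume that the opposite of what we are trying to prove is true, that θ1 ≤ θ2. The function f(θ) = θ/(θ + x) is monotonically increasing for all θ > 0, because its first derivative, x/(θ + x)^2, is positive for all values of θ and all positive values of x (x is, by definition, always positive in this context). This would imply that f(θ1) ≤ f(θ2). Because f(θ) is monotonically increasing for all x, this would also imply that:

$$\frac{\theta_1}{\theta_1 + 1} \le \frac{\theta_2}{\theta_2 + 1}, \quad \frac{\theta_1}{\theta_1 + 2} \le \frac{\theta_2}{\theta_2 + 2}, \quad \ldots, \quad \frac{\theta_1}{\theta_1 + N - 1} \le \frac{\theta_2}{\theta_2 + N - 1}$$

The terms for i = 0 are equal to 1 on both sides. This would mean that each term on the left side of the inequality is lower than or equal to the corresponding term on the right side, and by implication that the sum on the left is smaller than or equal to the sum on the right. This contradicts the result deduced above, that the sum on the left must be higher than the sum on the right. Therefore:

$$\theta_1 > \theta_2$$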

This proves that the increase in population size without the increase in richness will decrease θ.
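Proposition 1 can also be checked numerically. The sketch below assumes that Eq. 2 has the form k = Σ θ/(θ + i), with i running from 0 to N − 1, and solves it for θ with `uniroot`. The helper name `theta_from_richness`, the search interval, and the example values of k and N are assumptions made for illustration.

```r
# Hypothetical helper: solve k = sum_{i=0}^{N-1} theta / (theta + i) for theta.
# Assumes 1 < k < N, so that a unique positive root exists.
theta_from_richness <- function(k, N) {
  i <- 0:(N - 1)
  g <- function(theta) sum(theta / (theta + i)) - k
  uniroot(g, lower = 1e-10, upper = 1e8, tol = 1e-10)$root
}

k <- 10                       # example richness (assumed value)
sizes <- c(50, 51, 60, 100)   # increasing assemblage sizes, richness held fixed
thetas <- sapply(sizes, function(N) theta_from_richness(k, N))

all(diff(thetas) < 0)         # TRUE when theta falls with every increase in N
```

The check covers only the chosen values; the general statement rests on the proof.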

Proposition 2:

If both population size N and richness k are increased by 1 (e.g., by adding a single entity with a new variant to the assemblage), the θ estimate derived from Equation 2 (main text) will increase.

Proof:

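For assemblage size N, richness k, and θ1 we have:

$$k = \sum_{i=0}^{N-1} \frac{\theta_1}{\theta_1 + i}$$

If both N and k increase by 1 then we have:

$$k + 1 = \sum_{i=0}^{N} \frac{\theta_2}{\theta_2 + i} = \sum_{i=0}^{N-1} \frac{\theta_2}{\theta_2 + i} + \frac{\theta_2}{\theta_2 + N}$$

Since θ2/(θ2 + N) is always less than 1, and k must always be an integer, it follows that:

$$\sum_{i=0}^{N-1} \frac{\theta_2}{\theta_2 + i} > k = \sum_{i=0}^{N-1} \frac{\theta_1}{\theta_1 + i}$$

It can be shown, in a way analogous to the proof of Proposition 1, that the above inequality implies:

$$\theta_2 > \theta_1$$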
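A quick numerical check of Proposition 2 is sketched here, again assuming that Eq. 2 has the form k = Σ θ/(θ + i), with i running from 0 to N − 1. The helper `theta_from_richness` and the (k, N) pairs are hypothetical and chosen only for illustration.

```r
# Hypothetical helper: solve k = sum_{i=0}^{N-1} theta / (theta + i) for theta.
# Assumes 1 < k < N, so that a unique positive root exists.
theta_from_richness <- function(k, N) {
  i <- 0:(N - 1)
  g <- function(theta) sum(theta / (theta + i)) - k
  uniroot(g, lower = 1e-10, upper = 1e8, tol = 1e-10)$root
}

# Example assemblages (assumed values): richness k and size N.
ks <- c(5, 10, 20)
Ns <- c(50, 100, 200)

theta_before <- mapply(theta_from_richness, ks, Ns)          # (k, N)
theta_after  <- mapply(theta_from_richness, ks + 1, Ns + 1)  # (k + 1, N + 1)

all(theta_after > theta_before)   # TRUE when theta rises in every example
```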