Electronic Supplementary Material ESM1: Methods

1.  Training: Agent navigation

Prior to the learning experiments, agents were trained to navigate the same arena without patches. The objective of this training was to find values for the weights of the neuro-controller (NN) that promote exploration of the environment while avoiding collisions. This allowed us to attribute subsequent changes in locomotion behavior to our experimental scenarios. The NN used for the navigation behavior was a multilayer perceptron (input layer: 8, hidden layer: 10, output layer: 2 neurons) governing the speed of the two locomotor appendages or wheels. This NN had one additional layer compared to the one used in Acerbi et al. (2007). We chose this configuration based on previous, unpublished work from our group in which agents carrying different NNs were tested on their ability to avoid obstacles and collisions.
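
To make the controller layout concrete, the following C++ sketch outlines a possible forward pass of such an 8-10-2 perceptron. The layer sizes and the wheel-speed range follow the text and ESM1 Table 1; the sigmoid activations, the bias terms and the rescaling of the outputs to [-10, 10] are illustrative assumptions, not the exact original implementation.

#include <array>
#include <cmath>
#include <cstddef>

// Sketch of the 8-10-2 navigation controller (activations, bias terms and the
// rescaling of the outputs to wheel speeds are illustrative assumptions).
struct NavigationNN {
    std::array<std::array<double, 8 + 1>, 10> wHidden{};  // input -> hidden (+ bias)
    std::array<std::array<double, 10 + 1>, 2> wOutput{};  // hidden -> output (+ bias)

    // Forward pass: 8 sensor readings in, 2 wheel speeds (VL, VR) out.
    std::array<double, 2> forward(const std::array<double, 8>& sensors) const {
        std::array<double, 10> hidden{};
        for (std::size_t j = 0; j < hidden.size(); ++j) {
            double z = wHidden[j][8];                       // bias term
            for (std::size_t i = 0; i < sensors.size(); ++i)
                z += wHidden[j][i] * sensors[i];
            hidden[j] = 1.0 / (1.0 + std::exp(-z));         // sigmoid activation
        }
        std::array<double, 2> wheels{};
        for (std::size_t k = 0; k < wheels.size(); ++k) {
            double z = wOutput[k][10];                      // bias term
            for (std::size_t j = 0; j < hidden.size(); ++j)
                z += wOutput[k][j] * hidden[j];
            // Squash to (0, 1), then rescale to the wheel-speed range [-10, 10].
            wheels[k] = 20.0 / (1.0 + std::exp(-z)) - 10.0;
        }
        return wheels;
    }
};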

A population of 40 agents (radius = 1) was placed in the environment (400 x 400) with random positions and orientations and an initial energy of 1000 energy units. The weights were binary coded and initialized randomly from a uniform distribution in the range [-2.0–2.0]; this range was selected according to preliminary tests. The environment had circular obstacles (radius = 3) and walls. Note that we removed walls from our experiment on learning behavior to rule out a possible border effect, but this did not change our results (ESM2 Fig. 3). Each collision against a wall, another agent or an obstacle reduced the agent's energy budget by 1 energy unit. Agents were evaluated on their ability to move and avoid obstacles in four trials of 500 iterations each. For each trial, agents were placed in the environment with random position and orientation. After the four trials, the best 10 agents (truncation selection) were allowed to reproduce clonally (probability of mutation m = 0.03 per bit) and to generate four offspring each. After reproduction, all agents of the parental generation died in order to keep the population size constant. This process was repeated for 40 generations. After this artificial selection experiment, the 10 best selected agents were "taught" (supervised learning algorithm) for 1000 iterations to keep moving straight when their sensors detected nothing. Finally, the weights of the NN were stored in an output file. These NNs were used during the learning experiments to govern the navigation behavior of the mobile agents.
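
A condensed C++ sketch of this generational loop is given below, assuming truncation selection on remaining energy. The genome layout, the energy reset at the start of each generation and the helper evaluateTrial() are hypothetical placeholders for the arena simulation described above, not the original code.

#include <algorithm>
#include <random>
#include <utility>
#include <vector>

struct Agent {
    std::vector<bool> genome;   // binary-coded NN weights (hypothetical layout)
    double energy = 1000.0;     // initial energy budget (1000 energy units)
};

// Hypothetical stand-in for one trial of 500 iterations in the arena: the agent
// is placed at a random position/orientation, moves, and loses 1 energy unit per
// collision with walls, obstacles or other agents.
void evaluateTrial(Agent& /*agent*/) { /* arena simulation omitted in this sketch */ }

void train(std::vector<Agent>& population, std::mt19937& rng) {
    const int generations = 40, trials = 4, parents = 10, offspringEach = 4;
    std::bernoulli_distribution flipBit(0.03);  // mutation probability m per bit

    for (int g = 0; g < generations; ++g) {
        // Evaluate every agent over four trials.
        for (Agent& a : population) {
            a.energy = 1000.0;  // assumed reset at the start of each generation
            for (int t = 0; t < trials; ++t) evaluateTrial(a);
        }
        // Truncation selection: keep the 10 agents with the highest remaining energy.
        std::sort(population.begin(), population.end(),
                  [](const Agent& a, const Agent& b) { return a.energy > b.energy; });
        population.resize(parents);

        // Clonal reproduction with per-bit mutation; the parental generation dies,
        // so population size returns to 10 * 4 = 40.
        std::vector<Agent> offspring;
        for (const Agent& parent : population) {
            for (int o = 0; o < offspringEach; ++o) {
                Agent child = parent;
                for (std::size_t b = 0; b < child.genome.size(); ++b)
                    if (flipBit(rng)) child.genome[b] = !child.genome[b];
                offspring.push_back(std::move(child));
            }
        }
        population = std::move(offspring);
    }
}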

2.  Asocial and social learning: Learning algorithm

At the beginning of each simulated experiment, all agents of the social group were naïve and able to learn which patch to choose relying on personal information only. This was implemented in the model as follows: the weights (wi), or free parameters, of the perceptron governing decisions based on ground color were initialized in each trial with no preference for either patch (the agents were naïve and tended to avoid both patches with the same strength). When an agent was located in a given patch (black or white), a learning algorithm (1) was applied, reinforcing the preference of the agent for that patch (positive feedback), regardless of its color or energy consequence, through the modification of the weights (wi). Positive feedback meant that the more an agent experienced a patch, the more likely it was to choose to remain in that patch. Agents could visit and experience more than one patch before choosing in which patch to stay. Thus, in contrast to previous models (Wakano and Aoki 2006; Aoki and Nakahashi 2008; Kobayashi and Wakano 2012), in our model AL agents could learn wrong behaviors (Rendell et al. 2010; Arbilly and Laland 2014; Dridi and Lehmann 2014). The learning algorithm (1) calculated the new weights (w'i) according to a given learning rate η, which was set to η = 0.1 for asocial learning and η = 0.1·v for social learning, as in Acerbi et al. (2007). In social learning, η was thus modulated linearly in the range [0.1–1.0] by the number of agents v located in the same patch, including the focal agent. Therefore, an agent tended to acquire a given behavior with high probability if surrounding individuals were showing this behavior. In this model, observing an agent nearby was equivalent to perceiving its behavior; if two agents were in the same patch but outside visual range, they were still able to perceive each other, for example by auditory or olfactory cues. Additionally, as in Acerbi et al. (2007) and Katsnelson et al. (2011), observers' copying decisions were not based on the behavioral outcomes of demonstrators. When an agent died, the weights in the perceptron of the newborn agent were reset to non-preference (see ESM1 Table 1 for more information on parameter values).
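
As a minimal illustration of how the two learning modes differ only in η, the following C++ sketch computes the learning rate used in Eq. (1); the function name and the explicit clamping to the upper bound are assumptions (with v in [1–10], 0.1·v already stays within [0.1–1.0]).

#include <algorithm>

// Learning rate used in the weight update of Eq. (1) (function name assumed).
// Asocial learners use a fixed eta = 0.1; social learners scale it by the number
// of agents v currently in the same patch (focal agent included), which keeps
// eta within [0.1, 1.0] for v in [1, 10].
double learningRate(bool socialLearner, int agentsInPatch) {
    const double baseEta = 0.1;
    if (!socialLearner) return baseEta;
    return std::min(1.0, baseEta * static_cast<double>(agentsInPatch));
}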

w'i = wi + η · δ · (df(z)/dz) · xi     (1)

where f(z) is the sigmoid function (2) used as the activation function for the backpropagation algorithm, and z is the value of the output neuron before activation (the net amount of excitation of the output neuron). This is a common algorithm used for supervised learning of classification tasks in artificial intelligence (Michie et al. 1994; Rojas 1996). The sigmoid works as a threshold function and returns a continuous value in the range (0–1). The parameter xi is the input of neuron i (i = 1, 2; two neurons, one per patch). The input neuron xi sends an impulse (i.e. xi = 1) when the agent is located in the corresponding patch i. The parameter δ = τ - y is the error of the output neuron given the "target" τ and the activated output of the output neuron y (y = f(z)). Here, τ was set so as to increase the preference (positive feedback) for the current patch regardless of the energy consequences (supervised learning, Rojas 1996). Thus, the more an agent experienced a patch, the more likely it was to choose that patch: agents needed to learn where to stay. Asocial and social learning differed only in the learning rate η, through which the social environment influenced the learning ability of a social learner.

f(z) = 1 / (1 + e^-z)     (2)

where e is the base of the natural exponential function.
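
Equations (1) and (2) together correspond to a standard delta-rule update with a sigmoid activation. The following C++ sketch shows one such update for the two-input patch-choice perceptron described above; the function names, the omission of a bias term and the exact bookkeeping of the inputs are simplifying assumptions.

#include <array>
#include <cmath>
#include <cstddef>

// Sigmoid activation of Eq. (2).
double sigmoid(double z) { return 1.0 / (1.0 + std::exp(-z)); }

// One application of the learning rule of Eq. (1) to the patch-choice perceptron
// (two input neurons, one per patch; a bias term is omitted as a simplification).
void updateWeights(std::array<double, 2>& w,        // weights w_i
                   const std::array<double, 2>& x,  // x_i = 1 if the agent is in patch i, else 0
                   double target,                   // tau, chosen to reinforce the current patch
                   double eta)                      // learning rate (asocial or social, see text)
{
    double z = w[0] * x[0] + w[1] * x[1];  // net excitation of the output neuron
    double y = sigmoid(z);                 // activated output, y = f(z)
    double delta = target - y;             // error, delta = tau - y
    double dfdz = y * (1.0 - y);           // derivative of the sigmoid, df(z)/dz
    for (std::size_t i = 0; i < w.size(); ++i)
        w[i] += eta * delta * dfdz * x[i]; // w'_i = w_i + eta * delta * df(z)/dz * x_i
}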

The simulation was developed in C++, using object-oriented programming techniques, and compiled with gcc under Ubuntu 12.04 LTS.

Table 1 Important abbreviations and symbols, their description and range of values (when applicable)

Abbreviations and Symbols / Description / Range
AL / Asocial learning
SL / Social learning
Mx / Mixed population: AL and SL
IB / Innate behavior
RB / Random behavior
N / Population size / 10
VL / Velocity of left wheel / [-10–10]
VR / Velocity of right wheel / [-10–10]
m / Mutation probability of neural networks (during the training phase and for the weights in IB) / 0.03
η / Learning rate / 0.1
wi / Weights or free parameters / [-3.0–3.0]
v / Number of agents in same patch / [1–10]
ω / Evolving trait: type of learning / 1 or 0
xi / Input neuron i (i = 1, 2) / 1 or 0
μ / Mutation probability of ω / 0.1
f(z) / Activation function (sigmoid) / (0–1)
z / Output neuron before activation / (-3–3)
y / Output neuron after activation / (0–1)
δ / Error of the output neuron / (-1–0)
Kw / Critical capacity of white patch / 0 or 9
Kb / Critical capacity of black patch / 0 or 9

References

Acerbi A, Marocco D, Nolfi S (2007) Social facilitation on the development of foraging behaviors in a population of autonomous robots. In: Proceedings of the 9th European conference on Advances in artificial life. Springer-Verlag, Lisbon, Portugal, pp 625–634

Aoki K, Nakahashi W (2008) Evolution of learning in subdivided populations that occupy environmentally heterogeneous sites. Theor Popul Biol 74:356–368. doi: 10.1016/j.tpb.2008.09.006

Arbilly M, Laland KN (2014) The local enhancement conundrum. Theor Popul Biol 91:50–57. doi: 10.1016/j.tpb.2013.09.006

Dridi S, Lehmann L (2014) On learning dynamics underlying the evolution of learning rules. Theor Popul Biol 91:20–36. doi: 10.1016/j.tpb.2013.09.003

Katsnelson E, Motro U, Feldman MW, Lotem A (2011) Evolution of learned strategy choice in a frequency-dependent game. Proc R Soc Lond B Biol Sci. doi: 10.1098/rspb.2011.1734

Kobayashi Y, Wakano JY (2012) Evolution of social versus individual learning in an infinite island model. Evolution 66:1624–1635. doi: 10.1111/j.1558-5646.2011.01541.x

Michie D, Spiegelhalter DJ, Taylor CC (1994) Machine Learning, Neural and Statistical Classification. Ellis Horwood

Rendell L, Boyd R, Cownden D, et al (2010) Why Copy Others? Insights from the Social Learning Strategies Tournament. Science 328:208–213. doi: 10.1126/science.1184719

Rojas R (1996) Neural Networks - A Systematic Introduction. Springer-Verlag, Berlin

Wakano JY, Aoki K (2006) A mixed strategy model for the emergence and intensification of social learning in a periodically changing natural environment. Theor Popul Biol 70:486–497. doi: 10.1016/j.tpb.2006.04.003