Supplementary online materials

All code used in the simulations for this report is available online at Samuel Feng’s website (https://web.math.princeton.edu/~sffeng/).

SOM A. Objective functions for each model

Here we discuss the specifics of how control was computed by optimizing an objective function. A different objective function was used for the I/O Matching model versus the DD model. Recall from the main text that N denotes network size (number of relevant pathways), F denotes fan-out, and φ denotes interference.

I/O Matching model

The goal of this model was to see how well our network architecture would match outputs with inputs. As mentioned in the text, within a pathway, the positive connections (e.g. brown lines in Figure 3) had weight +1/2, and the negative cross-connections had weight -1/2. Furthermore, the input units themselves received either +1/2 or -1/2 as input values for the right and left units, respectively. Our choice of +1/2 and -1/2 for these values was simply to yield either +1 or -1 when we take differences between pairs of units, as described in the next paragraph.

We matched the difference within each pair of inputs to the corresponding difference within each pair of outputs. More concretely, let $I_p$ be the difference between the pth pair of input units, and let $O_p$ be the difference in the scaled outputs of the pth pair of output units, using Eq. 1 and Eq. 2. The objective function that we minimized to obtain the values for the control policy κ = (k1, k2, …, kN) was the sum-squared difference between the control-scaled output and the input:

$$E(\kappa) = \sum_{p=1}^{N} \left(O_p - I_p\right)^2 \qquad \text{Eq. S1}$$
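As a minimal sketch of the sum-squared objective described above, the function below takes the per-pathway output and input differences as plain lists. The function name and the example values are illustrative assumptions; computing the scaled outputs themselves requires Eq. 1 and Eq. 2 from the main text, which are not reproduced here.

```python
def io_matching_objective(output_diffs, input_diffs):
    """Sum-squared difference between output and input differences."""
    if len(output_diffs) != len(input_diffs):
        raise ValueError("need one output difference per input difference")
    return sum((o - i) ** 2 for o, i in zip(output_diffs, input_diffs))


# Three pathways whose input pairs each differ by +1.
input_diffs = [1.0, 1.0, 1.0]
# Hypothetical control-scaled output differences (not taken from the paper).
output_diffs = [1.0, 0.5, 0.0]
error = io_matching_objective(output_diffs, input_diffs)
```

In this example the first pathway is matched exactly and contributes nothing, while the third pathway, whose output difference is 0, contributes the most to the error.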

DD model

For the DD model, we used inputs drawn from a uniform distribution, rather than the +1/2 and -1/2 used for the I/O Matching model. The activity of the output units was then computed in exactly the same manner as for the I/O Matching model described above. The key addition is that each pair of output units now drives a drift diffusion process with drift rate equal to the difference in activity between the output units. More precisely, each pair of output units is associated with a drift diffusion process with drift $O_p$.

Using equations from Bogacz et al. (2006), we computed the reward rate of the associated DD process for each pair of output units, which we denote by $RR_p$. Reward rate was defined as the proportion of correct trials divided by the average time between decisions:

$$RR_p = \frac{1 - ER}{DT + D + ER \cdot D_p}$$

where ER is the error rate, DT is the mean decision time, D is a delay between decisions, and $D_p$ is an additional penalization delay imposed for incorrect responses. As mentioned below in “Other parameter values”, we set D = 0.5 sec and $D_p$ = 1.5 sec; our results were not sensitive to these exact values.

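We computed the control policy by minimizing the negative of the sum of scaled reward rates across all of the relevant pathways (equivalently, maximizing their sum),

$$\kappa = \arg\min_{\kappa'} \left( -\sum_{p=1}^{N} RR_p(\kappa') \right). \qquad \text{Eq. S2}$$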

As noted in the main text, the decision threshold is implicitly set to its optimal value, again using techniques developed by Bogacz et al. (2006).
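The reward rate computation for a single pathway can be sketched as follows. This is an illustration under stated assumptions rather than the code used for the reported simulations: it uses the standard closed-form error rate and mean decision time of the free response drift diffusion process as we understand them from Bogacz et al. (2006), assumes a positive drift, takes the nondecision time to be 0, and replaces the analytical threshold optimization with a simple grid search. All function and parameter names are hypothetical.

```python
import math


def error_rate(drift, threshold, noise=1.0):
    """Free response error rate for drift A > 0, threshold z, noise c."""
    return 1.0 / (1.0 + math.exp(2.0 * drift * threshold / noise ** 2))


def decision_time(drift, threshold, noise=1.0):
    """Mean free response decision time for the same process."""
    return (threshold / drift) * math.tanh(drift * threshold / noise ** 2)


def reward_rate(drift, threshold, noise=1.0, delay=0.5, penalty=1.5, t0=0.0):
    """Proportion correct divided by the average time between decisions."""
    if drift <= 0:
        raise ValueError("this sketch assumes a positive drift")
    er = error_rate(drift, threshold, noise)
    dt = decision_time(drift, threshold, noise)
    return (1.0 - er) / (dt + t0 + delay + er * penalty)


def best_threshold(drift, noise=1.0, grid_size=2000, z_max=5.0):
    """Grid search stand-in for the optimal threshold (an assumption)."""
    candidates = [z_max * (i + 1) / grid_size for i in range(grid_size)]
    return max(candidates, key=lambda z: reward_rate(drift, z, noise))


z_star = best_threshold(1.0)
rr_star = reward_rate(1.0, z_star)
```

The defaults for `delay`, `penalty`, and `noise` follow the values listed under “Other parameter values”; the grid bounds are arbitrary and would need widening for very small drifts.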

In this development, we used the free response protocol, but the same formulation can be carried out using the interrogation protocol. We did not see significantly different results when we compared the two protocols.

B. Extraneous model parameters and settings

To ensure that the extraneous parameter values and settings chosen for our main DD model simulations were reasonable, we ran a large set of additional simulations examining the behavior of the DD objective function under perturbations of these parameters. To study these effects, we sometimes needed to examine model behavior at a finer grain of detail than simply computing K. To this end, for each simulation trial, we first sorted the computed control policy and then normalized it so that its largest element was 1. Then, across simulation trials, we averaged these sorted control policies, yielding a sorted control vector, denoted by . For results reported in this section, each curve represents an average over 1000 numerical simulations.
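The sort, normalize, and average steps described above might look like the following sketch. The descending sort order and the handling of an all-zero policy are assumptions, since the text does not specify them, and the function names are illustrative.

```python
def sorted_normalized(policy):
    """Sort one control policy (largest first) and scale its maximum to 1."""
    ordered = sorted(policy, reverse=True)
    largest = ordered[0]
    if largest <= 0:
        return ordered  # assumption: leave a policy with no positive entry unscaled
    return [k / largest for k in ordered]


def average_sorted_policy(policies):
    """Average the sorted, normalized policies element by element."""
    normalized = [sorted_normalized(p) for p in policies]
    n_trials = len(normalized)
    return [sum(column) / n_trials for column in zip(*normalized)]


# Two hypothetical trials for a network with three pathways.
trials = [[0.2, 0.8, 0.0], [0.5, 0.0, 0.25]]
avg = average_sorted_policy(trials)
```

Because each policy is sorted before averaging, the result describes the typical shape of a control policy (how quickly control falls off from the most to the least active unit) rather than the control assigned to any particular pathway.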

Methods for creating irrelevant pathways

In the main text, we reported that our model architecture formed irrelevant pathways by choosing, for each pair of input units, F (fan-out) random pairs of output units to project to. In total, we considered three ways to connect inputs to outputs:

Fixing the number of irrelevant connections for input pairs to F and randomly choosing connected output pairs (reported in main text)
Fixing the number of irrelevant connections for output pairs to F and randomly choosing connected input pairs
Choosing the “neighboring” F pairs of output units for each pair of input units to project to, cycling around the network if necessary (cyclic)

Unsurprisingly, these different methods for introducing irrelevant pathways (and thus cross-talk) did not yield significantly different results (see Figure S1).
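The three connection schemes listed above can be sketched as functions that return a set of (input pair, output pair) links. This is an illustration only: it assumes that irrelevant links never target a pair's own (relevant) output, that F is at most N - 1, and that “neighboring” means the next F pairs in index order. None of these details is stated in the text, and the function names are hypothetical.

```python
import random


def fixed_fan_out(n, f, rng):
    """Each input pair projects to f randomly chosen other output pairs."""
    links = set()
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for j in rng.sample(others, f):
            links.add((i, j))
    return links


def fixed_fan_in(n, f, rng):
    """Each output pair receives from f randomly chosen other input pairs."""
    links = set()
    for j in range(n):
        others = [i for i in range(n) if i != j]
        for i in rng.sample(others, f):
            links.add((i, j))
    return links


def cyclic(n, f):
    """Each input pair projects to its next f neighbors, wrapping around."""
    return {(i, (i + step) % n) for i in range(n) for step in range(1, f + 1)}


rng = random.Random(0)
out_links = fixed_fan_out(10, 3, rng)
in_links = fixed_fan_in(10, 3, rng)
cyc_links = cyclic(10, 3)
```

All three schemes produce N·F irrelevant links; they differ only in whether the out-degree, the in-degree, or both are held fixed.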

Free response vs. interrogation protocol

The DD model can be used under two main paradigms, the free response protocol and the interrogation protocol, both of which are studied extensively in Bogacz et al. (2006). For our main results we used the free response protocol, but we wanted to ensure that the results also held for the interrogation protocol. Figure S2 shows the results of this comparison for two levels of interference (these are older simulations in which ρ denotes the interference that we call φ in the main text). The two protocols are not significantly different, but the interrogation protocol exhibited slightly lower levels of control (i.e., a slightly greater capacity constraint). This supported our choice of the free response protocol, as we are aiming for a lower bound on these effects.

Distributions of inputs

For our main results, we supposed that each input unit received an input value drawn from a uniform distribution on [0,1]. We also considered some other distributions of inputs to test the sensitivity of our model to this hypothesis. In particular, we also considered

Drawing the difference of the inputs from a uniform distribution on [0,1]
Setting all of the left input units to 1, and the right inputs to 0 (“ones”)
Drawing input unit values from a normal distribution, and then taking the absolute value
Drawing the right input units from a “large” uniform distribution on [.9, 1], and the left input units from a uniform distribution on [0, .1].

The results of these considerations are shown in Figure S3. With the exception of the “ones” distribution, there is no distinguishable difference.
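The input distributions compared above can be sketched as a single sampling function that returns (left, right) input values for each pathway. Several details are assumptions made only for illustration: the scheme labels, the standard normal parameters (the text gives no mean or variance), and the way a uniformly distributed difference is encoded as a pair of unit values.

```python
import random


def draw_inputs(n, scheme, rng):
    """Return n (left, right) input pairs drawn under the named scheme."""
    pairs = []
    for _ in range(n):
        if scheme == "uniform":                # main text: each unit ~ U[0, 1]
            pair = (rng.random(), rng.random())
        elif scheme == "uniform_difference":   # difference ~ U[0, 1]
            # Assumption: pin the left unit at 0 so the right unit carries the difference.
            pair = (0.0, rng.random())
        elif scheme == "ones":                 # left = 1, right = 0
            pair = (1.0, 0.0)
        elif scheme == "abs_normal":           # |N(0, 1)|; parameters are assumed
            pair = (abs(rng.gauss(0.0, 1.0)), abs(rng.gauss(0.0, 1.0)))
        elif scheme == "large":                # right ~ U[.9, 1], left ~ U[0, .1]
            pair = (rng.uniform(0.0, 0.1), rng.uniform(0.9, 1.0))
        else:
            raise ValueError(f"unknown scheme: {scheme}")
        pairs.append(pair)
    return pairs


rng = random.Random(0)
samples = {s: draw_inputs(50, s, rng)
           for s in ("uniform", "uniform_difference", "ones", "abs_normal", "large")}
```

Note that “ones” is the only scheme with no trial-to-trial variability in the inputs, which may be relevant to why it alone behaves differently in Figure S3.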

Strength of irrelevant pathways

In the main text, the connection strengths of irrelevant pathways were equal to those of the relevant pathways. We also examined the effects of changing this. We held the connection strengths of the relevant pathways fixed and varied the strengths of the irrelevant connections. The ratio of irrelevant to relevant connection strength, which we call the “cross weight ratio”, equals 1 when the relevant and irrelevant connections have the same strength (as in the main text), is less than 1 if irrelevant pathways are weaker than relevant pathways, and is greater than 1 if irrelevant pathways are stronger than relevant pathways. Figure S4 shows the results.

Other parameter values

For reference, here we include a listing of all other extraneous parameter values for the results reported in the main text.

DD noise: σ = 1.0
Delay between trials: D = 0.5 sec
Penalty delay after an incorrect response: Dp = 1.5 sec
Nondecision time: 0 sec

C. Simulation procedure and numerical optimization

Our procedure is summarized in Algorithm 1 below, which assembles the discussion in Network configuration into a recipe for simulating one trial and obtaining one sample of the computed optimal vector κ and the number of active control units K (explained in Evaluation of capacity constraints). For the DD model, we elected to use the free response protocol, as the interrogation protocol did not produce noticeably different results (see SOM B above). The procedure is then repeated many times, and the resulting samples of κ and K are averaged. These averaged values are used in presenting model simulation results.

Algorithm 1 (to simulate one trial, obtaining one sample for κ and K)

Require: Network size N, interference level φ, fan-out level F, and a choice of model (I/O Matching or DD)
Draw sample values for each input unit
For each irrelevant pathway, determine if it is incongruent with probability φ
Solve the relevant optimization problem to obtain an optimal control policy κ
Count the number of elements of κ greater than 0.5 to obtain K
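The control flow of Algorithm 1 can be sketched as below. Only the four steps themselves come from the algorithm; the objective and the optimizer are deliberately left as caller-supplied functions, and the `toy_objective` and `grid_optimize` stand-ins are placeholders invented for this illustration. They do not implement the network of Eq. 1 and Eq. 2 or the global optimizers used for the reported results.

```python
import random


def count_active(policy, cutoff=0.5):
    """Step 4: number of control units whose value exceeds the cutoff."""
    return sum(1 for k in policy if k > cutoff)


def simulate_trial(n, phi, links, make_objective, optimize, rng):
    """One pass through Algorithm 1; returns (policy, K)."""
    inputs = [(rng.random(), rng.random()) for _ in range(n)]           # step 1
    incongruent = {link: rng.random() < phi for link in sorted(links)}  # step 2
    objective = make_objective(inputs, incongruent)
    policy = optimize(objective, n)                                     # step 3
    return policy, count_active(policy)                                 # step 4


def toy_objective(inputs, incongruent):
    """Placeholder objective: ignores the network and the incongruence flags."""
    targets = [right for _, right in inputs]
    return lambda policy: sum((k - t) ** 2 for k, t in zip(policy, targets))


def grid_optimize(objective, n, steps=20):
    """Placeholder optimizer: coordinate-wise grid search on [0, 1]."""
    policy = [0.0] * n
    for p in range(n):
        grid = [s / steps for s in range(steps + 1)]
        policy[p] = min(grid, key=lambda v: objective(policy[:p] + [v] + policy[p + 1:]))
    return policy


policy, k = simulate_trial(5, 0.5, {(0, 1), (1, 2)}, toy_objective,
                           grid_optimize, random.Random(1))
```

Keeping the objective and optimizer as parameters mirrors the structure of the algorithm, in which the same steps are used for both the I/O Matching and DD models and only the optimization problem changes.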

The main computational demands in executing Algorithm 1 lie in finding global extrema of our objective functions for the DD model, because the nonlinearities introduced by the reward rate make the objective function high-dimensional and nonconvex. Solving these minimization problems demanded the use of parallelized global optimization in order to obtain results in a reasonable amount of time. All code was written either in MATLAB (Mathworks) or in C. In the earliest versions of the code, we solved our minimization problems by repeated application of MATLAB’s fmincon function with many randomly chosen initial points. We also evaluated results using several different algorithms implemented in “The NLopt nonlinear-optimization package” (Johnson, 2012; Conn, Gould, & Toint, 1991; Birgin & Martínez, 2008; Kan & Timmer, 1987; Powell, 2006; Powell, 2009). The majority of results presented here were obtained using a parallel implementation of particle swarm optimization (Eberhart & Kennedy, 1995) as implemented in the PSwarm Solver developed by Vaz & Vicente (2009). We checked the results found with PSwarm against those of the earlier simulations and found no difference in the resulting solutions. In all numerical algorithms used, we further ensured, either by initializing a particle or by supplying an initial starting point, that the algorithm would check a few specific test cases in addition to the default implementation. The initial points tested always included all units fully active, no units active, and each individual unit active alone (with all others set to 0). These conditions were always evaluated in an attempt to prevent the optimization package from overlooking potentially obvious solutions, and to provide some structural information about the reward surface, which might otherwise be arbitrarily ill conditioned. There are a massive number of options and parameters with which we ran simulations (e.g., DD protocol, noise, network connectivities, input distributions). While we are unable to report on all of these here, we have attempted to present a thorough representation of the main behaviors and findings of our model.

D. Sensitivity analysis for sub-optimal performance

One important consideration is the robustness of the effects we observed, and in particular how tolerant performance is to variations in control. For example, although a limit in the size of K (number of active control units) may be optimal, increasing K beyond that limit may be associated with only small costs to performance, which would call into question the importance of this limit. To assess this possibility, we examined the effect that perturbations in K around the optimal value had on performance in both the I/O Matching and DD models. In both cases, we used networks with 100 pathways, 20% connectivity, and all other parameters as in the main text. After computing the optimal value of K, we then increased or decreased this and examined the effect on performance. In both cases, our manipulations were conservative: when increasing K, we set control units that were deactivated (<0.5) to 0.5, thus providing the minimal amount of control to be considered activated. Conversely, when decreasing K, activated units were set to 0.49, the maximum value still considered to be deactivated. We also investigated fully activating and deactivating units (by setting their values to 1 or 0, respectively), and observed qualitatively similar (though slightly more extreme) effects.

The results of this sensitivity analysis are presented in Figure S5 panels A and B, for the I/O Matching and DD models, respectively. They are similar for both models. In both cases, maximum performance occurs for the value of K that had been determined to be optimal (providing encouraging validation of the optimization procedures). Critically, even small perturbations in this value yield sizable decreases in overall performance (higher error for the I/O Matching model and lower reward rate for the DD model). For example, in the DD model, activating one additional control unit produced a 20% decrease in overall reward rate. Furthermore, Figure S5 shows that increasing the number of active control units is more detrimental than decreasing it. These results suggest that optimizing the number of active control units has a meaningful impact on performance, and that a conservative policy should favor limiting rather than licensing of multitasking.
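The conservative perturbation of K can be sketched as follows. Here a unit counts as activated when its value is at least 0.5, as this section implies. The rule for choosing which units to perturb (the most active of the deactivated units, or the least active of the activated units) is an assumption borrowed from the procedure described in SOM E; this section does not state it. Function names are illustrative.

```python
def active_count(policy, cutoff=0.5):
    """Number of units at or above the activation cutoff."""
    return sum(1 for k in policy if k >= cutoff)


def increase_k(policy, extra, cutoff=0.5):
    """Raise the `extra` most active deactivated units to the cutoff."""
    inactive = sorted((i for i, k in enumerate(policy) if k < cutoff),
                      key=lambda i: policy[i], reverse=True)
    perturbed = list(policy)
    for i in inactive[:extra]:
        perturbed[i] = cutoff
    return perturbed


def decrease_k(policy, fewer, cutoff=0.5, off_value=0.49):
    """Lower the `fewer` least active activated units to just below the cutoff."""
    active = sorted((i for i, k in enumerate(policy) if k >= cutoff),
                    key=lambda i: policy[i])
    perturbed = list(policy)
    for i in active[:fewer]:
        perturbed[i] = off_value
    return perturbed


policy = [0.9, 0.1, 0.6, 0.3, 0.0]   # hypothetical optimum with K = 2
more = increase_k(policy, 1)         # unit 3 is raised from 0.3 to 0.5
less = decrease_k(policy, 1)         # unit 2 is lowered from 0.6 to 0.49
```

Re-evaluating the objective at `more` and `less` and comparing against its value at `policy` gives one point on each side of the optimum in a plot like Figure S5.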

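We also repeated the above analysis for the I/O Matching model using a different objective function. For this analysis, we minimized the sum of the absolute values of the differences, as opposed to the sum-squared differences as in Eq. S1:

$$E_{\mathrm{abs}}(\kappa) = \sum_{p=1}^{N} \left| O_p - I_p \right| \qquad \text{Eq. S3}$$

where $I_p$ and $O_p$ denote the differences within the pth pair of input units and of (scaled) output units, respectively.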

Using this as the objective function for the I/O Matching model, we repeated the analysis described in the above paragraph, and plotted the results in Figure S5, panel C. Observe that the change in the objective function appears to increase approximately linearly with the size of the perturbation. Still, the same claims as above hold: even small perturbations yield sizable changes in the objective function value, and increasing the number of active control units is more detrimental than decreasing it.

E. Additional analysis on the nature of interference

A primary finding of our study is the strikingly sublinear manner in which the optimal number of active control units (K) scales with network size. To further examine the source of this effect, we conducted additional simulations implementing variants of the I/O Matching model. Unless mentioned otherwise, these used interference, network connectivity, and other parameters that were the same as those reported in the main text.

Specifically, we sought to isolate the effects of control in the input vs. output layers of the network. Thus, we began by implementing two variants of the model, one in which control influenced only the input layer of the model (I model), and another in which it influenced only the output layer (O model). As for the full model, we determined the optimal value of K for each. Figure S6 shows that the O model exhibits the same strong sublinearity as the full I/O Matching model. In contrast, if control is restricted to the input layer (I model), then there is near-linear scaling of optimal K with network size. Thus, it seems clear that the sublinearity arises primarily from control on the output layer of the model. Note that the performance of both variants is substantially worse than that of the full model (see Figure S7) for all network sizes, and in fact this difference grows with network size. This confirms that coordinated control of both the input and output of a pathway is the optimal policy, and that this policy favors restricted multitasking even at large network sizes.

To further understand why control at the output level so strongly favors restricting the number of pathways, even as network size increases, we compared the control policies and effects of cross-talk in the full and I models. First, for each network size, we determined the optimal K for the I (Ki) and full (Kio) models. Consistent with the (greater) sublinearity of the full model, Kio was less than Ki for all network sizes (see Figure S6). We then carried out the following two sets of modifications at each network size. For the first, we swapped the activities of the control units in each model with those from the other, by ranking the activity of each control unit in each model, and then assigning to it the activity of the control unit with the corresponding rank in the other model. For the second set of modifications, we endeavored to set the number of active control units in each model equal to the number that was optimal for the other, using as conservative a procedure as possible. We began by taking the difference Kd = Ki - Kio. Then, in the I model, from the pool of activated control units, we de-activated the Kd least active of these (setting their value to 0.49), so that the number of activated control units was now equal to Kio. Finally, in the full model, from the pool of deactivated control units, we activated the Kd most active of these (setting their value to 0.5), so that the number of activated control units was now equal to Ki. For example, at N=30, the optimized I model had 3 more activated units than the full model (i.e., Kd = 3). So, of the activated control units in the optimized I model, we de-activated the three least active (to match the number in the optimized full model); and of the de-activated control units in the full model, we activated the three most active (to match the number in the optimized I model).
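The first modification, swapping control activities by rank, can be sketched as follows. The sketch assumes both policies have the same length and breaks ties by unit index; neither detail is specified in the text, and the function name is illustrative.

```python
def swap_by_rank(policy_a, policy_b):
    """Give each unit the activity of the equally ranked unit in the other model."""
    if len(policy_a) != len(policy_b):
        raise ValueError("policies must have the same number of control units")
    # Unit indices ordered from most to least active in each model.
    order_a = sorted(range(len(policy_a)), key=lambda i: policy_a[i], reverse=True)
    order_b = sorted(range(len(policy_b)), key=lambda i: policy_b[i], reverse=True)
    new_a, new_b = list(policy_a), list(policy_b)
    for ia, ib in zip(order_a, order_b):
        new_a[ia] = policy_b[ib]
        new_b[ib] = policy_a[ia]
    return new_a, new_b


# Hypothetical optimized policies for the I model and the full model.
policy_i = [0.9, 0.1, 0.6]
policy_full = [0.2, 0.7, 0.0]
swapped_i, swapped_full = swap_by_rank(policy_i, policy_full)
```

After the swap, each model keeps its own ordering of which pathways receive the most control, but adopts the other model's distribution of control values. This preserves which units each model favors while changing how strongly they are driven.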

The results of these manipulations are shown in Figure S8. For reference, the solid lines show performance of each model optimized for different network sizes (blue line for the full model and green line for the I model). The dotted lines show performance for the first set of modifications, in which the control policies were swapped; and the dashed lines show the effects of the second set of modifications, in which the number of active control units in each model was set to the number in the other. It is clear from this figure that there is a substantial asymmetry in the effects of all of the modifications, such that they have a greater impact on the full model than on the I model, bringing performance of the full model closer to that of the I model than vice versa. These effects are illustrated in Figure S9, which plots the change in performance for the modifications to each model. Note that the full model suffers much greater degradation from activating additional control units than the I model does from deactivating control units.