Disease Dynamics Algorithm Synopsis
Overview
The algorithms described in the following sections explore the relationship between disease dynamics and the underlying social dynamics of the contact network on which the disease spreads. The simulated population grows and changes according to salient real-world statistical rates (for New York City, circa 1998[1]) and is itself a conglomerate of family groups, working groups, school groups, and individuals. This lets us examine the topological properties of the contact network at any given time during the spread of a disease, and thus gives us insight into how and why the social topology of the contact network influences the propagation of the disease through the population.
Algorithms
The full algorithm consists of two subsets: social processes and disease processes. In general, the social processes are updated on a yearly basis, while the disease processes are updated on a weekly basis (i.e. 52 times for each social update); however, with a few exceptions, updating of the social and disease processes is done simultaneously and synchronously for all individuals in the population. Although adjustments could be made on a by-disease basis, for computational ease, all social processes that do not directly and/or immediately affect connectivity among children (i.e. marriages, formation of/changes to workplaces) are updated in advance of the disease update in the current version of the algorithm. The justification for this protocol in the case in question (measles) is that aggregation of adults is unlikely to have a significant impact on the ability of measles to spread through the population, since most adults are immune to the disease[2, 3].
The basic process of spread on the contact network follows the SIR model with age-defined transmission rates (really the product of a transmission probability and a contact rate). The disease is assumed to pass from infected individuals to susceptible individuals via social links. For measles, the infectious period is assumed to be two weeks.
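The weekly update described above can be sketched as follows. This is a minimal illustration, not the simulation's actual code: the function name, the dictionary-based data layout, and the reading of the age-defined transmission rate as a per-link weekly infection probability indexed by the susceptible individual's age class are all assumptions made for the example.

```python
import random

def weekly_sir_step(state, weeks_infected, neighbors, age_class, beta, rng,
                    infectious_weeks=2):
    """Advance all individuals by one week, synchronously.

    state: {person: "S" | "I" | "R"}
    weeks_infected: {person: weeks spent infectious so far}
    neighbors: {person: list of socially linked persons}
    beta: {age class: weekly per-link infection probability} (assumed meaning)
    """
    new_state = dict(state)
    new_weeks = dict(weeks_infected)
    for person, status in state.items():
        if status != "I":
            continue
        # Infection passes only along social links, from I to S.
        for other in neighbors[person]:
            if state[other] == "S" and new_state[other] == "S":
                if rng.random() < beta[age_class[other]]:
                    new_state[other] = "I"
                    new_weeks[other] = 0
        # Recover once the infectious period (two weeks for measles) is over.
        new_weeks[person] = weeks_infected[person] + 1
        if new_weeks[person] >= infectious_weeks:
            new_state[person] = "R"
    return new_state, new_weeks
```

Because the new state is built from a copy, every individual sees the same start-of-week network state, which mirrors the synchronous updating described in the text.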
Social Processes Algorithms
- Birth: Women between the ages of 15 and 45 are eligible to have children, and the number of children born to women of each age i in this range is determined according to a rate per 1000 women of age i. The week at which each baby will be added to the population is randomly chosen, as are the women who will become new mothers.
- Death: The number of people of age i who will die in the current year is determined according to a rate per 1000 people of age i. Individuals are randomly selected for death (unless their age exceeds 95, in which case they are automatically removed from the population), and the week of their removal is randomly determined.
- Age Distribution and Aging: The initial age distribution was adapted from [1], and has been linearized in order to determine values for each age between 0 and 95 (Figure 1). The age of each individual in the population is incremented yearly. For simplicity, all individuals age simultaneously.
Figure 1: Population distribution by age in number per 1000.
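The per-age bookkeeping used for births and deaths can be sketched as below, shown here for deaths. The function names and data layout are hypothetical, and rounding the expected count to the nearest integer is an assumption; the text only states that counts follow a rate per 1000 people of each age.

```python
import random

def yearly_event_counts(count_by_age, rate_per_1000):
    """Events per age this year: (people of that age) * rate / 1000, rounded."""
    return {age: round(n * rate_per_1000.get(age, 0.0) / 1000.0)
            for age, n in count_by_age.items()}

def schedule_deaths(people_by_age, death_rate_per_1000, rng,
                    max_age=95, weeks=52):
    """Return {person: week of removal} for the current year."""
    counts = yearly_event_counts(
        {age: len(people) for age, people in people_by_age.items()},
        death_rate_per_1000)
    schedule = {}
    for age, people in people_by_age.items():
        if age > max_age:
            chosen = list(people)  # anyone older than 95 is always removed
        else:
            chosen = rng.sample(people, min(counts[age], len(people)))
        for person in chosen:
            schedule[person] = rng.randrange(weeks)  # random week of removal
    return schedule
```

The same count-then-sample pattern would apply to births, with the sample drawn from eligible women aged 15 to 45.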
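- Immigration: The immigration rates were adapted from [4], and have been linearized in order to find a rate for each age between 0 and 95. The number of people of age i who will be added to (or subtracted from) the population at time t is equal to the product of the number of people of age i-1 at time t-1 and the immigration rate for age i:
$M_i(t) = m_i \, N_{i-1}(t-1)$,
where $N_{i-1}(t-1)$ is the number of people of age i-1 at time t-1, $m_i$ is the immigration rate for age i, and $M_i(t)$ is the resulting number of migrants of age i (negative for out-migration).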
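The rates of [4] have been further adapted to satisfy the condition that the age distribution is stable, although the total population is able to grow or shrink (current inputs yield annual growth of roughly 0.6%); i.e. the ratio of the number of people moving into age group i from age group i-1, minus the number of people moving into age group i+1 from age group i, minus the number of people of age i who die in the current year, plus (minus) the number of people who immigrate to (from) age group i in the current year to the total population (divided by 1000) is a constant for each age group i:
$\frac{a_{i-1 \to i} - a_{i \to i+1} - D_i + M_i}{N/1000} = c_i$,
where $a_{i-1 \to i}$ is the number of people moving into age group i from age group i-1, $a_{i \to i+1}$ is the number moving into age group i+1 from age group i, $D_i$ is the number of deaths at age i in the current year, $M_i$ is the net number of migrants of age i (negative for out-migration), $N$ is the total population, and $c_i$ is the constant for age group i.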
In order to make these rates applicable to many large populations (and not solely to New York City), we found a set of characteristic rates by imposing a threshold of 0.01 on the derived values; rates whose absolute value was less than 0.01 were assumed to be 0. This cutoff has the effect of limiting in-migration to age groups i=20 and older. We see predominantly in-migration of individuals in their 20’s and early 30’s (possibly young professionals moving to a large city for employment), out-migration of individuals in their late 30’s through their late 60’s (possibly financially well-established people moving to the suburbs, which is also reflected by an out-migration of their children, aged 11-15), and finally an in-migration of the very elderly (possibly people moving into retirement homes in the city proper). Individuals under the age of 18 are required to leave the population with a family group, and, to the extent that the immigration-by-age distribution can still be satisfied, an entire family group is moved out of the population if any family member is randomly chosen as an emigrant. The week at which an individual will enter or leave the population is randomly determined; however, if the individual is moving with a family, his or her entire family group will move at the same time.
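The thresholding and the per-age migrant count described above can be illustrated with a short sketch. The function names are hypothetical, and treating the rate as a per-capita fraction of the age i-1 cohort is an assumption; the example rates are made up for illustration and are not the values adapted from [4].

```python
def characteristic_rates(raw_rates, threshold=0.01):
    """Set every rate whose absolute value is below the threshold to 0."""
    return {age: (rate if abs(rate) >= threshold else 0.0)
            for age, rate in raw_rates.items()}

def net_migrants(prev_count_by_age, rates):
    """Migrants of age i at time t = rate_i * (people of age i-1 at t-1).

    Positive values are in-migrants, negative values are out-migrants.
    """
    return {age: rate * prev_count_by_age.get(age - 1, 0)
            for age, rate in rates.items()}

# Illustrative (made-up) inputs only.
example_rates = characteristic_rates({20: 0.03, 21: 0.005, 40: -0.02})
example_flows = net_migrants({19: 1000, 20: 900, 39: 500}, example_rates)
```

In this toy input the rate for age 21 falls below the cutoff and is zeroed, so only ages 20 and 40 produce migrant flows.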
- Marriages and Families: Approximately 54% of the population over the age of 18 will be paired with a person of the opposite gender at any given time. Marriage is not necessarily correlated with breeding in this simulation, since single women can have children. Family groups are fully connected and remain so until the children turn 18, at which point only the mother and father will remain connected. Because it occurs very infrequently, children whose parents die are not reassigned to new families. The average family size is 3-4.
- School Groups: All children between the ages of 6 and 18 are included in the school subnetwork. This subnetwork consists of fully-connected, age-specific classes of maximum size equal to 50, that are, in turn, interconnected randomly (from a truncated normal distribution) according to the following rule: children between the ages of 6 and 13 can have a maximum of 0.5*(size of their class) connections to other classes in this age range; children between the ages of 14 and 18 can have a maximum of 0.75*(size of their class) connections to other classes in this age range; all children have at least 0.25*(size of their class) connections to other classes. Each year, school edges are severed for an eight-week period and are then reestablished, to simulate summer vacation and the beginning of a new school year (i.e. school-term forcing).
- Workplaces: Individuals have the option to enter the workforce at age 18, but must exit from the workforce at age 65. If college has not been selected as an option for the population, approximately 8% of current 18-year-olds will remain unemployed each year, and all others will enter the workforce. If college has been selected as an option for the population, approximately 28% of people between the ages of 18 and 25 will be in college at any given time, and therefore 18-year-olds will be added to maintain the colleges’ capacities; of the remaining 18-year-olds, 8% will be unemployed, and the rest will enter the workforce. The initial number of workplaces is approximately 1% of the total population size, and is allowed to grow over time. Connectivity in the workplace is such that if there are more than 6 people in the initial workplace, the initial workplace is connected as a Barabasi-Albert (BA) graph with seed size 7 (where the seed begins as a graph of 7 nodes and no edges), and if the initial size is greater than 3, but less than 7, the initial workplace is connected as a BA graph with seed size equal to the workplace size. As new workers are added to the workplaces, they are attached randomly to a minimum of three and a maximum of all other workers in the workplace.
- Colleges (optional): Colleges are an optional addition to the social network simulation, and are absent from the current round of measles simulations because of their (likely) small impact on the spread of measles through the population. However, the option exists to choose the number of colleges to include in the population, and from there, to establish subnetworks whose size will be equal to the total population size, divided by (100*number of colleges). Colleges are always maximally filled with individuals between the ages of 18 and 25 and individuals are randomly connected to between 10% and 50% of the other students in their college.
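As one illustration of the social processes listed above, the rule for adding a new worker to an existing workplace (from the Workplaces item) can be sketched as follows. The function name and edge representation are hypothetical, and drawing the number of links uniformly between three and the current workplace size is an assumption; the text specifies only the minimum and maximum.

```python
import random

def attach_new_worker(workplace_edges, workers, new_worker, rng, min_links=3):
    """Link new_worker to a random subset of the current workers.

    The subset size lies between min_links and len(workers); in a workplace
    smaller than min_links, the newcomer is linked to everyone.
    """
    low = min(min_links, len(workers))
    n_links = rng.randint(low, len(workers))  # uniform choice is assumed
    for colleague in rng.sample(workers, n_links):
        workplace_edges.add(frozenset((new_worker, colleague)))
    workers.append(new_worker)
    return n_links
```

Storing each edge as a frozenset keeps the workplace graph undirected without needing to track both directions.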
Disease Processes Algorithms
- Initial Immunity (Susceptibility) by Age: The immunity profile (by age) (Table 1) was adapted from [2] and is a set of input parameters at time t=0. The susceptibility of immigrants (i.e. of in-migrants over the age of 20, since there are no in-migrants under the age of 20) is assumed to be higher (3.5%) than it is in the native population (1%). This immigrant-susceptibility percentage is a number that will likely have to be determined on a by-disease basis, since it was derived from measles-specific statistics. First, we find the fraction of immigrants due to internal migration (to New York City), and the fraction due to immigration from each of the (historical) measles infector regions (Western Europe, Japan, and Africa). We assume that the internal migrants and international migrants from non-infector regions will have the same susceptibility as the native population (by age). To find the susceptible fraction of migrants from each of the infector regions, we multiply the fraction of migrants coming from each region by the vaccine coverage in that region. The aggregate susceptibility of migrants (internal, international from non-infector regions, international from infector regions) is 3.5%.
Table 1: Immunity profile by age.
Age / Fraction Immune
0 / 0
1 / .033
2 / .133
3 / .266
4 / .4
5 / .533
6 / .65
7 / .71
8 / .78
9 / .85
10 / .9
11-95 / .99
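Table 1 can be applied at time t=0 as in the following sketch. The names are hypothetical, and representing an immune individual with the removed state "R" is an assumption made for compatibility with an SIR-style state label.

```python
import random

# Fraction immune by age, copied from Table 1.
IMMUNE_FRACTION = {0: 0.0, 1: 0.033, 2: 0.133, 3: 0.266, 4: 0.4, 5: 0.533,
                   6: 0.65, 7: 0.71, 8: 0.78, 9: 0.85, 10: 0.9}
ADULT_IMMUNE_FRACTION = 0.99  # ages 11-95

def immune_fraction(age):
    """Look up the Table 1 immune fraction for an age between 0 and 95."""
    return IMMUNE_FRACTION.get(age, ADULT_IMMUNE_FRACTION)

def initial_status(age, rng):
    """Draw an initial state: immune ("R", assumed label) or susceptible ("S")."""
    return "R" if rng.random() < immune_fraction(age) else "S"
```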
- Loss of Maternally-Acquired Immunity: All newborns are assumed to be immune to the disease for (in the case of measles) up to six months after the week of birth. The specific week during this period at which maternally-acquired immunity will be lost is randomly chosen from a Gaussian distribution with mean equal to 12 weeks and sigma equal to two weeks.
- Sparking: Because computational restrictions limit us to, at most, Type II populations (100,000-300,000), it is often impossible to achieve true endemism of the disease in the simulated population. Therefore, to ensure that there is always some chance of infection, we introduce a sparking process, whereby, with some rate (which is chosen as an input parameter), susceptible individuals will become infectious without contact with an already infected individual in the population. The implicit assumption is that the “spark” has traveled outside of the native population, or has otherwise had contact with an infected individual from outside of the native population, has become infected, and has introduced the disease into the native population. The spark is chosen at random from all current susceptibles in the population.
- Force of Infection by Age: The age-specific mixing and transmission (WAIFW) matrix, β, was derived from the force of infection values by age group (Table 2), the aggregate proportion of the infection by age group (Table 2), and the mixing matrix given in [5]:
.
Although [5] reports values for pertussis, the force of infection profile by age is very similar to that of measles, peaking in the 5-10-year old age class. We find the entries of the mixing matrix by solving the equation, , where is the fraction of individuals in age class j who are infected (when endemism is reached) , is the force of infection (per year), is the mixing and transmission coefficient (per week) . If N is the total population and is the fraction of total infections constituted by infection in age class j, and if is the number of susceptible individuals in age class j (where is a rate per 1000 total),then can be expressed as:
. The values for were obtained from [5] and are given in Table 2; the values for were obtained from [1] and are also given in Table 2.
Table 2: By age group: number of susceptibles (at endemism) per 1000 (), aggregate proportion of infected individuals (), fraction of the group that is infected (), force of infection (, per year), mixing and transmission coefficient (, per week). The population has been divided into five age groups (j=1…5): .
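Age Class (j) / Susceptibles per 1000 (at endemism) / Aggregate proportion of infected individuals / Fraction of the group that is infected / Force of infection (per year) / Mixing and transmission coefficient (per week)
1 / 0.00070352 / .66 / 0.093813964 / .22 / 0.030253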
2 / 0.0007044 / .3 / 0.042589438 / .48 / 0.129717
3 / 0.00066649 / .025 / 0.003750977 / .18 / 0.024653
4 / 0.00049767 / .005 / 0.001004688 / .04 / 0.005444
5 / 0.00742792 / .01 / 0.000134627 / .045 / 0.006125
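Two of the disease processes listed above, loss of maternally-acquired immunity and sparking, lend themselves to a short sketch. The function names are hypothetical. Rounding the Gaussian draw to a whole, non-negative week and converting a yearly sparking rate into at most one spark per week with probability rate/52 are both assumptions; the text states only the distribution parameters and that the rate is an input.

```python
import random

def maternal_immunity_loss_week(rng, mean_weeks=12.0, sigma_weeks=2.0):
    """Weeks after birth at which maternal immunity is lost (Gaussian draw)."""
    return max(0, round(rng.gauss(mean_weeks, sigma_weeks)))

def spark(state, rng, sparks_per_year, weeks_per_year=52):
    """Possibly turn one randomly chosen susceptible infectious this week.

    state: {person: "S" | "I" | "R"}; modified in place.
    Returns the sparked person, or None if no spark occurred.
    """
    susceptibles = [p for p, s in state.items() if s == "S"]
    if susceptibles and rng.random() < sparks_per_year / weeks_per_year:
        chosen = rng.choice(susceptibles)
        state[chosen] = "I"
        return chosen
    return None
```

With this weekly conversion, a rate of about 30 sparks per year corresponds to a spark in a little over half of all weeks.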
Topological Features of the Social Network
- Total Cumulative Degree Distribution: The total cumulative degree distribution covers a range from k=0 (isolated individuals) to (school children with large families), irrespective of graph size, and can be partitioned into two regions (Figure 3). The distribution is dominated by nodes for which k<=50, and this region of the distribution is sharply exponential () to k=22 and virtually flat, thereafter. This indicates that the majority of nodes in the network do not have degree much greater than ~10, since the probability of having k>10 is only about 20%. The largely uniform region between k=22 and k=50 is due predominantly to school connectivity and secondarily to work connectivity and will be discussed in greater detail in later sections. The region of the cumulative distribution to the right of k=50 is separated from the region to the left by a segment that becomes increasingly switch-like as population size grows. This segment is a product of school connectivity and will be discussed in more detail below.
Figure 3: Cumulative total degree distribution. The probability, P(K>k), is plotted for each degree in an averaged set of typical simulated networks of size 100,000 and 10,000. The distributions are virtually identical, except for the slope of the transition region between k~40 and k~55.
- Cumulative Family Degree Distribution: Family degrees are exponentially distributed () between 1 and 10 (Figure 4). The probability of having a family degree greater than 3 is extremely small (<.06), which implies—since family groups are completely connected, and since the majority of family groups have two parents—that the most common compositions for family groups are couples with one child, single mothers with two children, couples without children, single mothers with one child, couples with two children, and single mothers with three children. Family connections add to the total degree distribution both in the low-degree range for adults and children under the age of six and in the middle/high-degree ranges for school-aged children, whose already high degree (from school connections) is augmented by their family ties.
Figure 4: Cumulative family degree distribution. The probability, P(K>k), is plotted for each degree in an averaged set of typical simulated networks. P(K>0) is plotted as a reference.
- Cumulative School Degree Distribution: The cumulative school degree distribution is switch-like in appearance for large populations (100,000), but decreases more gradually for smaller populations (Figure 5), and can be superimposed on the transition region between 40<k<55 in the total cumulative degree distribution. The difference in behaviour between the school degree distribution of large populations and that of small populations is a result of a combination of factors. Not only will large populations have far more schools (classes) than small populations, but these classes are also far more likely to be filled to capacity (50 students). Therefore, on this basis alone, the average school-aged child will have more school links in a larger population than will a comparably-aged child in a smaller population. In addition, because both the average school (class) size and the number of schools (classes) are greater in the large population, the average inter-school degree is much higher in the large population than it is in the small population (, respectively). The net effect of population size on school degree is to cause the school degree distribution (non-cumulative) to become more and more sharply peaked around k=50.
Figure 5: Cumulative school degree distribution. The probability, P(K>k), is plotted for each degree in an averaged set of typical simulated networks. The switch-like behaviour of the cumulative distribution for the large graph is a reflection of a delta function-like non-cumulative distribution around k=50.
- Cumulative Work Degree Distribution: The cumulative work degree distribution (Figure 6) is exponential (), and ranges between k=1 and k=51, regardless of network size. The probability of k>12 is small (<.2) in the work subnetwork. The average degree is <k>~9.4, and the average clustering coefficient is <C>~.39, which implies that for the typical node, of a possible ~39 connections among its work neighbours, only 15 connections exist. The work network is therefore much less densely connected than the school network and family network, where the average clustering coefficients are nearly one. The work network contributes most heavily to the portion of the total degree distribution to the left of k=50.
Figure 6: Cumulative work degree distribution. The probability, P(K>k), is plotted for each degree in an averaged set of typical simulated networks.
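The quantities plotted in Figures 3 through 6, and the neighbour-link arithmetic quoted for the work subnetwork, can be reproduced with a small sketch. The function names are hypothetical; the only inputs taken from the text are the work-network averages <k>~9.4 and <C>~.39.

```python
from collections import Counter

def cumulative_degree_distribution(degrees):
    """Return {k: P(K > k)} for k = 0 .. max(degrees)."""
    n = len(degrees)
    counts = Counter(degrees)
    remaining = n
    result = {}
    for k in range(max(degrees) + 1):
        remaining -= counts.get(k, 0)  # nodes with degree strictly above k
        result[k] = remaining / n
    return result

def neighbour_links(mean_degree, clustering):
    """Possible and expected existing links among a typical node's neighbours."""
    possible = mean_degree * (mean_degree - 1) / 2
    return possible, clustering * possible

# Work subnetwork: roughly 39 possible links among neighbours, about 15 present.
possible, existing = neighbour_links(9.4, 0.39)
```

The clustering coefficient is the ratio of existing to possible links among a node's neighbours, which is why multiplying it by k(k-1)/2 recovers the figure of about 15 connections.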
Disease Dynamics
- Susceptibility Over Time: Over time, the fraction of the social network that is susceptible to infection will either grow or fluctuate around its mean starting value (~5%), depending on the size of the network, on the rate of sparking, and on the transmission (and recovery) rates being used. In general, if the transmission rates are fixed at the values reported earlier, increasing the rates of sparking will counteract the effects of immigration of susceptibles and “disease bypassing”, whereby an individual progresses through school (the primary years of infectivity) without contracting the infection. Since immigration and disease bypassing both tend to deposit susceptibles into the adult age group where transmission rates are low, these processes can result in an accumulation of older susceptibles in the population, and thus in a gradual rise in the susceptible fraction of the population.
- Infection Profiles as a Function of Social Graph Size: While, for all graph sizes, the number of infections on a time scale less than or equal to the infectious period of the disease varies greatly from one simulation to the next (Figure 7a), the aggregate number of infections, when binned in one- or two-month intervals, tends to show increasingly deterministic behaviour both as the size of the population grows and as the rate of sparking increases. For example, at the same sparking rate (~30 sparks/year), it is likely that the infection profiles associated with a population of 100,000 individuals will show much less variability, both in the temporal patterns of epidemics and in the average size of epidemics, than will the profiles associated with a population 10 times smaller (Figure 7b,c).
Figure 7: Infection profiles (4 each) for a population of (a) 100,000 individuals on a weekly basis, and (b,c) 100,000 and 10,000 individuals, respectively, on a two-month basis. Random sparking has occurred approximately 30 times per year in each simulation. The interepidemic time ranges from ~1 year to ~3 years for both graph sizes.
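The aggregation used to move from the weekly profiles to the two-month profiles can be sketched as a simple binning step. The function name is hypothetical, and treating two months as eight weeks is an assumption; with 52 weekly updates per year, the final bin of a year is shorter than the others.

```python
def bin_infections(weekly_counts, weeks_per_bin=8):
    """Sum weekly infection counts into consecutive bins of weeks_per_bin weeks."""
    return [sum(weekly_counts[i:i + weeks_per_bin])
            for i in range(0, len(weekly_counts), weeks_per_bin)]
```

Setting weeks_per_bin=4 would give the approximately one-month intervals also mentioned in the text.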