Running head: Nonlinearity in demographic models

Nonlinear relationships between vital rates and state variables in demographic models

Johan P. Dahlgren1,3, María B. García2 & Johan Ehrlén1

1 Department of Botany, Stockholm University, SE 10691 Stockholm, Sweden.

2 Instituto Pirenaico de Ecología (CSIC), Apdo. 13034, Zaragoza, Spain.

3 E-mail:

Manuscript type: Note

Abstract. To accurately estimate population dynamics and viability, structured population models account for among-individual differences in demographic parameters that are related to individual state. In the widely used matrix models, such differences are incorporated in terms of discrete state categories whereas integral projection models (IPMs) use continuous state variables to avoid artificial classes. In IPMs, and sometimes also in matrix models, parameterization is based on regressions that do not always model nonlinear relationships between demographic parameters and state variables. We stress the importance of testing for nonlinearity and propose using restricted cubic splines in order to allow for a wide variety of relationships in regressions and demographic models. For the plant Borderea pyrenaica, we found that vital rate relationships with size and age were nonlinear and that the parameterization method had large effects on predicted growth rates (linear IPM: 0.95, nonlinear IPMs: 1.00, matrix model: 0.96). Our results suggest that restricted cubic spline models are more reliable than linear or polynomial models. Because even weak nonlinearity in relationships between vital rates and state variables can have large effects on model predictions, we suggest restricted cubic regression splines to be considered for parameterizing models of population dynamics whenever linearity cannot be assumed.

Key words: demography, integral projection models, linearity assumption, matrix models, natural splines, nonlinear regression, plant population dynamics, restricted cubic splines

Introduction

Individuals of different age or size often differ in survival probability and fecundity, and thus in their contribution to population growth. Due to such differences it is important to include a state or stage structure in models of population dynamics. In matrix models, populations are divided into discrete states and the transition rates between states over a time step are modeled using a transition matrix (Leslie 1945, Lefkovitch 1965, Caswell 2001). The Integral Projection Model (IPM; Easterling et al. 2000) is an extension of the matrix model, allowing state variables to be continuous. In both matrix models and IPMs, effects of state variables on vital rates can be incorporated as regression models (Morris and Doak 2002, Ellner and Rees 2006). Parameterization using continuous functions is more statistically efficient than the traditional method of parameterizing classes separately in matrix models, and allows more reliable estimates for sparsely sampled groups, such as old individuals. However, one potential problem with using regression analyses for model parameterization is that relationships between vital rates and state variables may not be adequately described by simple or polynomial generalized linear models.

Nonlinear relationships are sometimes accounted for in demographic models parameterized using regression analyses. For example, exponential relationships between state variables and vital rates are accounted for by log-transformations of state variables, or by the use of Poisson regression. In addition, probabilities of survival and flowering are typically modeled with logistic regressions, implicitly accounting for asymptotic relationships. Also incorporation of polynomial terms in regression models is a common practice to account for nonlinear relationships. However, using polynomial regressions to describe vital rates in demographic models may be risky since the shape of one tail of the resulting curve may be unduly influenced by observations at the other tail, in particular if the observations are not evenly spaced over the range of the predictor variable (Harrell 2001). Furthermore, neither linearizations nor the inclusion of polynomial terms may adequately describe nonlinear state variable – vital rate relationships if functional relationships differ between different regions of state variable values. Changes in functional relationships between state variables and vital rates may not be uncommon in organisms with marked transitions between life cycle stages. For example, the relationship between age and survival in birds has been shown to sometimes be “saw-toothed” (Low and Pärt 2009).

Incorporating nonlinear regression models into demographic models is straightforward in many situations, and this possibility was highlighted in the paper introducing the IPM (Easterling et al. 2000). However, more complex nonlinear vital rate – state variable relationships have rarely been included in demographic models. Here we illustrate a practical and reliable method of flexibly incorporating nonlinear relationships into IPMs using restricted cubic regression splines, and present results suggesting that model output may be very sensitive to assumptions of linearity and that allowing nonlinear relationships in demographic models should be considered whenever there is no a priori reason to assume either linear or other simple function forms.

We used four different methods of parameterizing structured population models of the long-lived plant Borderea pyrenaica, where vital rates may depend on both age and size. First, we used continuous classes parameterized using generalized linear models (GLMs) where linearity is assumed for continuous vital rate rates, logistic linear for proportions and log linear for counts (“linear IPMs”). Second, we used continuous classes parameterized using GLMs where the linearity assumption is relaxed by including also quadratic and cubic age and size effects on vital rates (“polynomial IPMs”). Third, we used continuous classes parameterized using GLMs where more complex effects of age and size on vital rates were allowed by using restricted cubic regression splines (“restricted cubic spline IPMs”). Fourth, we used a few discrete classes parameterized based on observed class transitions (“matrix models”), representing the most commonly used type of demographic model. We used models based on these four parameterization methods and asked (1) if relationships of vital rates with size and age were significantly nonlinear, (2) if incorporating statistically significant nonlinear relationships into demographic models influenced projections, (3) if modeling vital rates with restricted cubic splines yielded more plausible results than modeling vital rates using polynomial terms, (4) if predictions of the matrix model, where states are discrete but relationships with vital rates may be nonlinear, more closely resemble the nonlinear models with continuous state variables than the linear model, and finally (5) we simulated data based on linear and nonlinear relationships between state variables and vital rates to investigate consequences of modeling linear relationships as non-linear and vice versa.

Methods

The dataset

Borderea pyrenaica Miégeville (Dioscoreaceae) is a small dioecious herb occurring in the central Pyrenees. The plant has morphological features that enable easy and precise ageing by inspection of the underground storage organ (García & Antor 1995), making it possible to include age as a state variable in the population models, in addition to size. Individuals sometimes reach ages of above 300 years. The study system is described in detail in García and Antor (1995). To parameterize the structured population models, we used recordings on survival, size and fecundity over four years (1995–1998) on a total of 519 female individuals in permanent plots in one population (‘Pineta’). One more census in 1999, unearthing individuals, was used to determine if missing plants were dead or dormant (individuals that were missing in two consecutive censuses were assumed to be dead), and to record the age of plants by inspecting buried tubers. When calculating survival probabilities, male plants were also included since no differences between sexes have been found in previous analyses and very few observations of death were available despite the sample size (M. B. García, J. P. Dahlgren and J. Ehrlén, unpublished manuscript). Size was calculated as ln(number of leaves × shoot length2) since this has been found to more closely correlate with total plant biomass than other measures. Age estimates for individuals of unknown age but known size were imputed based on restricted cubic spline regressions of age on size. In order to avoid having zero mortality for the oldest individuals in the population models, an additional death of an individual of the median age of individuals in the oldest age class in the matrix model (145 years, see below) was added to the data. This addition did not change whether relationships between vital rates and state variables were significantly nonlinear or not.

Relaxing the linearity assumption in generalized linear models of vital rates

There exist several methods for relaxing the linearity assumption in GLMs. Apart from simply adding polynomial terms, such methods include GAM (Generalized Additive Models; Hastie and Tibshirani 1990), fractional polynomials (Royston and Altman 1994), and B-spline regression (e.g. Easterling et al. 2000). In this note we used restricted cubic spline regression (Stone and Koo 1985, Harrell 2001) to test nonlinearity and model vital rates due to the ease of incorporating complex nonlinearities in relationships between vital rates and one or several variables. Regression spline models are constituted by piecewise polynomial regressions that are fitted between join points (knots). Restricted cubic splines are cubic polynomial splines that are forced to have similar curvature at the join points, and with the tails of the curve restricted to be linear so as to make them more reliable for predictions than in other regression spline techniques. These restrictions mean that the number of parameters needed to describe the fitted curve equals the number of knots minus one. The exact location of knots has been found to be of little importance and 4 or 5-knot restricted cubic splines with evenly spaced knots satisfactorily model many kinds of nonlinear relationships, assuming a sufficient number of observations to avoid overfitting (Harrell 2001). More specifically, we used ‘5-knot restricted cubic splines’ fitted using the ‘Design’ package for R 2.9.1 (Harrell 2001, R development core team 2009). The ‘rcs’ function in the Design package was used to automatically divide predictor variables with evenly spaced knots, and models were fitted using the generic ‘glm’ function. The function named ‘Function’ in the Design package was used obtain an expanded function describing the fitted regression model in standard S (R) language – for incorporation into the IPM code.

Demographic models

One linear, one polynomial and one restricted cubic spline IPM, as well as a matrix model, were specified and population growth rates obtained from the four models were compared. For parameterization of the IPMs, generalized linear models were used to fit relationships between the state variables (age and size) and the vital rates survival, mean and variance in growth, flowering, and seed number. The data were pooled over years. Random individual effects were not significant and omitted in final models (M. B. García, J. P. Dahlgren and J. Ehrlén, unpublished manuscript). The integral projection models were specified as:

n(y, a + 1, t + 1) = ∫ K(y, x, a) n(x, a, t) dx,

where a is age, x is size year t and y is size year t + 1 and integration is performed over all possible sizes. Age was assumed to be continuous in the regressions determining relationships with vital rates, but we did not integrate over age in the population model since age of an individual is increased by one for each time step (modeled year). n is a density function describing the distribution of individuals among ages and sizes, and K is the projection kernel given by functions of survival (s), growth (g), flowering probability (fp), seed number (fn) and the size distribution of establishing seedlings (pd):

K(y, x, a) = s(x, a) g(y, x, a) + fp(x, a) fn(x, a) pd(y).

Growth was modeled as a normal distribution with the mean from a linear regression of y on x and a and the variance as the variance around this regression line. Survival was modeled as a logistic regression of presence or absence on x and a, flowering as a logistic regression on flowering or not flowering on x and a, and seed number as a Poisson regression on x and a. In the implementation of the IPM, the integration over size was performed using the midpoint rule and a 100 × 100 sized matrix (cf. Ellner and Rees 2006) for each of the 300 possible ages. Changes in survival probability were assumed to cease when individuals reached age 200, as further projections depended on few data. At age 300, individuals were entered into an absorbing “300+” class (few individuals reached this class, which is in accordance with the field data).

For each vital rate in the IPMs, a linear, a polynomial and a restricted cubic spline regression model on age and size were fitted separately. Non-significant effects (P > 0.05), based on likelihood ratio tests, were removed. Significance tests of nonlinear spline components were based on the null-hypothesis that all nonlinear spline components equaled zero (Harrell 2001).

In order to compare estimated population growth rates between the IPMs and a matrix model, we also specified a model with three size classes and five age classes. This model was parameterized based on observed transitions between classes (cf. Morris and Doak 2002). Size classes were distributed approximately evenly over the observed size ranges and were separated at sizes 6.5 and 8.5. Age classes were wider for more advanced ages and separated at ages 20, 40, 80 and 150.

Population growth rates (λ) were estimated by iterating the models until stable distributions were reached. Elasticity values showing the proportional consequences of small changes in either reproduction or in state transition probabilities for population growth rate were calculated based on the stable distributions and reproductive value distributions as in Ellner and Rees (2006). Confidence intervals of λ were calculated for each model from 1000 bootstrap replicates using the package ‘boot’ in R. Observations (individuals) in the same number as recorded were resampled with replacement from the data matrix and the models were re-parameterized and λ calculated for each replicate.

Simulations

To investigate the consequences of choosing the “wrong” parameterization method and to decide which regression modeling approach may be the most reliable, we performed simulations where true vital rate curves were known. We modeled three different scenarios corresponding to the true relationships being the linear, polynomial and spline models presented in Table 1, respectively. For each scenario, 1500 individuals of random age and size were drawn based on the observed distributions, taking the covariance into account by using Cholesky decomposition in the “mvtnorm” R package. In a second step, vital rates were assigned to individuals using the parameters from the “true” regression models. In accordance with the regression analyses, random error distributions were assumed to be binomial for survival and flowering, normal for growth, and Poisson for seed number. To the simulated data, we fitted linear, polynomial and restricted cubic spline models, thus resulting in nine combinations of true and modeled relationships. Quadratic and cubic age and size terms were included in all polynomial models, and 5 knots were used in the restricted cubic spline models. As we used a simulated dataset with many observations we included all terms regardless of their statistical significance, because full models have been found to produce more reliable predictions than reduced models in other studies (Harrell 2001). The vital rate functions fitted to the simulated data were used as components of IPMs and population growth rate was calculated as above. All steps were repeated 200 times for each of the three scenarios. In addition, we used 500 observations per vital rate to examine if relative differences between methods remained the same.