Details of the Bayesian Statistical Modeling

Electronic Supplementary Material 2

The hierarchical Bayesian models used to analyze plant quadrat characteristics were designed as generalized linear mixed models. Because most quadrat characteristics (Tables 4 and 5) are defined as the ratio of two measurements, we introduce a set of equations to express them, as described below. Suppose, for example, that the target index is Φmass, the photon flux density (PFD) absorbed per unit aboveground mass. Setting the two variables $y_i^{(O)}$ to $\Phi_{i,\mathrm{day}}$ (absorbed photons per unit ground area per day) and $x_i^{(O)}$ to $M_i$, the aboveground vegetative mass (Table S1), the relationship between these quantities is defined as

$$ y_i^{(O)} \sim \mathcal{N}\left(\theta_i \, x_i^{(O)}, \; \tau\right), \qquad (9) $$

where $\mathcal{N}(\mu, \tau)$ denotes the Gaussian distribution with mean $\mu$ and inverse variance (precision) $\tau$; $\theta_i$ is the target quantity, in this example Φmass; and the inverse variance $\tau$ reflects the magnitude of measurement error (a larger $\tau$ means a smaller error). The target quantity was defined as

$$ \log \theta_i = \mathbf{z}_i \cdot \boldsymbol{\beta} + r_q, \qquad (10) $$

where $\mathbf{z}_i \cdot \boldsymbol{\beta}$ is the inner product of the vector of explanatory variables $\mathbf{z}_i$ and the coefficient vector $\boldsymbol{\beta}$, and $r_q$ represents the random effect of the focal quadrat $q$. This functional relationship is shown as "log" in the "link" column of Table S1 because it is referred to as the log link function in generalized linear modeling. The quadrat random effect was assumed to follow a Gaussian distribution with mean zero and inverse variance $\tau_r$, i.e., $r_q \sim \mathcal{N}(0, \tau_r)$. The prior distribution of $\boldsymbol{\beta}$ was chosen as …, while those of $\tau$ and $\tau_r$ were both gamma distributions whose scale and shape parameters were set to $10^{-2}$.
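For illustration, the Python sketch below evaluates the log density of one observation under Eqs. (9) and (10), using the same mean/inverse-variance parameterization of the Gaussian. The function and argument names are hypothetical, and the sketch is not the software used for the original analysis. It makes the role of the offset explicit: because $\log(\theta_i x_i^{(O)}) = \log x_i^{(O)} + \mathbf{z}_i \cdot \boldsymbol{\beta} + r_q$, the denominator $x_i^{(O)}$ enters the linear predictor only as the fixed term $\log x_i^{(O)}$.

```python
import math


def gaussian_offset_loglik(y, x, z, beta, r_q, tau):
    """Log density of one observation under Eqs. (9)-(10) (illustrative sketch).

    y, x  : observed y_i^(O) and x_i^(O); use x = 1.0 when Table S1 lists "1"
    z     : explanatory values z_i (e.g. 0/1 dummy columns)
    beta  : coefficient vector, same length as z
    r_q   : random effect of the focal quadrat
    tau   : inverse variance (precision) of the Gaussian error
    """
    # Eq. (10): log link, log(theta_i) = z_i . beta + r_q
    eta = sum(zk * bk for zk, bk in zip(z, beta)) + r_q
    # Eq. (9): the mean of y is theta_i * x_i, i.e. log(mean) = log(x_i) + eta
    mu = x * math.exp(eta)
    # Gaussian log density parameterized by mean mu and precision tau
    return 0.5 * math.log(tau / (2.0 * math.pi)) - 0.5 * tau * (y - mu) ** 2
```

Setting `x = 1.0` reproduces the rows of Table S1 without a denominator, where the model describes $y_i^{(O)}$ directly.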

Although most cases follow the description above, there were a few exceptions. Where $x_i^{(O)} = 1$ (see Table S1), the target quantity is $y_i^{(O)}$ itself. In the modeling of "occurrence" (Table S1), the error structure differs from the model above because species occurrence in a quadrat, i.e., $y_i^{(O)} \in \{0, 1\}$, does not follow a Gaussian distribution. We therefore assume that $y_i^{(O)}$ follows a Bernoulli distribution,

$$ y_i^{(O)} \sim \mathrm{Bernoulli}(p_i), \qquad (11) $$

where $p_i$ is the probability that species $i$ is present in the focal quadrat. This probability was defined as

$$ \mathrm{logit}(p_i) = \mathbf{z}_i \cdot \boldsymbol{\beta} + r_q, \qquad (12) $$

where logit denotes the logit link function, which is equivalent to expressing the relationship with a logistic function, $p_i = 1 / \{1 + \exp[-(\mathbf{z}_i \cdot \boldsymbol{\beta} + r_q)]\}$.
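A matching sketch for the occurrence model, Eqs. (11) and (12), is given below; the function names are hypothetical. The linear predictor has the same form as in Eq. (10), but it is mapped to a probability by the logistic function rather than exponentiated, and there is no offset because $x_i^{(O)} = 1$ for occurrence in Table S1.

```python
import math


def occurrence_probability(z, beta, r_q):
    """p_i from Eq. (12): inverse logit of the linear predictor z_i . beta + r_q."""
    eta = sum(zk * bk for zk, bk in zip(z, beta)) + r_q
    return 1.0 / (1.0 + math.exp(-eta))


def bernoulli_logit_loglik(y, z, beta, r_q):
    """Log probability of occurrence y in {0, 1} under Eq. (11)."""
    p = occurrence_probability(z, beta, r_q)
    # Straightforward form: fine for moderate |eta|, not guarded against overflow
    return math.log(p) if y == 1 else math.log(1.0 - p)
```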

The set of explanatory variables, $\mathbf{z}_i$, consists of four categorical (factor) variables and the interaction terms listed in Tables 4 and 5. The main-effect variables in $\mathbf{z}_i$ are altitude (three levels: low, mid, and high, as defined in the main text), leaf type (two levels: evergreen and deciduous species), plant type (two levels: herbs and shrubs), and leaf angle (three levels: X, 0-30°; Y, 30-60°; and Z, 60-90°).
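The main effects could be encoded in $\mathbf{z}_i$ as 0/1 dummy columns, as in the sketch below. This is an assumption: the text does not state the coding scheme (treatment versus sum-to-zero contrasts), the reference levels chosen here are arbitrary, and the interaction terms listed in Tables 4 and 5 are not reproduced. The factor and level names are hypothetical labels.

```python
# Hypothetical treatment (reference-level) coding of the four main effects.
# The first level of each factor is the reference and gets no column.
LEVELS = {
    "altitude": ["low", "mid", "high"],
    "leaf_type": ["evergreen", "deciduous"],
    "plant_type": ["herb", "shrub"],
    "leaf_angle": ["X", "Y", "Z"],
}


def main_effect_row(record):
    """Return z_i (intercept + dummy columns) for one species-quadrat record."""
    row = [1.0]  # intercept
    for factor, levels in LEVELS.items():
        value = record[factor]
        if value not in levels:
            raise ValueError(f"unknown level {value!r} for factor {factor}")
        # one 0/1 column per non-reference level
        row.extend(1.0 if value == lvl else 0.0 for lvl in levels[1:])
    return row
```

Under this coding the intercept corresponds to low-altitude evergreen herbs with leaf angle X, and each main-effect coefficient is a difference from that reference on the log (or logit) scale.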

The data used in the statistical modeling were divided into two classes for different purposes, namely the "Non-zero" and "All" data classes, as shown in the "species" column of Table S1. In the "Non-zero" class, the analysis was based on the subset of the whole data set that included only species whose aboveground biomass was larger than zero, i.e., $M_i > 0$. In contrast, the two analyses in the "All" class, occurrence and aboveground vegetative mass (Table S1), were based on data including all 38 species in all quadrats. In this case, one of the four explanatory variables, species-specific leaf angle, was not available for species absent from a quadrat, because leaf angle is also quadrat-specific. To generate the posterior distribution of leaf angle for each absent species in a quadrat, we prepared an empirical distribution whose probabilities for the leaf-angle classes (X, Y, and Z) were set to the fractions of each class observed across all quadrats.
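The empirical leaf-angle distribution can be sketched as follows, assuming that the class fractions are pooled over all observed species-quadrat records. The function names are hypothetical. In an MCMC implementation the angle of each absent species would presumably be a discrete latent variable updated at every iteration; the sketch shows only the class probabilities and a single draw.

```python
import random
from collections import Counter

ANGLE_CLASSES = ("X", "Y", "Z")  # 0-30, 30-60 and 60-90 degrees


def empirical_angle_probs(observed):
    """Fraction of each leaf-angle class among observed records in all quadrats."""
    counts = Counter(observed)
    total = sum(counts[c] for c in ANGLE_CLASSES)
    return {c: counts[c] / total for c in ANGLE_CLASSES}


def draw_angle(probs, rng):
    """Draw one leaf-angle class for an absent species from the empirical distribution."""
    weights = [probs[c] for c in ANGLE_CLASSES]
    return rng.choices(ANGLE_CLASSES, weights=weights, k=1)[0]
```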

The marginal posterior distributions of all parameters were estimated using Markov chain Monte Carlo (MCMC) methods, which are much more efficient than maximum likelihood estimation when the statistical model is complex (Clark and Gelfand 2006). Posterior samples were obtained from three independent chains; in each chain, 400 values were sampled at 20-step intervals after 2000 burn-in steps. Convergence of the MCMC calculations was confirmed by evaluating Gelman and Rubin's $\hat{R}$ statistic (Gelman et al. 2004) for all parameters.
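The sampling settings correspond to 2000 + 400 × 20 = 10,000 iterations per chain and 3 × 400 = 1200 retained posterior samples in total. The sketch below reproduces that arithmetic and computes the basic (non-split) Gelman and Rubin $\hat{R}$ for one scalar parameter from the between-chain and within-chain variances described in Gelman et al. (2004). The function name is hypothetical; the original analysis may have relied on a software package's implementation, and the convergence threshold used is not stated here.

```python
import math

# Settings reported in the text
N_CHAINS, N_KEEP, THIN, BURN_IN = 3, 400, 20, 2000

iterations_per_chain = BURN_IN + N_KEEP * THIN  # 2000 + 400 * 20 = 10000
total_retained = N_CHAINS * N_KEEP              # 3 * 400 = 1200


def gelman_rubin_rhat(chains):
    """Potential scale reduction factor for one scalar parameter.

    chains: list of m >= 2 equal-length lists of retained samples.
    """
    m = len(chains)
    n = len(chains[0])
    means = [sum(c) / n for c in chains]
    grand_mean = sum(means) / m
    # Between-chain variance B
    b = n / (m - 1) * sum((mu - grand_mean) ** 2 for mu in means)
    # Mean within-chain variance W
    w = sum(sum((v - mu) ** 2 for v in c) / (n - 1) for c, mu in zip(chains, means)) / m
    # Pooled estimate of the marginal posterior variance
    var_plus = (n - 1) / n * w + b / n
    return math.sqrt(var_plus / w)
```

Values of $\hat{R}$ close to 1 indicate that the chains have mixed; chains stuck in different regions give values well above 1.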

References

Clark JS, Gelfand AE (2006) Hierarchical modelling for the environmental sciences. Oxford University Press, Oxford

Gelman A, Carlin JB, Stern HS, Rubin DB (2004) Bayesian Data Analysis (2nd edn). Chapman and Hall/CRC, London

Table S1. Summary of the hierarchical Bayesian models.

Target / yi(O) / xi(O) / link / species / results
occurrence of species / yi(O) ∈ {0, 1} / 1 / logit / All / Table 4
aboveground vegetative mass / Mi / 1 / log / All / Table 4
Φmass / Φi,day / Mi / log / Non-zero / Table 5
Φarea / Φi,day / fi / log / Non-zero / Table 5
LAR / fi / Mi / log / Non-zero / Table 5
LMR / ML,i / Mi / log / Non-zero / Table 5
SLA / fi / ML,i / log / Non-zero / Table 5
mean leaf height / Hi / 1 / log / Non-zero / Table 5

Target: characteristics and indices to be modeled. yi(O): left-hand side variable. xi(O): right-hand side variable; log xi(O) is the offset term in the model. link: type of link function. species: whether the data set includes "All" species or only "Non-zero" species, i.e., those whose biomass is larger than zero. results: tables showing the posterior distributions. Symbol definitions: i, species index; fi, leaf area of species i; Hi, mean leaf height; Mi, aboveground vegetative mass; ML,i, leaf mass; Φi,day, total photon flux per day.
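To make the rows of Table S1 concrete, the sketch below computes each non-occurrence target directly as the ratio yi(O) / xi(O) for a single hypothetical record; the key names and numbers are made up for illustration. In the models themselves the ratio is not computed from the data; xi(O) enters only through the offset term log xi(O).

```python
# Numerator and denominator of each target in Table S1 (None means x = 1).
# Occurrence is omitted because it is modeled with the Bernoulli likelihood.
TABLE_S1 = {
    "aboveground_mass": ("M", None),
    "Phi_mass": ("Phi_day", "M"),
    "Phi_area": ("Phi_day", "f"),
    "LAR": ("f", "M"),
    "LMR": ("M_L", "M"),
    "SLA": ("f", "M_L"),
    "mean_leaf_height": ("H", None),
}


def target_indices(record):
    """Return y/x for every target; record maps symbol names to measured values."""
    out = {}
    for name, (num, den) in TABLE_S1.items():
        x = 1.0 if den is None else record[den]
        out[name] = record[num] / x
    return out
```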