Evaluation of soil greenhouse gas emission modelsusing Bayesian inference and MCMC: II. Parameteridentifiability and equifinality
Gangsheng Wang andShulin Chen
Department of Biological Systems Engineering, Washington State University, Pullman, WA 99164-6120, USA
Corresponding Author: Shulin Chen, PhD, PE, Professor,
Bioprocessing and Bioproduct Engineering Laboratory
Department of Biological Systems Engineering
Washington State University
Pullman, WA 99164
509-335-3743 (phone)
509-335-2722 (fax)
ABSTRACT
Identifiability and equifinality are two interrelated concepts in mathematical modeling. The derivation of the Hessian matrix becomes crucial when the condition number is used as a diagnostic indicator for identifiability. A new method called covariance-inverse (CI) was advanced to derive the Hessian matrix via the inverse matrix of covariance.The covariance matrix is calculated directly from the posterior distributed parameter samples, which is generated by the Bayesian inference and the Markov Chain Monte Carlo (MCMC) technique. Compared with two existing methods, i.e., difference quotients (DQ) and quasi-analytical(QA), CI is fairly intuitive, effective, and reliable. CI was then used for identifiability diagnosis on the SoilGHG model. The model as a whole was poorly identified, but a reduced model with fewer parameters could become identifiable, which is called “conditional identifiable” in this paper. A poor conditioning of the Hessian matrix indicates that some parameters are strongly interdependent. The correlation matrix shows that 60% of absolute values of correlation coefficients (ACCs) were relatively small (0–0.2) and only 3% of the parameter pairs had a high ACC between 0.5–0.7. Three of the 19 model parameters had a relative high ratio (> 20%) of standard deviation tomean, which established the equifinality in the model. An interesting and instructive conclusion is that the model with no more than four undetermined parameters has a high probability to be identifiable, but the model having nine or more unknown parameters exhibits a high risk to be non-identifiable. While advancing a new method for the diagnosis of parameter identifiability, we addressed an in-depth understanding of both identifiability and equifinality in the mathematical modeling of soil GHG emissions.
Key words: Bayesian inference; equifinality; Markov Chain Monte Carlo (MCMC);parameter identifiability
1. Introduction
Model identifiability is the problem of determining whether the parameters of a given mathematical model can be uniquely (globally or locally) recovered from data (Serban and Freeman, 2001). A parameter having a value of x0 is globally identifiable if and only if no other x1 within the feasible parameter space gives the same value of the likelihood function. If the value of the likelihood at x0 is unique in a sub-space containing x0, this parameter is said to be locally identifiable (Iskrev, 2007; Sorooshian and Gupta, 1985). Theoretically, the number of identifiable parameters is determined by the rank of the Hessian matrix (Sun, 2004; Viallefont et al., 1998). If the Hessian matrix is positive definite, i.e., each element of the matrix is positive, then the model structure is locally identifiable. However, positive definite is often not true and other times impossible to determine due to computing errors (Sorooshian and Gupta, 1985). Generally, global identification analysis for non-linear models is not feasible, and the Hessian matrix approach is only applicable for local identification analysis (Iskrev, 2007). Many studies introduce the condition number to diagnose whether the Hessian matrix is ill conditioned or not (Dee and da Silva, 1999; Dee et al., 1999; Iskrev, 2007; Lebbe and Van Meir, 2000; Serban and Freeman, 2001; Sun, 2004; Viallefont et al., 1998).A problem with a low condition number is said to be well-conditioned, while a problem with a high condition number is said to be ill-conditioned (Iskrev, 2007; Viallefont et al., 1998). Xia (1989) stated that non-identifiability of a model could be caused by parameter interactions, the lack of data, or the limitations of parameter optimization methods.
Equifinality is a concept used by Beven and Freer (2001) to reject the concept of the existence of a unique optimal parameter set. In other words, there may coexist multiple choices of parameter sets or model structures which can partially produce acceptable simulations. A previous paper (Wang and Chen, 2010) has provided evidence that equifinality did exist in the SoilGHG model, although the equifinality could be reduced via uncertainty analysis. As has been stated, a model having a unique optimal parameter set in a sub-space is locally identifiable. Thus, equifinality and identifiability exert apparently opposite characteristics in model parameterization and assessment yet, they are two interdependent concepts. On one hand, the idea of attaining identifiable parameters is part of the normal working paradigm for modelers (Beven and Freer, 2001). On the other hand, a completely identified model is difficult to realize because of incorrect or defective descriptions of the real processes and characteristics in environmental science and engineering, which means confronting the phenomenon of equifinality is inevitable. The effort to strive for identifiable parameters is the recognition of equifinality at the same time. Equifinality cannot be absolutely eliminated, but can be attenuated in some sense.
A model might be conditional identifiable in the case of reduced equifinality. Herein, “conditional identifiable” primarily refers to a weakly- or non-identifiable model structurewhich might become identifiable under certain conditions if the number of identified parameters, as well as the parameter space, could be reduced as much as possible by means of uncertainty analysis.
The derivation of the Hessian matrix becomes crucial when the condition number is used as a diagnostic indicatorof identifiability. Three methods are usually used to estimate the Hessian matrix: the analytical method, the quasi-analytical method and the difference quotients. The analytical method is limited to the conditions that the objective functions are explicitly derivable in second order. However, the objective functions usually cannot be depicted as an explicit function due to the implicit structure of most models. Therefore, the difference quotients (DQ)method and the quasi-analytical (QA) method with numerical approximation become useful in this circumstance. The DQ method, just as its name implies, approximately computes the second order derivatives at the optimal points, i.e., optimal parameter set (Viallefont et al., 1998; Xu and Vandewiele, 1995). Accordingly, an optimal parameter set is required prior to the implementation of DQ. Equifinality in a model structure could easily result in thefailure of the DQ method. The QA method estimates the Hessian by means of least squares fitting, where a defined objective function is the response variable, and the identified parameters and the products of any two parameters are the independent variables(Xia, 1989; Xia, 2002; Yam, 1997). The implementation of QA is fairly complex because of the fitting of a large amount of data and many variables. In addition, the significance and goodness (R2) of the least squares fitting directly influences the reliability of the Hessian matrix.
The objective of this study is to evaluate the model identifiability and equifinality through the development of a new method called covariance-inverse (CI). The main characteristic of CIis to estimate the Hessian matrix by inversing the covariance matrix, which is calculated from theparameter samples using the multinormal proposal distribution as described in the antecedent paper (Wang and Chen, 2010). This means the CI method is also based on the Bayesian inference and the Markov Chain Monte Carlo (MCMC) technique. The SoilGHG model (Wang and Chen, 2010) was still the objective of the study. The results from the new method were also compared with those from DQ and QA. The Shuffled Complex Evolution-University of Arizona (SCE-UA) algorithm (Duan et al., 1992) was used to provide an optimal parameter set for DQ. It is noted that the optimized parameter set by SCE-UA presented in this paper was not unique, but had mean values with standard deviations. The model simulation results were also presented.
2. Methods
2.1. Diagnosis of model identifiability
The Hessian Matrix of the objective function is not the only parameter analysis tool available, butit is a useful diagnostic to immediately present the identifiability of parameters(Bostick et al., 2007; Kavetski et al., 2006a; Kavetski et al., 2006b; Viallefont et al., 1998). A common expression of the objective function in modeling is the sum of squared errors (SSE):
(1)
whereand are observation data and simulation data at time i; is the parameter set including k parameters..
The Hessian matrix (H) is the square matrix of second-order partial derivatives of the objective function (F):
(2)
The model identifiability can be measured by the matrix condition number (Iskrev, 2007). The condition number is usually defined as the ratio of the largest to thesmallest singular value of the matrix (or the absolute values of eigenvalues if the matrix is square)(Edelman, 2005; Iskrev, 2007; Sorooshian and Gupta, 1985).
(3)
The diagnosis of model identifiability is based on the magnitude of the condition number as shown in Table 1(Xia, 1989). Model parameters are (i) identifiable if the condition number (η) is less than or equal to 10; (ii) weakly identifiable if 10 < η ≤ 200; and (iii) non-identifiable if ηis greater than 200.
2.2. Relationships between Hessian, covariance, and correlation matrix
The covariance matrix of the parameters can be estimated from the Hessian matrix (Sorooshian and Gupta, 1985; Vandewiele et al., 1992; Xu and Vandewiele, 1995)
(4)
(5)
(6)
(7)
whereis the minimum; is the Hessian of;is the covariance matrix; is the correlation coefficient of and; is the model standard deviation; is the standard deviation of; n is the number of terms in Eq. (1); and k is the number of parameters.
2.3. Estimation of Hessian matrix
If in Eq. (1) can be explicitly expressed as a function of , and is derivable in second order, the Hessian matrix can be calculated directly by derivations in Eq. (2). As previously mentioned, this is the analytical method. However, it is impossible to analytically derive the second-order derivatives for a complex non-linear model. That is why the DQ and QA methods have been developed. In this section, the DQ and QA are introduced firstly, and then the CI method is proposed.
2.3.1. Difference quotients method
The Hessian can be approximated at mesh nodes by DQ(Viallefont et al., 1998). Such an approximation can be a forward, backward, orcentral difference (D'Acunto, 2004). The forward approximation can be expressed as follows
(8)
(9)
where and are small increments to and , respectively; is the SSE under conditions that only the ith parameter () varies , with the other (k− 1) parameters unchanged; is the SSE in the case where and are minimally incrementedin and , respectively, while the other (k− 2) parameters remain unaltered.
The forward difference depicted above is used in this study. The precondition in order to use DQ is to find a relative optimum achieving. For this purpose, a parameter optimization algorithm (SCE-UA) is described later, and is used to obtainthe. While employing the DQ method, we did not change the parameter values directly, but changed the logarithm of the parameters by 1%.
2.3.2. Quasi-analytical method
The quasi-analytical method (QA) is to derive the Hessian matrix using the least-square method. Any non-quadratic objective function can be approximated by a quadratic function at any reference point (Xia, 2002):
(10)
This equation can be simplified as follows using matrix representations:
(11)
where
(12)
Thus, the Hessian matrix
(13)
reaches its minimum under the condition of
(14)
The optimal estimation for can be obtained from Eqs. (13)and(14):
(15)
Compared with DQ, the QA method does not require one toknow in advance. On the contrary, can be estimated from the Hessian matrix, while the Hessian is calculated from many samples of and the corresponding.
2.3.3. Covariance-inverse method based on MCMC
Based on the relationship between the covariance matrix and the Hessian matrix as shown in Eq. (4), the covariance-inverse (CI) methodis proposed in this paper. The CI method derives the Hessian matrix via the inverse matrix of covariance, where the covariance matrix is calculated directly from the posterior distributed parameter samples generated by MCMC. The general procedure to obtain the Hessian matrix includes: (i) generate parameter samples by MCMC;(ii) calculate the covariance matrix (Ʃ) from the parameter samples; (iii) compute the inverse matrix of Ʃ to obtain Ʃ−1, and .
Although the exact H is unknown, the condition number can still be calculated because it is the ratio of two singular values. This can be simply proved as follows.
Assuming
(16)
where is an unknown constant. It can be deduced that
(17)
whereandare the ith singular valuesof H and Ʃ−1 in order, respectively. Thus
(18)
where and are the maximum and minimum of .
2.4. Conditional identifiability corresponding to the number of parameters
The model identifiability is also influenced by the number of parameters to be identified. The fewer undetermined parameters a model has, the higher identifiability the model has. In order to quantify the effect of the number of undetermined parameters on model identifiability, and to illuminate under what conditions a model could be identified, the following procedure is proposed:
(i)Sort the singular values in descending order to get , where k is the total number of parameters, and are the largest and smallest singular values, respectively.
(ii)Calculate the condition numbers corresponding to each given number of parameters to be identified:
(19)
where m is the number of identified parameters.
(iii) For each given m, compute the geometric mean of
(20)
where is the geometric mean condition number for midentified parameters.
2.5. Parameter optimizationmethod
An integrated objective function considering two objectives (CO2 and N2O fluxes) is used for identifying the parameters. The two objectives have the same weighing factor (i.e., 0.5). For each objective, the coefficient of determination (Devore, 2008; Kothamasu et al., 2004) is employed to measure the goodness of fit between the simulated and observed data:
(21)
whereR2 is the coefficient of determination; is the average of the observations.
The SCE-UA algorithm (Duan et al., 1992; Duan et al., 1994; Wang et al., 2009) is employed for parameter optimization. In addition, the posterior parameter space from the multivariate normal proposal distribution listed in Table 4 of Wang and Chen (2010) is used for the SCE-UA optimization. As previously mentioned in the Introduction, the objective of parameter optimization is not only to provide a relative optimal parameter set for DQ and model simulations, but also to determine the standard deviation of each parameter. Therefore, the SCE-UA is run for 10 times with 10 different random seeds: 2, 3, 5, 7, 11, 13, 17, 19, 23, and 29 (Duan et al., 1994). The arithmetic mean of these 10 optimized parameter sets is employed as the optimal parameter set for DQ and model simulations. The standard deviation for each parameter can be also calculated by the parameter values from 10 model optimizations.
3.Results
3.1. Parameter optimization
Although the parameter values from ten model optimizations by SCE-UA were not identical, they yielded a very similar objective function value R2 = 0.9786, with R2 = 0.9600 for CO2 and R2 = 0.9972 for N2O. The high R2value close to 1.0 implies that the model optimizations were very successful and that the model performance for N2O was slightly better than that for CO2. The means and standard deviations (STDs) of the 19 parameters are shown in Table 2. The 19 parameters ranged several orders of magnitude from 10−6 to 101. The coefficient of variation (CV, i.e., the ratio of the STD to the mean) was also calculated. The CVvalues of seven parameters (rR, KMB, RDNmax, KEM_NO, Ks, PE, and BE) were very small and less than 0.1%. The CV values of four parameters (kB, kN2O, KEM_NO2, and KEM_N2O) were larger than 6%, three of which were even larger than 20%. The three rate constants (kA, kH, and kB)quantifying the decomposition processes were at the same order of magnitude (10−4 h−1), and kAkBkH regarding the mean values. However, the four rate constants for the denitrification processes fell into two groups. The parameter values of group 1 (kNO3 and kNO2 at10−1 h−1) were two orders of magnitude higher than those of group 2 (kNO and kN2O at 10−3 h−1). The four Michaelis-Menten (M-M) constants to calculate relative enzyme activity for denitrification were within 10−2 – 10−1 g N m−2.
3.2. Model identifiability regarding all 19 parameters
Whole model identifiability was firstly evaluated with all 19 parameters involved. In order to make the singular values of different methods comparable, all singular values were normalized by dividing them by the minimum singular value. Therefore, the condition number would be equivalent to the maximum normalized singular value (NSV) according to Eq.(3). As shown in Fig. 1, the 19thNSVs were the minimum with values of 1.0.
As for the DQ method, the mean parameter set listed in Table 2 was used as . The results indicate that the model with 19 parameters was non-identifiable because the maximum NSV was as high as 350,412.
Because the number of parameters (k) was 19, the total number of independent variables for the least squares fitting reachedk(k+3)/2 = 209 while applying the QA method.The first 5% of the parameter samples were used as burn-in, and the other 95% of the total 40,000 parameter sets were employed for further analysis.Thus the number of data points for the least squares fitting was 38,000. The least squares fitting was significant and acceptable because p-value < 0.001, and , . The maximum NSVmeans that the model with 19 parameters was non-identifiable.
The maximum NSV was 241,409 for the CI method. Fig. 1 shows that the NSVs by CI were close to those by QA. This was further illustrated by the pair plots of NSVs of DQ against CI and QA against CI. As shown in Fig. 2, the plot of NSVs (QA vs. CI) fell much closer to the diagonal than that of NSVs (DQ vs. CI), which implies that the results from CI and QA were quite similar.
3.3.Conditional identifiability corresponding to the number of parameters
The method proposed in Section 2.4 was used to investigate the conditional identifiability corresponding to the number of identified parameters. The Hessian matrix and its singular values derived by CI were used. As shown in Fig. 3, the geometric mean condition number increased dramatically with the number of identified parameters. Two critical numbers (4 and 8) were derived in Fig. 3. When the number of identified parameters was 4 or less, the geometric mean condition number was less than 10, which implies that it could be identifiable. However, the model could not be identified if the number of parameters exceeded 8, because the condition number would be greater than 200 in this case. The model with 5–8 identified parameters would fall into the weakly-identifiable category.
3.4. Model Simulations
The results of model simulations using the mean optimal parameter set from Table 2 are presented in this section. As shown in Fig. 4, the simulated CO2 and N2O fluxes agreed very well with both the observations and results by the unit response curve method (Wang and Chen, 2009). The good performance of the model simulation was verified by the high R2 values reported in Section 3.1.