Statistical appendix

We provide here a more detailed technical description of the model chosen for the analysis of sickness absence. The model is a two-stage Logit-Negbin hurdle model with nonlinear latent functions $f^{(1)}$ and $f^{(2)}$, and the conditional observation model for the $i$th person having $y_i$ days or periods of sick leave is

$$
p\big(y_i \mid f_i^{(1)}, f_i^{(2)}, r\big) =
\begin{cases}
1 - \pi_i, & y_i = 0,\\[4pt]
\pi_i \, \dfrac{\mathrm{NegBin}(y_i \mid \mu_i, r)}{1 - \mathrm{NegBin}(0 \mid \mu_i, r)}, & y_i > 0,
\end{cases}
$$

where $\pi_i = \mathrm{logit}^{-1}\big(f_i^{(1)}\big)$ is the probability of a non-zero outcome, $\mu_i = \exp\big(f_i^{(2)}\big)$ is the mean of the count process,

$$
\mathrm{NegBin}(y \mid \mu, r) = \frac{\Gamma(y + r)}{y!\,\Gamma(r)} \left(\frac{r}{r+\mu}\right)^{\!r} \left(\frac{\mu}{r+\mu}\right)^{\!y},
$$

and $r$ is the dispersion parameter of the negative binomial. The latent function $f^{(1)}$ models the first stage, that is, the zero process, and the latent function $f^{(2)}$ models the second stage, that is, the count process.
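For concreteness, the following is a minimal NumPy sketch of this hurdle log-likelihood under the logit and log link parameterisation written above; it is an illustration only, not the implementation used in the analysis.

```python
import numpy as np
from scipy.special import expit, gammaln

def hurdle_loglik(y, f1, f2, r):
    """Log-likelihood of the Logit-Negbin hurdle model.

    y  : non-negative integer counts (days/periods of sick leave)
    f1 : latent values of the zero process (logit stage)
    f2 : latent values of the count process (log-mean stage)
    r  : dispersion parameter of the negative binomial
    """
    y = np.asarray(y)
    pi = expit(f1)      # P(y > 0), logistic link on stage one
    mu = np.exp(f2)     # negative binomial mean, log link on stage two

    # log NegBin(y | mu, r)
    log_nb = (gammaln(y + r) - gammaln(r) - gammaln(y + 1)
              + r * (np.log(r) - np.log(r + mu))
              + y * (np.log(mu) - np.log(r + mu)))
    # log NegBin(0 | mu, r), needed for the zero truncation
    log_nb0 = r * (np.log(r) - np.log(r + mu))

    zero = np.log1p(-pi)                                          # y = 0 branch
    positive = np.log(pi) + log_nb - np.log1p(-np.exp(log_nb0))   # y > 0 branch
    return np.where(y == 0, zero, positive)
```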

where is a stage {1,2}, is a group id {0,1,2,3}, is a person id, is a follow-up year. All persons belong to the group 0 in the year 0, and then to the one of the intervention groups {1,2,3}. is a common constant term, is a group specific constant, is a person specific constant, is a common latent function, is a group specific function, and is a person specific function. The hierarchical construction can model the variation between the groups and between the persons. [1] present very well the benefits of using hierarchical/multilevel models.
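If the six parts are taken to be a priori independent with zero means, with variances $\sigma_c^2$, $\sigma_g^2$, $\sigma_p^2$ for the constant terms and covariance functions $k$, $k_g$, $k_p$ for the function terms (our notation, introduced here for illustration), the decomposition induces the covariance

$$
\operatorname{Cov}\!\left(f^{(s)}_{g,p}(t),\, f^{(s)}_{g',p'}(t')\right)
= \sigma_c^2 + \delta_{gg'}\,\sigma_g^2 + \delta_{pp'}\,\sigma_p^2
+ k(t,t') + \delta_{gg'}\,k_g(t,t') + \delta_{pp'}\,k_p(t,t'),
$$

where $\delta$ is the Kronecker delta. This is the sum of six covariance functions referred to below.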

The prior for $f^{(s)}$ is presented as a Gaussian process,

$$
f^{(s)}(x) \sim \mathcal{GP}\!\left(0,\, k(x, x' \mid \theta)\right),
$$

where the covariance function $k$ is a sum of six covariance functions representing the six parts in the hierarchical model, and $\theta$ denotes the hyperparameters of the covariance functions. Using Gaussian processes we do not need to fix the parametric form of the latent functions $f^{(s)}$, but only to specify a prior on the function space. The covariance function specifies the prior assumptions about the smoothness and stationarity of the latent functions.
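As a sketch of how such a sum covariance can be assembled, the following builds the six parts with indicator masks for shared group and person ids. The column layout and helper names are ours, and the base kernel over the year is reused across parts only to keep the sketch short; in practice each part carries its own hyperparameters.

```python
import numpy as np

# Assumed column layout of the covariate matrix X (illustration only).
YEAR, GROUP, PERSON = 0, 1, 2

def same_id(X1, X2, col):
    """Indicator matrix: 1 where two observations share the id in column `col`."""
    return (X1[:, col][:, None] == X2[:, col][None, :]).astype(float)

def hierarchical_cov(X, k_t, s2_c, s2_g, s2_p):
    """Sum of six covariance parts mirroring the hierarchical model.

    X    : (n, 3) covariates (year, group id, person id)
    k_t  : base covariance over the follow-up year, (n,1)x(n,1) -> (n,n)
    s2_* : prior variances of the common/group/person constant terms
    """
    T = X[:, [YEAR]]
    Kt = k_t(T, T)
    G = same_id(X, X, GROUP)
    P = same_id(X, X, PERSON)
    return (s2_c            # common constant
            + s2_g * G      # group-specific constant
            + s2_p * P      # person-specific constant
            + Kt            # common function of t
            + G * Kt        # group-specific function of t
            + P * Kt)       # person-specific function of t
```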

The nonlinear functions in our hierarchical model are defined using a neural network covariance function [2],

$$
k_{\mathrm{NN}}(x, x') = \frac{2}{\pi} \sin^{-1}\!\left( \frac{2\, \tilde{x}^{\top} \Sigma\, \tilde{x}'}{\sqrt{\left(1 + 2\, \tilde{x}^{\top} \Sigma\, \tilde{x}\right)\left(1 + 2\, \tilde{x}'^{\top} \Sigma\, \tilde{x}'\right)}} \right),
$$

where $\tilde{x} = (1, x^{\top})^{\top}$ is the covariate vector augmented with 1 and $\Sigma = \operatorname{diag}(\sigma_0^2, \sigma_1^2, \ldots, \sigma_d^2)$ is a diagonal weight prior, where $\sigma_0^2$ is the variance of the bias parameter and $\sigma_1^2, \ldots, \sigma_d^2$ are the variances of the weight parameters. Together these parameters control the prior assumptions about the magnitude, nonlinearity, and saturation of the functions.
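A direct NumPy transcription of this covariance function (an illustrative sketch, not the GPstuff implementation):

```python
import numpy as np

def nn_covariance(X1, X2, sigma2):
    """Neural network covariance function (Rasmussen & Williams, 2006).

    X1, X2 : (n, d) and (m, d) covariate matrices
    sigma2 : length d+1 vector of prior variances
             (sigma2[0] for the bias, sigma2[1:] for the weights)
    """
    # Augment covariates with a leading 1 for the bias term.
    A1 = np.hstack([np.ones((X1.shape[0], 1)), X1])
    A2 = np.hstack([np.ones((X2.shape[0], 1)), X2])
    S = np.diag(sigma2)

    cross = 2.0 * A1 @ S @ A2.T                       # 2 x~^T Sigma x~'
    d1 = 1.0 + 2.0 * np.einsum('ij,jk,ik->i', A1, S, A1)
    d2 = 1.0 + 2.0 * np.einsum('ij,jk,ik->i', A2, S, A2)
    return (2.0 / np.pi) * np.arcsin(cross / np.sqrt(np.outer(d1, d2)))
```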

The hyperparameters of the Gaussian process and the dispersion parameter of the negative binomial were estimated using type II MAP. Due to the non-Gaussian likelihood, there is no closed-form solution for the marginal posterior density. We used the expectation propagation (EP) algorithm to integrate over the latent values $f$. Expectation propagation forms a Gaussian posterior approximation by iteratively matching the moments of the marginal posterior densities.
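The core moment-matching step can be illustrated for a single site: given a Gaussian cavity distribution and a non-Gaussian likelihood term, EP replaces the site approximation so that the overall Gaussian matches the moments of the tilted distribution. A minimal sketch under these assumptions (scalar site, grid quadrature for the moments; the helper name `match_moments` is ours):

```python
import numpy as np

def match_moments(loglik, m_cav, v_cav, half_width=8.0, n=2001):
    """Moments of the tilted distribution t(f) ∝ lik(f) N(f | m_cav, v_cav).

    loglik       : vectorised log-likelihood of the site as a function of f
    m_cav, v_cav : mean and variance of the Gaussian cavity distribution
    Returns (log Z, mean, variance) of the tilted distribution.
    """
    s = np.sqrt(v_cav)
    f = np.linspace(m_cav - half_width * s, m_cav + half_width * s, n)
    log_t = (loglik(f)
             - 0.5 * (f - m_cav) ** 2 / v_cav
             - 0.5 * np.log(2.0 * np.pi * v_cav))
    shift = log_t.max()              # for numerical stability
    t = np.exp(log_t - shift)
    df = f[1] - f[0]

    Z = t.sum() * df                 # zeroth moment (scaled)
    mean = (f * t).sum() * df / Z
    var = ((f - mean) ** 2 * t).sum() * df / Z
    return np.log(Z) + shift, mean, var
```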

To check that the point estimate for the hyperparameters is a good approximation to the full Bayesian approach of integrating over all the unknowns, we used central composite design (CCD) approximate integration for comparison [3].
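In CCD integration, the hyperparameter posterior is explored at a fixed design of points around the mode: the mode itself, the corners of a scaled hypercube, and points along each axis. The sketch below only generates such design points; Rue et al. [3] additionally rotate the design along the principal axes of the posterior and attach integration weights to the points, details we omit here.

```python
import numpy as np
from itertools import product

def ccd_points(mode, scale, radius=1.0):
    """Central composite design points around a posterior mode.

    mode   : (d,) mode of the (log) hyperparameter posterior
    scale  : (d,) per-axis scaling (e.g. posterior standard deviations)
    radius : distance of the design points from the mode, in scaled units
    Returns an array of 1 + 2^d + 2d design points.
    """
    d = len(mode)
    corners = np.array(list(product([-1.0, 1.0], repeat=d))) / np.sqrt(d)
    axial = np.vstack([np.eye(d), -np.eye(d)])
    units = np.vstack([np.zeros((1, d)), corners, axial])
    return np.asarray(mode) + radius * units * np.asarray(scale)
```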

The model comparisons were made using leave-one-out cross-validation (LOO-CV) with log predictive densities [4]. Instead of running the full inference separately for each left-out data point, the LOO-CV log predictive densities were approximated using the leave-one-out approximation provided by the expectation propagation method [2, 5].
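Since EP already maintains a cavity distribution for each site, approximating the posterior given all the other observations, the LOO predictive density of observation $i$ is approximately the normalising constant of the corresponding tilted distribution. Reusing the hypothetical `match_moments` helper sketched above:

```python
import numpy as np

def ep_loo_lpd(logliks, cav_means, cav_vars):
    """Approximate LOO log predictive densities from EP cavity distributions.

    logliks   : per-observation vectorised log-likelihood functions of f_i
    cav_means, cav_vars : EP cavity means and variances, one per site
    """
    return np.array([match_moments(ll, m, v)[0]   # log Z_i ≈ log p(y_i | y_-i)
                     for ll, m, v in zip(logliks, cav_means, cav_vars)])
```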

The Gaussian process models were implemented using GPstuff, a freely available MATLAB toolbox (http://www.lce.hut.fi/research/mm/gpstuff/) [6].

1 Gelman A, Hill J. Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press, 2006.

2 Rasmussen CE, Williams CKI. Gaussian Processes for Machine Learning. The MIT Press, 2006.

3 Rue H, Martino S, Chopin N. Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 2009;71(2):319-392.

4 Vehtari A, Lampinen J. Bayesian model assessment and comparison using cross-validation predictive densities. Neural Computation 2002;14(10):2439-68.

5 Opper M, Winther O. Gaussian processes for classification: mean-field algorithms. Neural Computation 2000;12(11):2655-84.

6 Vanhatalo J, Riihimäki J, Hartikainen J, et al. Bayesian Modeling with Gaussian Processes using the MATLAB Toolbox GPstuff. Submitted, 2011.