Practical 18: Further Methods for binary multilevel models
This practical has three parts. In part 1 we consider the probit link function and the latent variable methods described in the lecture, and use them to fit such models with MCMC in MLwiN. In part 2 we look at the AR sampler in WinBUGS and again fit models to the Bangladesh contraceptive dataset. Finally, in part 3 we look at fitting models to the pig_adg dataset using the probit link and using WinBUGS.
Part 1: Probit regression
Here we will consider the Bangladesh dataset ‘bang1.ws’ described in the last practical; this practical follows on directly from that one.
Recall from lectures that the logit link function is only one possible link function that we can use with Binomial data. It is commonly used in medical applications as it has a log-odds interpretation. By this we mean that the exponential of any fixed-effect coefficient (for a variable x) may be interpreted as an odds ratio, representing the multiplicative effect of a one-unit increase in x on the odds of the outcome (if x is continuous) or the odds relative to those for the reference category (if x is a dummy variable). For example, the odds of using contraception for a woman with one child (using our last model) are e^1.157 = 3.18 times the odds for a woman with no children, assuming all other factors are held constant. This interpretation is particularly useful in medical applications where the response is mortality or infection, as we can then work out the relative odds of death or infection for two different subsets of the data.
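As a quick check of this arithmetic, the odds ratio quoted above can be reproduced in a couple of lines (Python is used here purely for illustration):

import math

# Odds-ratio interpretation of a logit coefficient: exp(coefficient) multiplies the odds.
# Using the coefficient of 1.157 for "one child" quoted above:
print(round(math.exp(1.157), 2))  # 3.18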
Another popular link function is the probit link, which is the inverse cumulative distribution function of the Normal distribution. This link is often used in economics applications although, as with the logit, it can be used in many application areas. One important advantage of the probit, both for MCMC algorithms and for interpretation, is that we can think of our binary response as a thresholded version of an underlying (unknown) continuous response. This interpretation can be used with any link function, but with the probit link the unknown continuous response is Normally distributed.
To illustrate this threshold idea, consider an exam that is marked out of 100. (Note that marks out of 100 are not truly continuous data, but they are often treated as Normally distributed and are a much better approximation to continuity than a pass/fail response.) A pass mark may be set at 50, and our response is whether an individual student passes (gets 50 or above) or fails (gets below 50). So our observed response is the pass/fail indicator, but this is really a surrogate for the more informative (unknown) response, the actual mark. The hope is that predictors related to the mark out of 100 will have a similar relationship to the pass/fail response. Of course in certain situations, for example mortality, it is difficult to think of a continuous variable underlying the 0/1 response (it is hard to rate people as being more dead than each other), but we can still use the threshold idea.
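To make the connection between the probit link and the threshold interpretation concrete, the short sketch below (a minimal Python illustration; the linear predictor value of 0.3 is made up and not taken from the dataset) checks that P(y = 1) = Φ(Xβ) under the probit link is exactly the probability that a latent Normal(Xβ, 1) response exceeds zero:

from scipy.stats import norm

# Probit link: p = Phi(X*beta), i.e. probit(p) = Phi^{-1}(p) = X*beta,
# where Phi is the standard Normal cumulative distribution function.
xb = 0.3                                   # illustrative linear predictor value
p_link = norm.cdf(xb)                      # P(y = 1) under the probit link

# Threshold view: y* ~ N(X*beta, 1) and y = 1 whenever y* > 0.
p_threshold = 1.0 - norm.cdf(0.0, loc=xb, scale=1.0)

print(p_link, p_threshold)                 # identical probabilities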
The threshold idea goes back a long way in the statistics literature but has been used recently in conjunction with the Gibbs Sampler by Albert and Chib (1993) using a data augmentation algorithm (Tanner and Wong 1987).
The idea then proceeds as follows. Assume we have a binary variable yi collected for individuals i that is a thresholded version of an (unknown) continuous, Normally distributed variable yi*. If we knew the value of yi* we could use the standard Gibbs sampling algorithm for Normal response models. So we add an extra step to the Gibbs sampling algorithm and generate yi* at each iteration from its conditional posterior distribution, which is a truncated Normal distribution with mean (in the single-level case) Xβ and variance 1. The truncation point is zero: if yi is 0 then yi* must be negative, and if yi is 1 then yi* must be positive.
Then, with the augmented dataset of yi* values generated, the other parameters can be updated as in the Normal response models discussed in earlier chapters. Note that this model can also be fitted using Metropolis sampling, as with the logistic regression model, but the Gibbs sampling algorithm is faster and produces less correlated chains (as shown later). Albert and Chib (1993) also give an approximate Gibbs algorithm for the logistic regression model that uses a t distribution with 8 degrees of freedom as an approximation to the logistic distribution, but this has not been implemented in MLwiN.
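The sketch below illustrates this data augmentation step for a single-level probit model with simulated data and a flat prior on β. It is only a minimal Python illustration of the Albert and Chib (1993) scheme, not MLwiN's internal code, and the multilevel case adds further steps for the random effects and their variance:

import numpy as np
from scipy.stats import truncnorm

rng = np.random.default_rng(1)

# Simulated single-level data: an intercept and one continuous predictor.
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([-0.5, 1.0])
y = (X @ beta_true + rng.normal(size=n) > 0).astype(int)   # observed 0/1 responses

def augment(y, X, beta):
    """Draw y* | y, beta from a Normal(X beta, 1) distribution truncated at zero."""
    mu = X @ beta
    # truncnorm takes bounds on the standard Normal scale, i.e. for (y* - mu).
    lower = np.where(y == 1, -mu, -np.inf)     # y = 1  =>  y* > 0
    upper = np.where(y == 1, np.inf, -mu)      # y = 0  =>  y* < 0
    return mu + truncnorm.rvs(lower, upper, size=len(mu), random_state=rng)

beta = np.zeros(2)
XtX_inv = np.linalg.inv(X.T @ X)
draws = []
for it in range(2000):
    y_star = augment(y, X, beta)
    # With y* treated as known, the variance fixed at 1 and a flat prior, the
    # conditional posterior of beta is Normal(b_hat, (X'X)^-1), b_hat = OLS estimate.
    b_hat = XtX_inv @ (X.T @ y_star)
    beta = rng.multivariate_normal(b_hat, XtX_inv)
    if it >= 500:                              # discard a burn-in
        draws.append(beta)

print(np.mean(draws, axis=0))                  # roughly recovers beta_true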
Running a probit regression in MLwiN
We will now fit the last model from practical 5.1 part 1 with random coefficients for ‘urban’ using a probit link rather than a logit link. You will therefore need to set up that model prior to following the instructions below.
To change the link function we need to do the following:
By default MLwiN picks univariate Metropolis-Hastings sampling for all non-Normal response models. We will therefore need to change the estimation method to Gibbs sampling.
The window will then look as shown below. Note that MLwiN has realised that it is possible to use Gibbs sampling for this model.
If we now click on the Reset button then MLwiN will choose Gibbs sampling (alternatively simply click on the two Gibbs buttons).
If we now run the model by clicking on the Start button, you will notice that we get the message ‘Burning In..’. This is because we are running Gibbs sampling and there is therefore no need to run an adapting period. The following table gives point estimates and effective sample sizes for runs of 5,000 iterations using both Gibbs and Metropolis sampling for this model.
Parameter / Gibbs / ESS (Gibbs) / Metropolis / ESS (Metro.)
β0 / -1.039 (0.099) / 683 / -1.048 (0.099) / 60
β1 / -0.016 (0.005) / 1888 / -0.016 (0.005) / 275
β2 / 0.683 (0.097) / 1770 / 0.692 (0.100) / 199
β3 / 0.823 (0.106) / 1734 / 0.830 (0.108) / 176
β4 / 0.819 (0.111) / 1584 / 0.829 (0.108) / 93
β5 / 0.501 (0.109) / 580 / 0.505 (0.109) / 96
u00 / 0.155 (0.050) / 769 / 0.159 (0.051) / 182
u05 / -0.160 (0.067) / 518 / -0.171 (0.066) / 141
u55 / 0.267 (0.118) / 414 / 0.281 (0.111) / 124
Time / 52s / 107s
From the table we can see that the Gibbs sampler is not only faster but also produces larger effective samples, owing to less correlation in the chains it produces. Both methods, however, give roughly the same estimates.
If we look at the DIC diagnostic for the probit model we see the following:
->BDIC
Bayesian Deviance Information Criterion (DIC)
Dbar D(thetabar) pD DIC
2328.19 2271.86 56.33 2384.53 (probit)
2328.93 2272.31 56.62 2385.56 (logit)
Here there is very little to choose between the two link functions. It is also possible to use a third link function, the complementary log-log link, with Bernoulli models, but this will not be considered here. Note that the Gibbs sampler cannot be used with the probit link when the response is a proportion.
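The quantities in this output are linked by pD = Dbar − D(thetabar) and DIC = Dbar + pD, which the figures above satisfy up to rounding; a two-line Python check using the probit figures is:

# DIC bookkeeping: pD = Dbar - D(thetabar) and DIC = Dbar + pD.
Dbar, D_thetabar = 2328.19, 2271.86            # probit model figures from the output above
pD = Dbar - D_thetabar
print(round(pD, 2), round(Dbar + pD, 2))       # 56.33 2384.52 (2384.53 in the output, from rounding)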
Part 2: Comparison with AR sampling in WinBUGS
Binomial response models can be fitted by other MCMC samplers, for example WinBUGS will use the Adaptive Rejection (AR) sampling algorithm (Gilks and Wild 1992). It is often interesting to compare the performance of the various samplers, both in terms of their speed and the autocorrelation in the chains they produce. In the above probit regression example we saw that the data augmentation Gibbs sampler approach was better both in terms of run length and chain correlation than the Metropolis algorithm. This is not however the case when we compare the Metropolis algorithm with the AR algorithm for logistic regression models, as the Metropolis algorithm is usually quicker and so we have to balance greater speed against higher autocorrelation.
We will illustrate this on a simple random slopes logistic regression model. If you are continuing from the last part you will need to clear the current probit regression and set up a model with just an intercept, age as a fixed effect and random effects for the districts.
The model when set up will look as follows:
We will first run IGLS and then set up the model for AR sampling in WinBUGS.
We will assume that by now you are familiar with the basic functionality of WinBUGS. We now need to start the WinBUGS program and load the file ‘bang.bug’ as a text file from the directory in which it has been saved. Note that again we will need to change the Files of type box to ‘All files (*.*)’ to see the file ‘bang.bug’. When the file is loaded the model definition will look as follows:
# WINBUGS 1.4 code generated from MLwiN program
#----MODEL Definition------
model
{
# Level 1 definition
for(i in 1:N) {
use[i] ~ dbin(p[i],denom[i])
logit(p[i]) <- beta[1] * cons[i]
+ beta[2] * age[i]
+ u2[district[i]] * cons[i]
}
# Higher level definitions
for (j in 1:n2) {
u2[j] ~ dnorm(0,tau.u2)
}
# Priors for fixed effects
for (k in 1:2) { beta[k] ~ dflat() }
# Priors for random terms
tau.u2 ~ dgamma(0.001000,0.001000)
sigma2.u2 <- 1/tau.u2
}
Here we see that our response variable ‘use’ is defined as Binomially distributed and is related to the predictor variables via the logit link function. We will need to repeat the set-up procedure that we used in earlier practicals for Normal response models.
This will have set up our model and we can now pick our parameters to store.
We have now set up the burn-in of 500 iterations and the parameters we wish to store. To run the updates we need to use the Update window.
We now need to wait a few minutes for WinBUGS to run. On my PC the updates take 172 seconds. In order to measure the efficiency of the sampler we will import the chains back into MLwiN and look at the effective sample size (ESS) measure. To do this we need to do the following:
Having saved the two files for the fixed effects, we can return to MLwiN and use the BUGS Options window (available from the Model menu), which we used earlier, to input the data.
This will store ‘beta[1]’ in column c300 and ‘beta[2]’ in column c301 and rename the columns accordingly. We can now look at the diagnostics for the two parameters via the Column diagnostics window:
The diagnostics will then appear as shown below. Here we see that the effective sample size for β1 is 1099 for this sample of size 5,000, owing to the autocorrelation in the chain. We can repeat this procedure for β2 and for σ2u and find their effective sample sizes. The information for all these parameters is summarised in the table at the end of this section. Note that WinBUGS numbers the parameters from 1 whilst MLwiN numbers them from zero, so beta[1] corresponds to β0.
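The ESS figure is derived from the autocorrelations in the stored chain: roughly, a chain of length n with lag-k autocorrelations ρk behaves like an independent sample of size n / (1 + 2 Σ ρk). The Python sketch below is a simple approximation with a truncated autocorrelation sum; MLwiN's diagnostics report a quantity of this general type, though not necessarily computed identically:

import numpy as np

def effective_sample_size(chain):
    """Rough ESS estimate: n / (1 + 2 * sum of positive-lag autocorrelations)."""
    x = np.asarray(chain, dtype=float)
    n = len(x)
    x = x - x.mean()
    # Sample autocorrelations for lags 0..n-1.
    acf = np.correlate(x, x, mode="full")[n - 1:] / (np.arange(n, 0, -1) * x.var())
    total = 0.0
    for rho in acf[1:]:
        if rho <= 0:          # stop summing once the autocorrelation dies away
            break
        total += rho
    return n / (1.0 + 2.0 * total)

# Example: an AR(1)-like chain of length 5,000 with noticeable autocorrelation.
rng = np.random.default_rng(0)
chain = np.zeros(5000)
for t in range(1, 5000):
    chain[t] = 0.8 * chain[t - 1] + rng.normal()
print(round(effective_sample_size(chain)))   # well below 5,000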
We can now run the same model using MCMC in MLwiN. The model should currently be set up and MCMC should already be selected and so all we need to do is start the estimation. MLwiN does not by default give a timing estimate.
Using a stopwatch for this example the estimation took 20 seconds on my PC. We can now get effective sample sizes and other information from the MCMC diagnostics window.
The diagnostics for β0 using MLwiN will then be displayed as shown below. Here we have higher autocorrelation and so we see that the effective sample size is only 149.
The results from the two methods can be seen in the following table. The worst-mixing parameter is the intercept (β0), so to reach an effective sample size of 5,000 would take 16 + (5000/1099)*156 ≈ 726 seconds (about 12 minutes 6 seconds) using WinBUGS, while MLwiN would take 5 + (5000/185)*15 ≈ 410 seconds (about 6 minutes 50 seconds); the arithmetic is checked in the short sketch after the table. This means that even though WinBUGS produces a less correlated chain, MLwiN is sufficiently quicker to give an effective sample of 5,000 in less time. Note that MLwiN has a comparatively long burn-in here because of the adapting period used by its Metropolis method.
Parameter / 2nd PQL / WinBUGS / ESS (BUGS) / MLwiN / ESS(MLwiN)
β0 / -0.540 (0.084) / -0.542 (0.084) / 1099 / -0.540 (0.089) / 185
β1 / 0.009 (0.005) / 0.009 (0.005) / 4929 / 0.009 (0.005) / 1209
σ2u / 0.249 (0.075) / 0.266 (0.084) / 1221 / 0.273 (0.092) / 433
Time / 172s(16+156) / 20s(5+15)
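As a check on the timings quoted above (burn-in time plus the main run time scaled up by 5,000 divided by the ESS of the worst-mixing parameter), in Python:

# Time to reach an effective sample of 5,000, using the figures in the table:
winbugs = 16 + (5000 / 1099) * 156     # burn-in 16s, main run 156s, ESS 1099
mlwin = 5 + (5000 / 185) * 15          # burn-in 5s,  main run 15s,  ESS 185
print(round(winbugs), "seconds (about 12 min 6 s)")   # ~726
print(round(mlwin), "seconds (about 6 min 50 s)")     # ~410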
So it appears that Metropolis sampling in MLwiN is the better choice for this random effects logistic regression model. Browne and Draper (2000) showed similar results for a couple of random effects logistic regression models. It is, however, not guaranteed that this will be true for all models. Multilevel Poisson models, for example, tend to give highly correlated chains under Metropolis sampling, and for such models it may make more sense to use MCMC sampling in WinBUGS; you can investigate this further in the next practical. It is also interesting to see that there is reasonable agreement between the two MCMC approaches and 2nd order PQL estimation, although 2nd order PQL still slightly underestimates the between-district variation.
Part 3: The pig_adg dataset
In the last practical we looked at several models for the pig_adg dataset using pneumonia as a response.
Try out what you have learnt from parts 1 and 2 of this practical as follows:
a) Repeat part 2 of practical 5.1 using a probit rather than a logit link function and confirm that you get the same best models.
b) Fit the models in practical 5.1 using WinBUGS and compare performance in terms of time and ESS.