Introduction to WinBUGS

BUGS: ‘Bayesian inference Using Gibbs Sampling’

- the most widely used software for MCMC (Markov chain Monte Carlo) techniques.

Development was continued with OpenBUGS at the University of Helsinki and the MRC Biostatistics Unit, U.K.

JAGS (Just Another Gibbs Sampler) is employed for statistical work in many fields: ecology, management, genetics.

1. Preliminaries

1.1 The free WinBUGS software

You can get the software from the web site of the MRC Biostatistics Unit in Cambridge:

1.2 Other free packages for use with BUGS output

There are two packages that can be used with S-Plus or R (also public domain): CODA and BOA. These can be used for diagnostics after sampling. You can get them from the BUGS web site.

1.3 Manuals

  • The WinBUGS manual, available online. It gives details of how to do things in WinBUGS, but you need to refer back to earlier manuals for more description of each example.
  • WinBUGS examples, volumes 1 and 2, with explanations, available online. These include the ones in the earlier sets of examples.

These manuals build on each other, so you really need to have the whole set to understand the programs. A good way to attack a problem you want to set up is to go to the volumes of examples and find one that is like your problem. You can then adapt its code to your problem.

The BUGS website also contains some slides of a tutorial introduction by David Spiegelhalter and many other useful links.

1.4 Starting WinBUGS

After downloading WinBUGS, click on the WinBUGS icon or program file. Now explore the menus:

  • HELP - you see manuals and examples. When you click on examples, you can open them in something called a compound document, which allows programs, graphs and explanations to appear together. The file Surgical: institution ranking would be a good one to look at first, as it is not too complex.
  • FILE - allows you to open existing files, or to start a new file to program your own example in the BUGS language.
  • DOODLE - is what you need to start a file for a new model specified in graphical format. (Doodles cannot describe every kind of model.)

There are two basic ways to specify a model for WinBUGS. One is by writing the necessary equations as a kind of program. The other is by using the mouse and various tools to draw a ‘doodle’.

2. Overview of model fitting

Steps:

– Specify model with parameters of interest

– Provide data

– Compile the model and data

– Initialize

– Specify quantities to be stored

– Run the sampler

– Draw inferences

2.1 Specifying and compiling the model

The first stage in model fitting is to specify your model in the BUGS language. Having typed in the model and data (and saved the document if necessary), we need to direct WinBUGS to read the relevant parts of the document and compile them ready for the MCMC run.

You then go to the Model menu and open Specification; you will get a specification tool with buttons.

Click on the check model button. Any error messages will appear in the grey bar at the bottom of the screen. If all is correct, you will get the message “model is syntactically correct” there instead. You will also see that the compile and load data buttons now have their text darkened. You are now ready for the next steps using these buttons.

Data can be entered either in list format or in rectangular format (see examples below). Once you have typed in your data, highlight either the word ‘list’ or the whole set of data. You can then use the load data button on the specification tool to read the data into WinBUGS.
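As a sketch with made-up values, a small data set in rectangular format has the node names (with brackets) as column headings, one row per observation, and END as the last line:

x[]   y[]
2     23.2
3     22.0
1     21.6
END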

You are now ready to compile your model using the compile button on the specification tool. Again, check for error messages. If all is correct you will get a “model compiled” message.

Then you need to give initial values. This is done with the specification tool, either by loading values you type in (load inits button) or by generating random values from the priors (gen inits button). After loading or generating initial values, the status line reports ‘model initialized’.
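Typed initial values use the same list format as the data. For instance, for a model with a single parameter p (as in Example 1 below), a possible set of initial values is the following, which you highlight and load with the load inits button:

list(p = 0.3)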

WinBUGS is now ready to run the sampler. You can close the specification tool.

2.2 Generating the samples – updating

You are now ready to generate samples and to examine the simulated output. To start the sampler, go to the Model menu and then Update, and you will get an updating tool.

You can select how many updates you get for each press of the update button and how often the screen is refreshed to show how sampling is proceeding. For some methods of updating WinBUGS will not permit you to store samples until after an ‘adapting’ phase. In these cases the ‘adapting’ button will be ticked. The over-relax button only applies to some methods of updating - see BUGS manual.

Updating does the sampling, but does not store any values. In MCMC methods you usually want to run the sampler for some time (e.g. 1000 iterations) as a burn-in, to be reasonably sure it has stabilised, before you start storing values.

Leave your updating tool open – you will be needing it again.

2.3 Storing values and summarising results

After an initial run, go to the Inference menu and then Samples, and you will get a sample monitor tool.

You start by entering the parameters you want to monitor in the node box, and for each one press set. If you also press trace you will see a plot of the samples as they are generated. Now go back to the updating tool and generate some samples.

Now go back to your sampling tool to look at the various ways of displaying your results:

history - shows you a plot of all the samples you have generated

density - gives a kernel density estimate of the posterior

stats - gives summary statistics including the mean, s.d., median and percentiles, which can be set with the panel on the right. These can be used for credible intervals. You will also get a Monte Carlo error for the mean, which indicates how well the mean of the posterior has been estimated from your number of samples.

AutoC - a plot of the autocorrelation in the chains

EXAMPLE 1. A simple proportion

Inference is required for the proportion (p) of unemployed people. A small sample is available, reporting 4 people unemployed out of a total of N = 14 people surveyed.

You can select a member of the beta family for the prior for p, but assume that you want to restrict its range to something narrower than the complete range (0,1) by setting lower and upper bounds.

Write the code and check the model with the model specification tool. Your code should look like this:

model
{
x ~ dbin(p, N)
p ~ dbeta(1, 1)I(0.1, 0.6)
}

Now you will have to enter the data. The following list format will do (notice that it has to be a capital N as BUGS is case sensitive).

data
list(N = 14, x = 4)

BUGS now has data for a node with a distribution, so it will calculate the appropriate likelihood and prior for p, and combine them into a posterior. It knows about conjugate priors, so the calculations will be quick and easy if you use a beta or uniform prior.
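For example, with the Beta(1, 1) prior and x = 4 successes out of N = 14, the unrestricted conjugate posterior would be Beta(1 + 4, 1 + 10) = Beta(5, 11); the I(0.1, 0.6) bounds simply truncate this posterior to that interval.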

Carry on to generate samples in the same way as explained before and obtain traces and kernel density plots of your posterior, along with summary statistics giving the mean, median and percentiles of your posterior. The upper and lower percentiles can be used to calculate credible intervals.

EXAMPLE 2: Two proportions

It is an easy step to go on from this to make inferences for the difference or even the ratio of two proportions. Suppose that we have another survey, perhaps some time later, and this time 12 different people were asked and 6 were unemployed; an increase in the unemployment rate from 28.6% (4/14) to 50% (6/12). But how precisely are we estimating these quantities? What is the evidence that the underlying rates for the two surveys are really different? We can see how much we know about the difference in the rates from these two small surveys by calculating a posterior for the difference or the ratio of these two rates.

model
{
x1 ~ dbin(p1, N1)
p1 ~ dbeta(1, 1)I(0.1, 0.6)
p2 ~ dbeta(1, 1)I(0.1, 0.6)
diff <- p2 - p1
ratio <- p2 / p1
x2 ~ dbin(p2, N2)
}

Add the data and initial values (one possible set of initial values is sketched after the data below), and monitor samples for the ratio and the difference.

data
list(N1 = 14, x1 = 4, N2 = 12, x2 = 6)
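For the initial values you can either press gen inits to generate them from the priors, or, as a sketch, load a typed list such as the following (any values inside the bounds 0.1 to 0.6 will do):

list(p1 = 0.3, p2 = 0.3)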

We know that BUGS works well and easily for simple examples of this type. For more complicated examples, with lots of parameters, we cannot be so sure. BUGS provides sets of diagnostics that allow you to check whether it is working properly.

Things can go wrong in using WinBUGS (the manual contains a healthy warning). Using the examples in the manual is a mostly safe path to tread, but always be critical of your results to see whether they make sense. Also, always use the diagnostics. You will learn about these later.

EXAMPLE 3: Normal mean and variance

Below is a sample of 5 values of the reduction in blood pressure produced by a drug:

2 3 3 1 0

We want inference for the mean and standard deviation of the population they represent, without bringing in any prior information.

In BUGS we specify normal distributions in terms of their mean and precision. The precision (denoted psi below) is 1/variance. BUGS requires all priors to be proper. For the prior on the mean, the usual choice is a normal with a large standard deviation that will cover a very wide range.

For example, we can take a normal with mean 0 and precision 1.0E-6 (i.e. standard deviation 1000) for the mean, but you can change this if it does not seem wide enough for your problem.

Priors for the standard deviation are a bit more difficult. The BUGS manual suggests two ways of doing this. Either you can set a flat prior for the variance over a fixed range, which corresponds to a Pareto prior for the precision, or you can use a member of the Gamma family for the precision, which is the conjugate prior for this problem.
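As a sketch of the first option (assuming, purely for illustration, an upper bound of 1000 on the variance, which you would adjust for your problem), a flat prior on the variance over (0, 1000) corresponds to a Pareto(1, 0.001) prior on the precision:

# flat prior on the variance over (0, 1000),
# written as a Pareto prior on the precision psi
psi ~ dpar(1, 0.001)
sd <- sqrt(1 / psi)

The model below uses the second option, a Gamma prior for the precision.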

model
{
mean ~ dnorm(0.0, 1.0E-6)
psi ~ dgamma(0.001, 0.001)
sd <- sqrt(1 / psi)
for (i in 1:N) {
x[i] ~ dnorm(mean, psi)
}
}

data
list(N = 5, x = c(2, 3, 3, 1, 0))

inits (two alternative sets, e.g. one per chain if you run two chains)
list(psi = 1, mean = 1)
list(psi = 10, mean = -5)

Summary statistics from one run were:

node   mean    sd      MC error   2.5%     median   97.5%   start   sample
mean   1.789   0.8239  0.007935   0.1641   1.787    3.446   1001    9000
sd     1.625   0.8639  0.01149    0.7757   1.428    3.733   1001    9000

These can be compared with frequentist results (from the t and chi-squared distributions) for the same problem, which give the following estimates and confidence intervals:

mean 1.8 - 95% confidence interval (0.18 to 3.41)

s.d. 1.30 - 95% confidence interval (0.78 to 3.74)
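(For reference, the interval for the mean is the usual t interval, xbar ± t(0.975, 4) s/sqrt(n) = 1.8 ± 2.776 × 1.30/sqrt(5), and the interval for the s.d. comes from the chi-squared limits sqrt((n-1)s^2/chi^2) with 4 degrees of freedom.)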

The column MC error tells you to what extent simulation error contributes to the uncertainty in the estimation of the mean. Simulation errors in the 95% posterior credible interval (columns 2.5% and 97.5%) will be considerably larger because of the small numbers in the tails of the distribution.

Examine the trace of all of your samples for the mean with the history button on the sampling tool. It should look very jagged if the samples are independent.

More BUGS code examples…

EXAMPLE 4: Two normal samples
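This model compares two normal samples: mu1 and mu2 are the means and psi1 and psi2 the precisions of the two populations, delta is the difference of the means, and r is the ratio of the two variances (tau1 and tau2).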

model
{
for (i in 1:N) { x[i] ~ dnorm(mu1, psi1) }
for (j in 1:M) { y[j] ~ dnorm(mu2, psi2) }
mu1 ~ dnorm(22.0, 0.25)
mu2 ~ dnorm(22.0, 0.25)
psi1 ~ dgamma(0.1, 0.1)
psi2 ~ dgamma(0.1, 0.1)
tau1 <- 1/psi1
tau2 <- 1/psi2
delta <- mu1 - mu2
r <- tau1/tau2
}

data
list(x = c(22.0, 23.9, 20.9, 23.8, 25.0, 24.0,
           21.7, 23.8, 22.8, 23.1),
     y = c(23.2, 22.0, 22.2, 21.2, 21.6, 21.6,
           21.9, 22.0, 22.9, 22.8),
     N = 10, M = 10)

inits
list(mu1 = 20.0, mu2 = 20.0, psi1 = 1, psi2 = 1)

EXAMPLE 5: Multinomial and Dirichlet prior
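Here the six observed counts y are modelled as a multinomial with total N, and the vector of cell probabilities theta is given a Dirichlet prior with all parameters equal to 1 (a uniform prior over the probability simplex). Note that N is not supplied in the data; it is computed inside the model as sum(y[]) = 1962.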

model
{
y[1:K] ~ dmulti(theta[1:K], N)
theta[1:K] ~ ddirch(alpha[1:K])
N <- sum(y[])
}

data
list(K = 6, alpha = c(1, 1, 1, 1, 1, 1),
     y = c(471, 453, 396, 243, 177, 222))

inits
list(theta = c(0.2, 0.2, 0.2, 0.15, 0.15, 0.1))