A Tutorial on Gibbs Sampling Software for Ecological Modelers

A Tutorial on Gibbs Sampling Software for Ecological Modelers

N. T. Hobbs

NR 575, Systems Ecology, Spring 2010

Aim

Introducing MCMC Samplers

Introducing BUGS

Installing Gibbs Sampling Software

For Windows users: Installing WinBugs

For Mac users: Installing JAGS

Implementing the logistic example to estimate r, K, and τ

Examining the Code

Technical notes

The model statement

for loops

Precision as an argument to the normal density function (and the lognormal)

The <- operator

Coding and saving the BUG’s script

Saving the program file: JAGS

Saving the program file: WinBUGS

The interface between BUGS and R

Interface between R and BUGS: JAGS

Interface between R and BUGS: WinBUGS

Getting simple output: JAGS and WinBUGS

Complete R script for JAGS (to this point)

Complete R script for WinBUGS

Exercise I: courtesy of McCarthy, M. A. 2007. Bayesian Methods for Ecology. Cambridge University Press, Cambridge, U. K.

Using the summary() list

Estimating Quantities of Interest Derived from Model Parameters

Exercise 2

Comparing Parameter Values Between Populations

Using the MCMC List

Exercise 3

Checking for Convergence

Checking Syntax

WinBUGS

JAGS

Some common error messages

JAGS

WinBUGS

made use of undefined node—

multiple definitions of node [x]

incompatible copy

Splus data 100

Cannot Bracket Slice for Nod

Shape parameter of gamma r too small.

Differences Between JAGS and WinBUGS

Bibliography

Appendix 1: Reference material for JAGS

Distributions

Functions and operators

Reference Material for WinBUGS

Distributions

Functions and Operators

Appendix 2: Answers to Exercises

Exercise 1

Exercise 2

Exercise 3

Aim

The purpose of this Primer is to teach the programming skills needed to estimate the posterior distributions of parameters and derived quantities of interest in ecological models using software implementing Mote Carlo Markov chain methods. Along the way, I will reinforce some of the ideas and principals that we have learned in lecture. My approach to teaching this topic is quite different than you are likely to find in other sources because I will integrate the programming you need to learn in R with the programming needed in the MCMC software.

The Primer is organized primarily as a tutorial, but it contains a modicum of reference material as well. Although I think this Primer offers a great start for learning JAGS and WinBUGS, it is not intended to be comprehensive. So, some Sunday afternoon when you find yourself with time on your hands, you should read through the WinBUGS and JAGS manual as well.

Introducing MCMC Samplers

WinBugs, OpenBUGS, and JAGS are three systems of software that implement Monte Carlo Markov Chain sampling using the BUGS language.[1] BUGS stands for Bayesian Analysis Using Gibbs Sampling, so you can get an idea what this language does from its name. Imagine that you took the MCMC code you wrote for a Gibbs sampler and tried to turn it into a general R function for building chains of parameter estimates. Actually, you know enough now to construct a very general tool that would do this. However, you are probably delighted to know that all of this work has been done for you in the BUGS language.

The BUGS language is currently implemented in three flavors of software: OpenBUGS, WinBUGS, and JAGS. OpenBUGS and WinBUGS run on Windows operating systems, while JAGS was specifically constructed to run on Mac OS and Unix.[2] Although all three use essentially the same syntax[3], OpenBUGS and WinBUGS run in an elaborate graphical user interface, while JAGS only runs from the command line of a Unix shell or from R. However, all three can be easily called from R, and this is the approach I will teach. In the past, I have taught students how to use the graphical version, but they quickly learned, as I have, that the GUI involves far to much tedious pointing and clicking and doesn't’ provide the flexibility that is needed for high level work. So, I have decided to abandon any detailed instruction on the GUI, but you will see it in print elsewhere. If you would like to look at past tutorials that include the GUI, I would be happy to provide them. That said, there is a bit we need to know about the interface, mainly as a syntax checker, which I will cover later. It is important for you to understand that when I talk about a BUGS program, it can be run in either JAGS or WinBUGS. When I am talking about distinct features of these programs and their implementation in R, I will be sure to point that out.

Introducing BUGS

This tutorial will use a simple example of regression as a starting point for teaching the BUGS language and associated R commands. Although the problem starts simply, it builds to include some fairly sophisticated analysis. The model that we will use is the a linear relationship between the per-capita rate of population growth and the density of a population, which, as you know is the starting point for deriving the logistic equation. For the ecosystem scientists among you, this problem is easily recast as the mass specific rate of accumulation of nitrogen in the soil; see for example, Knops and Tillman (2000)[4]. Happily, both the population example and the ecosystem example can use the symbol N to represent the state variable of interest. Consider the model,

which, of course gives the equation for a line with intercept r and slope = (Figure 1)

Figure 1. Simulated data for regression problem.

Note that these quantities enjoy a sturdy biological interpretation; r is the intrinsic rate of increase, is the strength of the feedback from population size to population growth rate, and K is the carrying capacity, that is, the population size (o.k., o.k., the gm N per gm soil) at which . Presume we have some data consisting of observations of per capita rate of growth of N paired with observations of N. The vector y contains values for the rate and the vector x contains aligned data on N[5], i.e. . A simple Bayesian model might reasonably specify the posterior distributions of r, K, and τ as

where the priors are uninformative. Now, I have full, abiding confidence that with a couple of hours worth of work, perhaps less, you could knock out a Gibbs sampler to estimate r, K, and τ. However, I am all for doing things nimbly in 15 minutes that might otherwise take a sweaty hour of hard labor, so, consider the code below.

##Logistic example for Primer

model{

#priors

K~dgamma(.001,.001)

r~dnorm(0.,.001)

tau~ dgamma(.001,.001)

sigma<-1/sqrt(tau)

#likelihood

for(iin1:n){

mu[i] <- r - r/K * x[i]

y[i] ~ dnorm(mu[i],tau)

}

} #end of model

This code above implements the Bayesian model, providing full MCMC chains for each parameter, chains that form the basis for estimating their posterior distributions and associated statistics, i.e., means, medians, standard deviations, and credible intervals. As we will soon learn, it becomes child’s play to derive chains other quantities of interest and their posterior distributions, for example, K/2 (What is K/2?), N as a function of time or dN/dt as a function of N. It is easy to construct comparisons between of the growth parameters of two populations or among ten of them. If this seems as if it might be useful to you, you should continue reading.

Installing Gibbs Sampling Software

For Windows users: Installing WinBugs

WinBUGS is free software maintained by the Medical Research Council Biostatistics Unit at the University of Cambridge. It can be downloaded from their web site,

For Mac users: Installing JAGS

For those of you using MAC OS, JAGS (for Just Another Gibbs Sampler) is available at You should click on JAGS-1.0.4u.dmg to get the disk image. There is a readme file with instructions. It will be very helpful to be able to run WinBUGS on your Mac as well, if for no other reason, to use its syntax checker for BUGS models. There are two ways to do that. You can, of course, create a Windows virtual machine using Parallels or VM Fusion and simply download WinBUGS onto it. There is also free software called Wine (for Windows Emulation) that allows Winbugs to run on the Mac without Parallels etc--go to idiom.ucsd.edu/~rlevy/winbugsonmacosx.pdf and follow the instructions. It may take a little work to get it set up. Let me know if you need help.

Implementing the logistic example to estimate r, K, and τ

Examining the Code

Study the relationship between the numerator of Bayes law and the code. The similarity should be pretty obvious, but there are a few things to point out. Priors and likelihoods are specified using the ~ notation that we have seen in class. In so doing, remember that the following notation is equivalent:

So, it is easy to see the correspondence between the fully conditional distribution (i.e., the numerator of Bayes law) and the code. I chose uninformative Gamma priors for K and τ because they must be positive. I chose an uninformative normal for r because it can be positive or negative.

Technical notes

The model statement

Your entire model must be enclosed in

model{

} #end of model

For JAGS users, it is critical that you put a hard return (a blank line) after the } #end of model statement. If you fail to do so, you will get the message #syntax error, unexpected NAME, expecting $end.

for loops

Notice that the for loop replaces the in the likelihood. Recall that when we specify a total likelihood, we ask, what is the probability that we would obtain this data point conditional on the value of the parameter(s) of interest? The total likelihood is the product of the individual likelihoods. Recall in the Excel example that we labored through that you had an entire column of likelihoods adjacent to a column of deterministic predictions of our model. If you were to duplicate these “columns” in WinBUGS, or JAGS you would write

mu[1] <- r - r/K * x[1]

y[1] ~ dnorm(mu[1],tau)

mu[2] <- r - r/K * x[2]

y[2] ~ dnorm(mu[3],tau)

mu[3] <- r - r/K * x[3]

y[3] ~ dnorm(mu[3],tau)

mu[n] <- r - r/K * x[n]

y[n] ~ dnorm(mu[n],tau)

Well, presuming that you have something better to do with your time that to write out statements like this for every observation in your data set, you may substitute

for(i in 1:n){

mu[i] <- r - r/K * x[i]

y[i] ~ dnorm(mu[i],tau)

}

for the line by line specification of the likelihood. Thus, the for loop specifies the elements in the product of the likelihoods.

Note however, that the for structure in the BUGS language is subtly different from what you have learned in R. For example the following would be legal in R but not in the BUGS language:

#WRONG!!!

for(i in 1:n){

mu <- r - r/K * x[i]

y[i] ~ dnorm(m,tau)

}

If you write something like this in BUGS you will get a message that complains about multiple definitions of node mu. If you think about what the for loop is doing, you can see the reason for this complaint; the incorrect syntax translates to

mu <- r - r/K * x[1]

y[1] ~ dnorm(mu,tau)

mu <- r - r/K * x[2]

y[2] ~ dnorm(mu,tau)

mu <- r - r/K * x[3]

y[3] ~ dnorm(mu,tau)

mu <- r - r/K * x[n]

y[n] ~ dnorm(mu,tau),

which is nonsense if you are specifying a likelihood. One more thing about the for construct. If you have two product symbols in the fully conditional distribution with different indexes, i.e. something like then you would implement this in BUGS as nested for loops, e.g.,

for(kin1:K){

expression[k]

for(jin1:J){

expression[j]

} #end of j loop

} #end of k loop

Precision as an argument to the normal density function (and the lognormal)

Note that in the code, the second argument to the Normal density function is tau, which is the precision, defined as the reciprocal of the variance, . This means that we must calculate sigma from tau if we want a posterior distribution on sigma. Be very careful about this. For the lognormal, it is the precision on the log scale.

The <- operator

Note that, unlike R, you do not have the option in BUGS to use the = sign in an assignment statement. You must use <-.

Coding and saving the BUG’s script

Carefully write out all of the code in the Logistic example (above) into program window in R.

Saving the program file: JAGS

For JAGS users, you may save this code in any directory that you like and may name it anything you like. I use names like logisticBUGS.R which lets me know that the file contains BUGS code. Using an R extension allows me to search these files easily with Spotlight.

Alternatively, you can embed the BUGS code in you R script using:

sink("logisticBUG.R") #This is the file name for the bugs code

cat(" model{

K~dgamma(.001,.001)

r~dnorm(0.,.001)

tau~ dgamma(.001,.001)

sigma<-1/sqrt(tau)

#likelihood

for(i in 1:n){

mu[i] <- r - r/K * x[i]

y[i] ~ dnorm(mu[i],tau)

}

} #end of model

",fill=TRUE)

sink()

Saving the program file: WinBUGS

For Windows users, you must put the code in a directory immediately off of the root directory with a short name (something like C:\BUGS) and name the file with < 8 characters, i.e., something like logis.bugs or logis.R. If you get a screen full of gibberish and the rather unhelpful message “INCOMPATIBLE COPY” at the top, the pathname to your file has too many characters. I know, this is cumbersome.

The interface between BUGS and R

Consider the following code:

rm(list=ls())

pop.data=(read.csv("Logistic Data"))

names(pop.data)=c("Year","Population Size", "Growth Rate")

inits=list(

list(K=1500, r=.5, tau=.05)

)

n.xy = nrow(pop.data)

data=list(n=n.xy, x=as.real(pop.data$"Population Size"), y=as.real(pop.data$"Growth Rate"))

The first two lines bring the growth rate data into R as a data frame. Next we specify the initial conditions for the MCMC chain. This is exactly the same thing as you did when you wrote you MCMC code and assigned a guess to the first element in the chain. There are two important things to notice about this statement. First, this must be composed as as “list of lists”, as you see above. If you use

#WRONG

inits=list(K=1500, r=.5, tau=.05)

your code will not run. Second, this statement allows you to set up multiple chains, which are needed to assure convergence[6]. For example, if you want three chains, you would use something like:

inits=list(

list(K=1500, r=.5, tau=.05), #chain 1

list(K=1000, r=.1, tau=.03), #chain 2

list(K=900, r=.7, tau=.07) #chain 3

) #end of inits list

Which variables in your BUGS code require initialization? Anything you are estimating must be initialized.[7] So, this is everything that has a prior, but also, any variable that is a random effect, which is a bit of an advanced topic now, but I would like for you to tuck that away for later. Think about it this way. When you were writing your own Gibbs sampler, every chain required a value as the first element in the vector holding the chain. That is what you are doing when you specify initial conditions here.

The next couple of statements,

n.xy = nrow(pop.data)

data=list(n=n.xy, x=as.real(pop.data$"Population Size"), y=as.real(pop.data$"Growth Rate"))

specify the data that will be used by your BUGS program. Notice that you can assign data vectors on the R side to different names on the BUGS side. For example, the bit that reads

x=as.real(pop.data$"Population Size")

says that the x vector in your BUGS program (see the code above) is composed of the column in your data frame called Population Size and the bit that reads

y=as.real(pop.data$"Growth Rate")

the y vector required by the BUGS program is composed of a column in your data frame called Growth Rate (pretty cool, I think). It is important for you to understand that the lhs of the = in corresponds to the what the data are called in the BUGS program and the rhs of the = is what they are called in R. Also, note that because pop.data is a data frame I used as.real( ) to be sure that BUGS received real numbers instead of characters or factors, as can happen with data frames. This can be particularly important if you have missing data in the data matrix. The n is required in the BUGS program to index the for loop (see the code) and it must be read as data in this statement. By the way, you don’t need to call this list “data”---it could be anything (“apples”, “bookshelves”, “xy” etc.)

Interface between R and BUGS: JAGS

This section is for JAGS users. WinBUGS users should go to the next section.

Now that you have a list of data and initial values (which I include below for clarity) for the MCMC chain you make calls to JAGS using the following:

setwd("/Users/Tom/Documents/Ecological Modeling Course/WinBUGS manual/Primer of MCMC Sampling Software for Ecologists/")

rm(list=ls())

pop.data=(read.csv("Logistic Data"))

names(pop.data)=c("Year","Population Size", "Growth Rate")

inits=list(

list(K=1500, r=.5, tau=.05)

)

n.xy = nrow(pop.data)

data=list(n=n.xy, x=as.real(pop.data$"Population Size"), y=as.real(pop.data$"Growth Rate"))

library(rjags)

##call to JAGS

jm=jags.model("Logistic example BUGS.R",data=data,inits,n.chains=length(inits), n.adapt = 10000)

zm=coda.samples(jm,variable.names=c("K", "r", "sigma"), n.iter=50000, n.thin=1)

The first statement (jm = ....) creates and initializes then MCMC chain. Its first argument is the name of the file containing the BUGS code. Note that in this case, the file resided in the default working directory, which I specified at the top of the file. Otherwise, you would need to specify the full path name. The next two expressions specify where the data come from, where to get the initial values, and how many chains to create (i.e., the length of the list inits). Finally, it specifies the “burn-in” how many samples to throw away before beginning to save values in the chain. Thus, in this case, we will throw away the first 10,000 values.