Bayesian Inference and computational methods – Practical 2
Solutions including some R code
Question 1. The solution presented can be used to simulate from an arbitrary distribution on the set {1, 2, 3, 4} simply by changing the probabilities for the target distribution (vector p below). Moreover, it can work with different proposal distributions by changing the matrix q.
#Specify the target probability function
p = c(0.25, 0.25, 0.25, 0.25)
#Specify the matrix q defining the proposal distributions.
#Moves to be proposed to adjacent states only
q = matrix(c(0, 1, 0, 0, 0.5, 0, 0.5, 0, 0, 0.5, 0, 0.5, 0, 0, 1, 0), nrow = 4, ncol = 4, byrow = T)
#specify number of iterations
nits = 10000
x = c(1:10000) #vector to record sequence of states
x[1] = 1 #you've got to start somewhere!
# Now iterate the chain
for( i in 1:(nits-1)) {
#Propose new value
u = runif(1, 0, 1)
k = 1
sum = q[x[i], 1]
while(sum < u) {
k = k+1
sum = sum + q[x[i], k]
}
#calculate the acceptance ratio
pacc = min(1, p[k]*q[k, x[i]]/(p[x[i]]*q[x[i], k]))
if( runif(1, 0, 1) < pacc) x[i+1] = k
else x[i+1] = x[i]
}
Question 2. Simulating from Gamma(2, 1) using Metropolis algorithm
The following R code does the job for you:
x = c(1:1000)
del = 1.0
alf = 2
bet = 1
#initialise x to be distribution mean on grounds that it should be a plausible
#draw from the distribution
x[1] = 2
for( i in 1:(nits-1)) {
#propose new value of xp
xp = runif(1, x[i] - del, x[i] + del)
if(xp < 0) x[i+1] = x[i]#reject proposed value if negative
else{
#calculate acceptance ratio - proposal density cancels since symmetric
pacc = (xp/x[i])^(alf-1)*exp(bet*(x[i] - xp))
if(runif(1, 0, 1) < pacc) x[i+1] = xp
else x[i+1] = x[i]
}
}
Note that the variance of the proposal distribution can be altered by changing the value of del, above. If you examine a plot of x[i] against iteration [i] (using command plot(x, type = "l"), for example) you can see the effect on the dynamics of the chain.
i) del = 0.05
Markov chain makes lots of small jumps but doesn't explore much of the range of the Gamma(2, 1) within the 1000 iterations plotted.
ii) del = 1. More sensible choice. Chain explores mixes better, and explores the range more efficiently.
iii) del = 20. Far too large - most of proposed values are negative or else in the right-hand tail of the distribution where the density is very small. They are nearly always rejected, but when accepted the chain makes sizable jumps. This gives the dynamics seen below.
Note the stationary distribution is the same regardless of the choice of del. As the number of iterations increases, a histogram of the values will always take on the shape of the Gamma(2, 1) density. However, for del = 0.5 and del = 20, a lot of iterations would be required.
Effect of intial value. Suppose you start with x[1] = 20 - rather extreme for a Gamma(2, 1) r.v.! Then for del = 1, we obtain:
From this we can see that a burn-in period of around 200 iterations is required before the Markov chain can be considered to be exploring the stationary distribution. When using output from a Markov chain, 'trace' plots should be examined in order to identify a suitable 'burn-in' period.
Question 3. Inference for (a, b) in the Gamma(a, b) density given random sample x. This algorithm is described in detail in the lecture notes for Chapter 5 of the course.
a) Method of moments estimation (you should know how to do this!) would give estimates for a and b around 31.0 and 6.5 respectively. However, this value of a has zero weight under prior, so we could initiate a with a high value in the range (1, 15) and choose b so that the mean of the distribution is around 4.5. For example set a[1] = 14 and b[1] = 3.
Simple Metropolis Algorithm for Inference for a and b in gamma
sx = 38.76
px = 266274.0
dela = 3#chosen after trial and error
delb = 2#after trial and error
nits = 10000
a = c(1:nits)
b = c(1:nits)
a[1] = 14#see comments above
b[1] = 3
theta = 0.1
nsamp = 8
for(i in 1:(nits-1)) {
ap = a[i] + 2*dela*(runif(1, 0, 1)-0.5) #propose new value of a
#only calculate acceptance probability if a lies in (1, 15)
if(ap > 1 & ap < 15){
pacc = (px*b[i]^nsamp)^(ap-a[i])*(gamma(a[i])/gamma(ap))^nsamp
if(runif(1, 0, 1)<pacc) a[i+1] = ap
else a[i+1] = a[i]
}
else a[i+1] = a[i] #reject if proposed value outside (1, 15)
bp = b[i] + 2*delb*(runif(1, 0, 1)-0.5) #propose new b
if(bp < 0) b[i+1] = b[i]#reject if negative value proposed
else{
pacc = (bp/b[i])^(nsamp*a[i+1])*exp(-1.0*(sx + theta)*(bp - b[i]))
if(runif(1, 0, 1) < pacc) b[i+1] = bp
else b[i+1] = b[i]
}
}
Results for this case look like the following:
Posterior mean of a is around 12.1 and posterior variance 4.7. The distribution is skewed towards larger values of a and the mode of the histogram is 14-15.
For b the posterior mean and variance are around 2.5 and 0.26.
Trace plots of a and b look like:
These suggest qualitatively that the chain is exploring the posterior distribution of (a, b) reasonably efficiently and that our rough estimates of means and variances are plausible.
If we sort a and b (in R, asort = sort(a), bsort = sort(b)) we can estimate 90% credible interval as (asort[500], asort[9500]). This gives [7.8, 14.8] for a, and [1.6, 3.3] as approximate 90% intervals for a and b respectively.
The strong association can be seen by plotting the points (a[i], b[i]) on a scatter diagram (in R, plot(b, a))
If you change the prior to put more weight on smaller values of b, then the posterior should 'shift' towards lower values. For Exp(1) the results are:
Compared to the Exp(0.1) prior, these posteriors favour lower values of a and b. This is expected since the Exp(1) prior places less prior probability on higher values of b than the Exp(0.1). The effect is more pronounced for an Exp(5) prior for b. Histograms from a run of 10000 samples look like:
Now the posterior mode for a seems to lie in the interior of the interval .
Finally if we increase the amount of information we have by taking a samples of size 20 and 40 we should expect to obtain far better information on a and b from the posterior densities. Histograms below are all obtained using the Exp(0.1) prior for b.
R code:
nsamp = 20
x = rgamma(nsamp, 8, 1.5)
sx = sum(x)
px = prod(x)
The run M-H algorithm.....
From this we see that the posteriors for a and b are much more concentrated around the actual values used in the simulation. NB You shouldn't expect that the true value is located at the mode of the distribution, but merely that it doesn't lie 'away out in the tails' of the distribution.
Results are even better for a sample of size 40.