STAT 875 homework for Section 6.5 with partial answers
1)Problem #1 on p. 468 – Parts of this problem have been done in the class notes. Below is some additional work for this problem.
For part c:
> mod.glmm.1s.i <- glmer(formula = good ~ distance + (1|kicker) +
(0+distance|kicker), nAGQ = 1, data = kick, family = binomial(link = "logit"))
summary(mod.glmm.1s.i)
Generalized linear mixed model fit by maximum likelihood ['glmerMod']
Family: binomial ( logit )
Formula: good ~ distance + (1 | kicker) + (0 + distance | kicker)
Data: kick
AIC BIC logLik deviance
783.0929 804.1406 -387.5465 775.0929
Random effects:
Groups Name Variance Std.Dev.
kicker (Intercept) 0.07566 0.2751
kicker.1 distance 0.00000 0.0000
Number of obs: 1425, groups: kicker, 34
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 5.87481 0.33494 17.54 <2e-16 ***
distance -0.11680 0.00846 -13.81 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr)
distance -0.948
The estimated model is where b0i is a random draw from a normal distribution with mean 0 and variance 0.076 and the b1i are all 0 (they are random draws from a normal distribution with mean 0 and variance 0).
The LRT p-value using the chi-square distribution approximation is 0.5000. The bootstrap p-value calculation is shown below.
numb.sim <- 2000
> sim.H0 <- simulate(object = mod.glmm.1, nsim = numb.sim, seed = 8512)
start.time<-proc.time()
LRT.star <- numeric(length = numb.sim)
for (i in 1:numb.sim){
mod.glmm.1 <- glmer(formula = sim.H0[[i]] ~ distance + (1|kicker), nAGQ = 1,
data = kick, family = binomial(link = "logit"))
mod.glmm.1s.i <- glmer(formula = sim.H0[[i]] ~ distance + (1|kicker) +
(0+distance|kicker), nAGQ = 1, data = kick, family = binomial(link =
"logit"))
mod.fit.wo <- glm(formula = sim.H0[[i]] ~ distance, data = kick, family =
binomial(link="logit"))
#print(i)
LRT.star[i] <- -2*(logLik(mod.glmm.1) - logLik(mod.glmm.1s.i))
}
curve(expr = dchisq(x = x, df = 1), add = TRUE, col = "red")
abline(v = LRstat.slope, col = "blue")
summary(LRT.star)
Min. 1st Qu. Median Mean 3rd Qu. Max.
-0.000021 0.000000 0.000000 0.210000 0.122800 4.791000
mean(LRT.star >= LRstat.slope) # p-value
[1] 0.445
end.time<-proc.time()
save.time<-end.time-start.time
cat("\n Number of minutes running:", save.time[3]/60, "\n \n")
Number of minutes running: 57.08217
Notice the minimum -2log() value is a little less 0. I am not concerned about this because of the numerical approximations used in evaluating the likelihood function and the minimum value is still very close to 0. Also, it is interesting to note the highest bar in the histogram appears to be left of 0. This is just likely due to R choosing the histogram classes so that first class represents those values less than or equal to 0.
2)Consider the simple GLMM model where Yi is a binomial random variable for the ithitem in the sample (ni= 10 trials) and for i = 1, …, n. The responses Yi are independent for i = 1, …, n. The bi are independent as well. Complete the following:
a)Interpret what the model represents at the true probability of success and relate this to the overdispersion discussion of Section 5.3.
b)Simulate n = 1000 observations from this model where 0 = 1 and = 2.
> n <- 1000 #Number of binomial observations
> beta0 <- 1
sigma <- 2
individual <- 1:n # Indicates observation number
numb.trials <- 10 #n_i
trials <- rep(x = numb.trials, times = n)
> # Simulate the response Y
set.seed(8182)
> b <- rnorm(n = n, mean = 0, sd = sigma)
pi.i <- plogis(beta0 + b)
> y <- rbinom(n = n, size = numb.trials, prob = pi.i)
> head(y)
[1] 10 0 3 1 10 10
c)Estimate the corresponding model to the simulated data. Compare the estimated values of 0 and to their true values. Use 20 quadrature points.
> # Estimate model with 20-point quadrature
> mod.fit20 <- glmer(formula = y/trials ~ 1 + (1|individual), nAGQ = 20, weights =
trials, family = binomial(link = "logit"))
summary(mod.fit20)
Generalized linear mixed model fit by maximum likelihood ['glmerMod']
Family: binomial ( logit )
Formula: y/trials ~ 1 + (1 | individual)
AIC BIC logLik deviance
2795.171 2804.987 -1395.586 2791.171
Random effects:
Groups Name Variance Std.Dev.
individual (Intercept) 3.966 1.992
Number of obs: 1000, groups: individual, 1000
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.98927 0.06966 14.2 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
The estimated model is where bi ~ N(0,1.9922). As would be expected with a large sample size, these estimates are fairly close to their true values.
d)Examine what happens to the estimated model with more and less points of quadrature.
e)The purpose of this part is to examine how similar models can be estimated using pre-Section 6.5 methods.
i)Estimate using maximum likelihood like in Chapter 2. Is there evidence of overdispersion?
Yes! For example, see the deviance/df statistic.
ii)Estimate using a quasi-binomial model. Similar to the quasi-Poisson regression model, a parameter allows for the variance to be nii(1 – i) rather than nii(1 – i) only.
iii)Compare the models in i) and ii) to the model estimated in part c).
1