STAT 875 Homework for Section 6.5 with Partial Answers

STAT 875 homework for Section 6.5 with partial answers

1)Problem #1 on p. 468 – Parts of this problem have been done in the class notes. Below is some additional work for this problem.

For part c:

> mod.glmm.1s.i <- glmer(formula = good ~ distance + (1|kicker) +

(0+distance|kicker), nAGQ = 1, data = kick, family = binomial(link = "logit"))

summary(mod.glmm.1s.i)

Generalized linear mixed model fit by maximum likelihood ['glmerMod']

Family: binomial ( logit )

Formula: good ~ distance + (1 | kicker) + (0 + distance | kicker)

Data: kick

AIC BIC logLik deviance

783.0929 804.1406 -387.5465 775.0929

Random effects:

Groups Name Variance Std.Dev.

kicker (Intercept) 0.07566 0.2751

kicker.1 distance 0.00000 0.0000

Number of obs: 1425, groups: kicker, 34

Fixed effects:

Estimate Std. Error z value Pr(>|z|)

(Intercept) 5.87481 0.33494 17.54 <2e-16 ***

distance -0.11680 0.00846 -13.81 <2e-16 ***

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Correlation of Fixed Effects:

(Intr)

distance -0.948

The estimated model is where b0i is a random draw from a normal distribution with mean 0 and variance 0.076 and the b1i are all 0 (they are random draws from a normal distribution with mean 0 and variance 0).

The LRT p-value using the chi-square distribution approximation is 0.5000. The bootstrap p-value calculation is shown below.

numb.sim <- 2000

> sim.H0 <- simulate(object = mod.glmm.1, nsim = numb.sim, seed = 8512)

start.time<-proc.time()

LRT.star <- numeric(length = numb.sim)

for (i in 1:numb.sim){

mod.glmm.1 <- glmer(formula = sim.H0[[i]] ~ distance + (1|kicker), nAGQ = 1,

data = kick, family = binomial(link = "logit"))

mod.glmm.1s.i <- glmer(formula = sim.H0[[i]] ~ distance + (1|kicker) +

(0+distance|kicker), nAGQ = 1, data = kick, family = binomial(link =

"logit"))

mod.fit.wo <- glm(formula = sim.H0[[i]] ~ distance, data = kick, family =

binomial(link="logit"))

#print(i)

LRT.star[i] <- -2*(logLik(mod.glmm.1) - logLik(mod.glmm.1s.i))

}

curve(expr = dchisq(x = x, df = 1), add = TRUE, col = "red")

abline(v = LRstat.slope, col = "blue")

summary(LRT.star)

Min. 1st Qu. Median Mean 3rd Qu. Max.

-0.000021 0.000000 0.000000 0.210000 0.122800 4.791000

mean(LRT.star >= LRstat.slope) # p-value

[1] 0.445

end.time<-proc.time()

save.time<-end.time-start.time

cat("\n Number of minutes running:", save.time[3]/60, "\n \n")

Number of minutes running: 57.08217

Notice the minimum -2log() value is a little less 0. I am not concerned about this because of the numerical approximations used in evaluating the likelihood function and the minimum value is still very close to 0. Also, it is interesting to note the highest bar in the histogram appears to be left of 0. This is just likely due to R choosing the histogram classes so that first class represents those values less than or equal to 0.

2)Consider the simple GLMM model where Yi is a binomial random variable for the ithitem in the sample (ni= 10 trials) and for i = 1, …, n. The responses Yi are independent for i = 1, …, n. The bi are independent as well. Complete the following:

a)Interpret what the model represents at the true probability of success and relate this to the overdispersion discussion of Section 5.3.

b)Simulate n = 1000 observations from this model where 0 = 1 and  = 2.

> n <- 1000 #Number of binomial observations

> beta0 <- 1

sigma <- 2

individual <- 1:n # Indicates observation number

numb.trials <- 10 #n_i

trials <- rep(x = numb.trials, times = n)

> # Simulate the response Y

set.seed(8182)

> b <- rnorm(n = n, mean = 0, sd = sigma)

pi.i <- plogis(beta0 + b)

> y <- rbinom(n = n, size = numb.trials, prob = pi.i)

> head(y)

[1] 10 0 3 1 10 10

c)Estimate the corresponding model to the simulated data. Compare the estimated values of 0 and  to their true values. Use 20 quadrature points.

> # Estimate model with 20-point quadrature

> mod.fit20 <- glmer(formula = y/trials ~ 1 + (1|individual), nAGQ = 20, weights =

trials, family = binomial(link = "logit"))

summary(mod.fit20)

Generalized linear mixed model fit by maximum likelihood ['glmerMod']

Family: binomial ( logit )

Formula: y/trials ~ 1 + (1 | individual)

AIC BIC logLik deviance

2795.171 2804.987 -1395.586 2791.171

Random effects:

Groups Name Variance Std.Dev.

individual (Intercept) 3.966 1.992

Number of obs: 1000, groups: individual, 1000

Fixed effects:

Estimate Std. Error z value Pr(>|z|)

(Intercept) 0.98927 0.06966 14.2 <2e-16 ***

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

The estimated model is where bi ~ N(0,1.9922). As would be expected with a large sample size, these estimates are fairly close to their true values.

d)Examine what happens to the estimated model with more and less points of quadrature.

e)The purpose of this part is to examine how similar models can be estimated using pre-Section 6.5 methods.

i)Estimate using maximum likelihood like in Chapter 2. Is there evidence of overdispersion?

Yes! For example, see the deviance/df statistic.

ii)Estimate using a quasi-binomial model. Similar to the quasi-Poisson regression model, a parameter  allows for the variance to be nii(1 – i) rather than nii(1 – i) only.

iii)Compare the models in i) and ii) to the model estimated in part c).