Exercise 3.5Page 1 of 10

> data(dvisits)

> head(dvisits)

sex age agesq income levyplus freepoor freerepa illness actdays hscore

1 1 0.19 0.0361 0.55 1 0 0 1 4 1

2 1 0.19 0.0361 0.45 1 0 0 1 2 1

3 0 0.19 0.0361 0.90 0 0 0 3 0 0

4 0 0.19 0.0361 0.15 0 0 0 1 0 0

5 0 0.19 0.0361 0.45 0 0 0 2 5 1

6 1 0.19 0.0361 0.35 0 0 0 5 1 9

chcond1 chcond2 doctorco nondocco hospadmi hospdays medicine prescrib

1 0 0 1 0 0 0 1 1

2 0 0 1 0 0 0 2 1

3 0 0 1 0 1 4 2 1

4 0 0 1 0 0 0 0 0

5 1 0 1 0 0 0 3 1

6 1 0 1 0 0 0 1 1

nonpresc

1 0

2 1

3 1

4 0

5 2

6 0

A data frame with 5190 observations on the following 19 variables.

'sex' 1 if female, 0 if male

'age' Age in years divided by 100 (measured as mid-point of 10 age

groups from 15-19 years to 65-69 with 70 or more coded

treated as 72)

'agesq' age squared

'income' Annual income in Australian dollars divided by 1000

(measured as mid-point of coded ranges Nil, less than 200,

200-1000, 1001-, 2001-, 3001-, 4001-, 5001-, 6001-, 7001-,

8001-10000, 10001-12000, 12001-14000, with 14001- treated as

15000

'levyplus' 1 if covered by private health insurance fund for

private patient in public hospital (with doctor of choice), 0

otherwise

'freepoor' 1 if covered by government because low income, recent

immigrant, unemployed, 0 otherwise

'freerepa' 1 if covered free by government because of old-age or

disability pension, or because invalid veteran or family of

deceased veteran, 0 otherwise

'illness' Number of illnesses in past 2 weeks with 5 or more coded

as 5

'actdays' Number of days of reduced activity in past two weeks due

to illness or injury

'hscore' General health questionnaire score using Goldberg's

method. High score indicates bad health

'chcond1' 1 if chronic condition(s) but not limited in activity, 0

otherwise

'chcond2' 1 if chronic condition(s) and limited in activity, 0

otherwise

'doctorco' Number of consultations with a doctor or specialist in

the past 2 weeks

(a)Fit a model doctorco with sex…….concond2. Does the model fit the data?

The predictor “age” and “agesq” makes no difference in the model. So use age.

The model fits the data as the p-value of Goodness of fit test is approximate 1 and the R-sqr = 0.2226

> model1<- glm(doctorco ~ sex + age +income+ levyplus +freepoor+ freerepa+ illness +actdays +hscore +chcond1 +chcond2,family=poisson,dvisits)

> summary(model1)

Call:

glm(formula = doctorco ~ sex + age + income + levyplus + freepoor +

freerepa + illness + actdays + hscore + chcond1 + chcond2,

family = poisson, data = dvisits)

Deviance Residuals:

Min 1Q Median 3Q Max

-2.9502 -0.6858 -0.5747 -0.4852 5.7055

Coefficients:

Estimate Std. Error z value Pr(>|z|)

(Intercept) -2.097821 0.101554 -20.657 < 2e-16 ***

sex 0.156490 0.056139 2.788 0.00531 **

age 0.279123 0.165981 1.682 0.09264 .

income -0.187416 0.085478 -2.193 0.02834 *

levyplus 0.126498 0.071552 1.768 0.07707 .

freepoor -0.438462 0.179799 -2.439 0.01474 *

freerepa 0.083640 0.092070 0.908 0.36365

illness 0.186156 0.018263 10.193 < 2e-16 ***

actdays 0.126690 0.005031 25.184 < 2e-16 ***

hscore 0.030683 0.010074 3.046 0.00232 **

chcond1 0.117300 0.066545 1.763 0.07795 .

chcond2 0.150717 0.082260 1.832 0.06692 .

---

Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

Null deviance: 5634.8 on 5189 degrees of freedom

Residual deviance: 4380.1 on 5178 degrees of freedom

AIC: 6735.7

Number of Fisher Scoring iterations: 6

> pchisq(4380.1,5178,lower=FALSE)

[1] 1

R sqr =(5634.8-4380.1) / 5637.8 = 0.2226

(b)plot fitted value vs residue. Why there are lines ??

why ?? 

(c)backward eliminate predictors at significance lever 5%

dropped freerepa and levyplus. Below is the model with all predictors significant at 5% level of significance. (the R-sqr = 0.222, still a good model)

Model3<- glm(doctorco ~ sex +age +income +freepoor+ illness +actdays +hscore +chcond1 +chcond2,family=poisson,dvisits)

------

> drop1(model1,test="F")

Single term deletions

Model:

doctorco ~ sex + age + income + levyplus + freepoor + freerepa +

illness + actdays + hscore + chcond1 + chcond2

Df Deviance AIC F value Pr(F)

<none> 4380.1 6735.7

sex 1 4388.0 6741.5 9.2762 0.002333 **

age 1 4383.0 6736.5 3.3425 0.067571 .

income 1 4385.0 6738.6 5.7571 0.016457 *

levyplus 1 4383.3 6736.9 3.7248 0.053663 .

freepoor 1 4386.8 6740.4 7.8683 0.005050 **

freerepa 1 4381.0 6734.5 0.9787 0.322558 //drop

illness 1 4481.9 6835.4 120.2604 < 2.2e-16 ***

actdays 1 4917.1 7270.7 634.7816 < 2.2e-16 ***

hscore 1 4389.1 6742.7 10.6574 0.001103 **

chcond1 1 4383.2 6736.8 3.6834 0.055011 .

chcond2 1 4383.5 6737.0 3.9464 0.047025 *

---

------

Model2<- glm(doctorco ~ sex +age +income+ levyplus +freepoor+ illness +actdays +hscore +chcond1 +chcond2,family=poisson,dvisits)

> Model2<- glm(doctorco ~ sex +age +income+ levyplus +freepoor+ illness +actdays +hscore +chcond1 +chcond2,family=poisson,dvisits)

Call:

glm(formula = doctorco ~ sex + age + income + levyplus + freepoor +

illness + actdays + hscore + chcond1 + chcond2, family = poisson,

data = dvisits)

Deviance Residuals:

Min 1Q Median 3Q Max

-3.0004 -0.6851 -0.5761 -0.4858 5.7284

Coefficients:

Estimate Std. Error z value Pr(>|z|)

(Intercept) -2.089063 0.100811 -20.723 < 2e-16 ***

sex 0.162000 0.055824 2.902 0.00371 **

age 0.355131 0.143196 2.480 0.01314 *

income -0.199806 0.084328 -2.369 0.01782 *

levyplus 0.083689 0.053544 1.563 0.11805 //highest P-value

freepoor -0.469596 0.176360 -2.663 0.00775 **

illness 0.186101 0.018260 10.191 < 2e-16 ***

actdays 0.126611 0.005029 25.177 < 2e-16 ***

hscore 0.031116 0.010065 3.092 0.00199 **

chcond1 0.121100 0.066389 1.824 0.06814 .

chcond2 0.158894 0.081762 1.943 0.05197 .

---

Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

Null deviance: 5634.8 on 5189 degrees of freedom

Residual deviance: 4381.0 on 5179 degrees of freedom

AIC: 6734.5

Number of Fisher Scoring iterations: 6

> drop1(Model2,test="F")

Single term deletions

Model:

doctorco ~ sex + age + income + levyplus + freepoor + illness +

actdays + hscore + chcond1 + chcond2

Df Deviance AIC F value Pr(F)

<none> 4381.0 6734.5

sex 1 4389.5 6741.0 10.0573 0.0015263 **

age 1 4387.1 6738.7 7.3071 0.0068906 **

income 1 4386.7 6738.2 6.7387 0.0094609 **

levyplus 1 4383.4 6735.0 2.8777 0.0898717 . //drop (agree with summary)

freepoor 1 4389.1 6740.6 9.5747 0.0019833 **

illness 1 4482.7 6834.2 120.2237 < 2.2e-16 ***

actdays 1 4917.6 7269.2 634.3792 < 2.2e-16 ***

hscore 1 4390.2 6741.8 10.9716 0.0009316 ***

chcond1 1 4384.3 6735.9 3.9453 0.0470546 *

chcond2 1 4384.7 6736.3 4.4379 0.0351984 *

---

Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Warning message:

F test assumes 'quasipoisson' family in: drop1.glm(Model2, test = "F")

------

> Model3<- glm(doctorco ~ sex +age +income +freepoor+ illness +actdays +hscore +chcond1 +chcond2,family=poisson,dvisits)

> summary(Model3)

Call:

glm(formula = doctorco ~ sex + age + income + freepoor + illness +

actdays + hscore + chcond1 + chcond2, family = poisson, data = dvisits)

Deviance Residuals:

Min 1Q Median 3Q Max

-2.9109 -0.6843 -0.5758 -0.4901 5.7654

Coefficients:

Estimate Std. Error z value Pr(>|z|)

(Intercept) -2.069666 0.100158 -20.664 < 2e-16 ***

sex 0.169389 0.055669 3.043 0.00234 **

age 0.348222 0.143284 2.430 0.01509 *

income -0.168246 0.082052 -2.050 0.04032 *

freepoor -0.499105 0.175288 -2.847 0.00441 **

illness 0.185559 0.018238 10.175 < 2e-16 ***

actdays 0.126423 0.005021 25.180 < 2e-16 ***

hscore 0.030678 0.010045 3.054 0.00226 **

chcond1 0.124662 0.066386 1.878 0.06040 . //potential to drop

chcond2 0.161590 0.081691 1.978 0.04792 *

---

Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

Null deviance: 5634.8 on 5189 degrees of freedom

Residual deviance: 4383.4 on 5180 degrees of freedom

AIC: 6735

Number of Fisher Scoring iterations: 6

> drop1(Model3,test="F")

Single term deletions

Model:

doctorco ~ sex + age + income + freepoor + illness + actdays +

hscore + chcond1 + chcond2

Df Deviance AIC F value Pr(F)

<none> 4383.4 6735.0

sex 1 4392.8 6742.3 11.0560 0.0008901 ***

age 1 4389.3 6738.9 7.0147 0.0081091 **

income 1 4387.7 6737.2 5.0487 0.0246869 *

freepoor 1 4392.8 6742.3 11.0839 0.0008769 ***

illness 1 4484.7 6834.3 119.7484 < 2.2e-16 ***

actdays 1 4919.6 7269.2 633.6426 < 2.2e-16 ***

hscore 1 4392.5 6742.0 10.7064 0.0010746 **

chcond1 1 4386.9 6736.5 4.1799 0.0409569 * // <0.05, not drop

chcond2 1 4387.3 6736.9 4.5957 0.0320994 *

---

all predictors are significant at 5% level

(d)the person predicted to visit the doctor the most under above model is

Coefficients:
(Intercept) / -2.069666 / Given the other predictors are fixed:
sex / 0.169389 / female (compare to male)
age / 0.348222 / oldest
income / -0.168246 / lowest income
freepoor / -0.499105 / not (lower income, recent immigrant, unemployed)
illness / 0.185559 / has more illnesses in past 2 weeks
actdays / 0.126423 / Highest # of reduced days of activity in past 2 wks due to illness
hscore / 0.030678 / worst health condition
chcond1 / 0.124662 / chronic conditions and limited in activity
chcond2 / 0.16159 / Highest # of consultations with doctor in past two wks

(e)predicted value for the last record

The predicted # of visit for the last record is 0.15552

> tail(dvisits)

sex age agesq income levyplus freepoor freerepa illness actdays

5190 0 0.72 0.5184 0.25 0 0 1 0 0

hscore chcond1 chcond2 doctorco nondocco hospadmi hospdays

5190 0 0 0 0 0 0 0

Coefficients:
(Intercept) / -2.069666 / 1 / -2.0697
sex / 0.169389 / 0 / 0
age / 0.348222 / 0.72 / 0.25072
income / -0.168246 / 0.25 / -0.04206
freepoor / -0.499105 / 0 / 0
illness / 0.185559 / 0 / 0
actdays / 0.126423 / 0 / 0
hscore / 0.030678 / 0 / 0
chcond1 / 0.124662 / 0 / 0
chcond2 / 0.16159 / 0 / 0
sum / -1.86101
exp(sum) / 0.15552

(f)compare the fit of Guassian model to poisson model

Guassian :R-square 0.2015

Poisson :R-square 0.2220

When we looked at the residual plot vs fitted value, both have lines (Guassian-straight lines, Poisson – curved lines) and patterns are similar.

However the residuals are different

Guassian : # 48, 599, 663

Poisson : # 48, 334, 711

> guassian<- glm(doctorco ~ sex +age +income +freepoor+ illness +actdays +hscore +chcond1 +chcond2,family=gaussian,dvisits)

> summary(guassian)

Call:

glm(formula = doctorco ~ sex + age + income + freepoor + illness +

actdays + hscore + chcond1 + chcond2, family = gaussian,

data = dvisits)

Deviance Residuals:

Min 1Q Median 3Q Max

-2.13659 -0.25838 -0.14573 -0.04692 7.01165

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 0.042005 0.035497 1.183 0.23672

sex 0.038827 0.021312 1.822 0.06855 .

age 0.177070 0.055853 3.170 0.00153 **

income -0.052225 0.029473 -1.772 0.07646 .

freepoor -0.119927 0.051012 -2.351 0.01876 *

illness 0.059977 0.008331 7.199 6.93e-13 ***

actdays 0.103102 0.003656 28.203 < 2e-16 ***

hscore 0.016972 0.005179 3.277 0.00106 **

chcond1 0.006097 0.023690 0.257 0.79692

chcond2 0.045264 0.035353 1.280 0.20048

---

Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for gaussian family taken to be 0.5095333)

Null deviance: 3305.5 on 5189 degrees of freedom

Residual deviance: 2639.4 on 5180 degrees of freedom

AIC: 11241

Number of Fisher Scoring iterations: 2

> plot(guassian)