Exercise 3.5Page 1 of 10
> data(dvisits)
> head(dvisits)
sex age agesq income levyplus freepoor freerepa illness actdays hscore
1 1 0.19 0.0361 0.55 1 0 0 1 4 1
2 1 0.19 0.0361 0.45 1 0 0 1 2 1
3 0 0.19 0.0361 0.90 0 0 0 3 0 0
4 0 0.19 0.0361 0.15 0 0 0 1 0 0
5 0 0.19 0.0361 0.45 0 0 0 2 5 1
6 1 0.19 0.0361 0.35 0 0 0 5 1 9
chcond1 chcond2 doctorco nondocco hospadmi hospdays medicine prescrib
1 0 0 1 0 0 0 1 1
2 0 0 1 0 0 0 2 1
3 0 0 1 0 1 4 2 1
4 0 0 1 0 0 0 0 0
5 1 0 1 0 0 0 3 1
6 1 0 1 0 0 0 1 1
nonpresc
1 0
2 1
3 1
4 0
5 2
6 0
A data frame with 5190 observations on the following 19 variables.
'sex' 1 if female, 0 if male
'age' Age in years divided by 100 (measured as mid-point of 10 age
groups from 15-19 years to 65-69 with 70 or more coded
treated as 72)
'agesq' age squared
'income' Annual income in Australian dollars divided by 1000
(measured as mid-point of coded ranges Nil, less than 200,
200-1000, 1001-, 2001-, 3001-, 4001-, 5001-, 6001-, 7001-,
8001-10000, 10001-12000, 12001-14000, with 14001- treated as
15000
'levyplus' 1 if covered by private health insurance fund for
private patient in public hospital (with doctor of choice), 0
otherwise
'freepoor' 1 if covered by government because low income, recent
immigrant, unemployed, 0 otherwise
'freerepa' 1 if covered free by government because of old-age or
disability pension, or because invalid veteran or family of
deceased veteran, 0 otherwise
'illness' Number of illnesses in past 2 weeks with 5 or more coded
as 5
'actdays' Number of days of reduced activity in past two weeks due
to illness or injury
'hscore' General health questionnaire score using Goldberg's
method. High score indicates bad health
'chcond1' 1 if chronic condition(s) but not limited in activity, 0
otherwise
'chcond2' 1 if chronic condition(s) and limited in activity, 0
otherwise
'doctorco' Number of consultations with a doctor or specialist in
the past 2 weeks
(a)Fit a model doctorco with sex…….concond2. Does the model fit the data?
The predictor “age” and “agesq” makes no difference in the model. So use age.
The model fits the data as the p-value of Goodness of fit test is approximate 1 and the R-sqr = 0.2226
> model1<- glm(doctorco ~ sex + age +income+ levyplus +freepoor+ freerepa+ illness +actdays +hscore +chcond1 +chcond2,family=poisson,dvisits)
> summary(model1)
Call:
glm(formula = doctorco ~ sex + age + income + levyplus + freepoor +
freerepa + illness + actdays + hscore + chcond1 + chcond2,
family = poisson, data = dvisits)
Deviance Residuals:
Min 1Q Median 3Q Max
-2.9502 -0.6858 -0.5747 -0.4852 5.7055
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.097821 0.101554 -20.657 < 2e-16 ***
sex 0.156490 0.056139 2.788 0.00531 **
age 0.279123 0.165981 1.682 0.09264 .
income -0.187416 0.085478 -2.193 0.02834 *
levyplus 0.126498 0.071552 1.768 0.07707 .
freepoor -0.438462 0.179799 -2.439 0.01474 *
freerepa 0.083640 0.092070 0.908 0.36365
illness 0.186156 0.018263 10.193 < 2e-16 ***
actdays 0.126690 0.005031 25.184 < 2e-16 ***
hscore 0.030683 0.010074 3.046 0.00232 **
chcond1 0.117300 0.066545 1.763 0.07795 .
chcond2 0.150717 0.082260 1.832 0.06692 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 5634.8 on 5189 degrees of freedom
Residual deviance: 4380.1 on 5178 degrees of freedom
AIC: 6735.7
Number of Fisher Scoring iterations: 6
> pchisq(4380.1,5178,lower=FALSE)
[1] 1
R sqr =(5634.8-4380.1) / 5637.8 = 0.2226
(b)plot fitted value vs residue. Why there are lines ??
why ??
(c)backward eliminate predictors at significance lever 5%
dropped freerepa and levyplus. Below is the model with all predictors significant at 5% level of significance. (the R-sqr = 0.222, still a good model)
Model3<- glm(doctorco ~ sex +age +income +freepoor+ illness +actdays +hscore +chcond1 +chcond2,family=poisson,dvisits)
------
> drop1(model1,test="F")
Single term deletions
Model:
doctorco ~ sex + age + income + levyplus + freepoor + freerepa +
illness + actdays + hscore + chcond1 + chcond2
Df Deviance AIC F value Pr(F)
<none> 4380.1 6735.7
sex 1 4388.0 6741.5 9.2762 0.002333 **
age 1 4383.0 6736.5 3.3425 0.067571 .
income 1 4385.0 6738.6 5.7571 0.016457 *
levyplus 1 4383.3 6736.9 3.7248 0.053663 .
freepoor 1 4386.8 6740.4 7.8683 0.005050 **
freerepa 1 4381.0 6734.5 0.9787 0.322558 //drop
illness 1 4481.9 6835.4 120.2604 < 2.2e-16 ***
actdays 1 4917.1 7270.7 634.7816 < 2.2e-16 ***
hscore 1 4389.1 6742.7 10.6574 0.001103 **
chcond1 1 4383.2 6736.8 3.6834 0.055011 .
chcond2 1 4383.5 6737.0 3.9464 0.047025 *
---
------
Model2<- glm(doctorco ~ sex +age +income+ levyplus +freepoor+ illness +actdays +hscore +chcond1 +chcond2,family=poisson,dvisits)
> Model2<- glm(doctorco ~ sex +age +income+ levyplus +freepoor+ illness +actdays +hscore +chcond1 +chcond2,family=poisson,dvisits)
Call:
glm(formula = doctorco ~ sex + age + income + levyplus + freepoor +
illness + actdays + hscore + chcond1 + chcond2, family = poisson,
data = dvisits)
Deviance Residuals:
Min 1Q Median 3Q Max
-3.0004 -0.6851 -0.5761 -0.4858 5.7284
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.089063 0.100811 -20.723 < 2e-16 ***
sex 0.162000 0.055824 2.902 0.00371 **
age 0.355131 0.143196 2.480 0.01314 *
income -0.199806 0.084328 -2.369 0.01782 *
levyplus 0.083689 0.053544 1.563 0.11805 //highest P-value
freepoor -0.469596 0.176360 -2.663 0.00775 **
illness 0.186101 0.018260 10.191 < 2e-16 ***
actdays 0.126611 0.005029 25.177 < 2e-16 ***
hscore 0.031116 0.010065 3.092 0.00199 **
chcond1 0.121100 0.066389 1.824 0.06814 .
chcond2 0.158894 0.081762 1.943 0.05197 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 5634.8 on 5189 degrees of freedom
Residual deviance: 4381.0 on 5179 degrees of freedom
AIC: 6734.5
Number of Fisher Scoring iterations: 6
> drop1(Model2,test="F")
Single term deletions
Model:
doctorco ~ sex + age + income + levyplus + freepoor + illness +
actdays + hscore + chcond1 + chcond2
Df Deviance AIC F value Pr(F)
<none> 4381.0 6734.5
sex 1 4389.5 6741.0 10.0573 0.0015263 **
age 1 4387.1 6738.7 7.3071 0.0068906 **
income 1 4386.7 6738.2 6.7387 0.0094609 **
levyplus 1 4383.4 6735.0 2.8777 0.0898717 . //drop (agree with summary)
freepoor 1 4389.1 6740.6 9.5747 0.0019833 **
illness 1 4482.7 6834.2 120.2237 < 2.2e-16 ***
actdays 1 4917.6 7269.2 634.3792 < 2.2e-16 ***
hscore 1 4390.2 6741.8 10.9716 0.0009316 ***
chcond1 1 4384.3 6735.9 3.9453 0.0470546 *
chcond2 1 4384.7 6736.3 4.4379 0.0351984 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Warning message:
F test assumes 'quasipoisson' family in: drop1.glm(Model2, test = "F")
------
> Model3<- glm(doctorco ~ sex +age +income +freepoor+ illness +actdays +hscore +chcond1 +chcond2,family=poisson,dvisits)
> summary(Model3)
Call:
glm(formula = doctorco ~ sex + age + income + freepoor + illness +
actdays + hscore + chcond1 + chcond2, family = poisson, data = dvisits)
Deviance Residuals:
Min 1Q Median 3Q Max
-2.9109 -0.6843 -0.5758 -0.4901 5.7654
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.069666 0.100158 -20.664 < 2e-16 ***
sex 0.169389 0.055669 3.043 0.00234 **
age 0.348222 0.143284 2.430 0.01509 *
income -0.168246 0.082052 -2.050 0.04032 *
freepoor -0.499105 0.175288 -2.847 0.00441 **
illness 0.185559 0.018238 10.175 < 2e-16 ***
actdays 0.126423 0.005021 25.180 < 2e-16 ***
hscore 0.030678 0.010045 3.054 0.00226 **
chcond1 0.124662 0.066386 1.878 0.06040 . //potential to drop
chcond2 0.161590 0.081691 1.978 0.04792 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 5634.8 on 5189 degrees of freedom
Residual deviance: 4383.4 on 5180 degrees of freedom
AIC: 6735
Number of Fisher Scoring iterations: 6
> drop1(Model3,test="F")
Single term deletions
Model:
doctorco ~ sex + age + income + freepoor + illness + actdays +
hscore + chcond1 + chcond2
Df Deviance AIC F value Pr(F)
<none> 4383.4 6735.0
sex 1 4392.8 6742.3 11.0560 0.0008901 ***
age 1 4389.3 6738.9 7.0147 0.0081091 **
income 1 4387.7 6737.2 5.0487 0.0246869 *
freepoor 1 4392.8 6742.3 11.0839 0.0008769 ***
illness 1 4484.7 6834.3 119.7484 < 2.2e-16 ***
actdays 1 4919.6 7269.2 633.6426 < 2.2e-16 ***
hscore 1 4392.5 6742.0 10.7064 0.0010746 **
chcond1 1 4386.9 6736.5 4.1799 0.0409569 * // <0.05, not drop
chcond2 1 4387.3 6736.9 4.5957 0.0320994 *
---
all predictors are significant at 5% level
(d)the person predicted to visit the doctor the most under above model is
Coefficients:(Intercept) / -2.069666 / Given the other predictors are fixed:
sex / 0.169389 / female (compare to male)
age / 0.348222 / oldest
income / -0.168246 / lowest income
freepoor / -0.499105 / not (lower income, recent immigrant, unemployed)
illness / 0.185559 / has more illnesses in past 2 weeks
actdays / 0.126423 / Highest # of reduced days of activity in past 2 wks due to illness
hscore / 0.030678 / worst health condition
chcond1 / 0.124662 / chronic conditions and limited in activity
chcond2 / 0.16159 / Highest # of consultations with doctor in past two wks
(e)predicted value for the last record
The predicted # of visit for the last record is 0.15552
> tail(dvisits)
sex age agesq income levyplus freepoor freerepa illness actdays
5190 0 0.72 0.5184 0.25 0 0 1 0 0
hscore chcond1 chcond2 doctorco nondocco hospadmi hospdays
5190 0 0 0 0 0 0 0
Coefficients:(Intercept) / -2.069666 / 1 / -2.0697
sex / 0.169389 / 0 / 0
age / 0.348222 / 0.72 / 0.25072
income / -0.168246 / 0.25 / -0.04206
freepoor / -0.499105 / 0 / 0
illness / 0.185559 / 0 / 0
actdays / 0.126423 / 0 / 0
hscore / 0.030678 / 0 / 0
chcond1 / 0.124662 / 0 / 0
chcond2 / 0.16159 / 0 / 0
sum / -1.86101
exp(sum) / 0.15552
(f)compare the fit of Guassian model to poisson model
Guassian :R-square 0.2015
Poisson :R-square 0.2220
When we looked at the residual plot vs fitted value, both have lines (Guassian-straight lines, Poisson – curved lines) and patterns are similar.
However the residuals are different
Guassian : # 48, 599, 663
Poisson : # 48, 334, 711
> guassian<- glm(doctorco ~ sex +age +income +freepoor+ illness +actdays +hscore +chcond1 +chcond2,family=gaussian,dvisits)
> summary(guassian)
Call:
glm(formula = doctorco ~ sex + age + income + freepoor + illness +
actdays + hscore + chcond1 + chcond2, family = gaussian,
data = dvisits)
Deviance Residuals:
Min 1Q Median 3Q Max
-2.13659 -0.25838 -0.14573 -0.04692 7.01165
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.042005 0.035497 1.183 0.23672
sex 0.038827 0.021312 1.822 0.06855 .
age 0.177070 0.055853 3.170 0.00153 **
income -0.052225 0.029473 -1.772 0.07646 .
freepoor -0.119927 0.051012 -2.351 0.01876 *
illness 0.059977 0.008331 7.199 6.93e-13 ***
actdays 0.103102 0.003656 28.203 < 2e-16 ***
hscore 0.016972 0.005179 3.277 0.00106 **
chcond1 0.006097 0.023690 0.257 0.79692
chcond2 0.045264 0.035353 1.280 0.20048
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for gaussian family taken to be 0.5095333)
Null deviance: 3305.5 on 5189 degrees of freedom
Residual deviance: 2639.4 on 5180 degrees of freedom
AIC: 11241
Number of Fisher Scoring iterations: 2
> plot(guassian)