HW8 solution
8.4
a. data verdict;
input v $ d $ p $ count;
datalines;
white white yes 53
white white no 414
white black yes 11
white black no 37
black white yes 0
black white no 16
black black yes 4
black black no 139
;
data verdict1;
set verdict;
count = count + 0.0001;
run;
proccatmoddata=verdict1;
weight count;
model d*v*p= _response_/param=ref;
loglin d|v|p@2;
run;
procgenmoddata=verdict1;
class v d p;
model count=v|d|p@2/dist = poi link = log;
outputout=verdictpred pred=fitted;
run;
procfreqdata=verdictpred;
weight fitted;
table v*d*p/relrisk;
run;
The odds ratio between D and P at each level of V is 0.42. It means that the odds of a no death penalty verdict for a black defendant is 0.42 times the odds for a white defendant for given race of the victim. Therefore, black defendants tend to get more death penalty verdict.
b. procsql;
createtable marginal as
select
d, p, sum (count) as count, sum(fitted) as fitted
from verdictpred
groupby d, p;
procfreqdata=marginal;
weight count;
table d*p/relrisk;
run;
procfreqdata=marginal;
weight fitted;
table d*p/relrisk;
run;
Both odds ratio are 1.45. They are equal because the loglinear model (DV, DP, PV) is a saturated model for the variables D and P. If we consider the marginal data formed by D and P, the loglinear model fit is exactly the same as the sample marginal table. This odds ratio 1.45 is greater than 1, however, if we look at the odds ratio conditional on each level of the victim's race, it is 0.42 which is less than 1. The Simpson's paradox occurs because the large imbalance between the sample sizes of the two victim's races. They are 515 white victims and 159 black victims. When aggregating data over victim's race, the imbalance in sample sizes can cause the association between D and P to reverse.
c. procgenmoddata=verdict;
class d v;
freq count;
modelp=d v/scale=1dist=bin link=logit aggregate=(d v);
run;
From the logistic model,
Criteria For Assessing Goodness Of Fit
Criterion DF Value Value/DF
Deviance 1 0.3781 0.3781
Scaled Deviance 1 0.3781 0.3781
Pearson Chi-Square 1 0.1976 0.1976
Scaled Pearson X2 1 0.1976 0.1976
Standard Wald 95% Confidence Wald
Parameter DF Estimate Error Limits Chi-Square Pr > ChiSq
Intercept 1 2.0595 0.1458 1.7736 2.3453 199.40 <.0001
d black 1 -0.8678 0.3671 -1.5872 -0.1483 5.59 0.0181
v black 1 2.4044 0.6006 1.2273 3.5816 16.03 <.0001
From the loglinear model
Criteria For Assessing Goodness Of Fit
Criterion DF Value Value/DF
Deviance 1 0.3798 0.3798
Scaled Deviance 1 0.3798 0.3798
Pearson Chi-Square 1 0.1978 0.1978
Scaled Pearson X2 1 0.1978 0.1978
Standard Wald 95% Confidence Wald
Parameter DF Estimate Error Limits Chi-Square Pr > ChiSq
p no 1 2.0595 0.1458 1.7736 2.3453 199.40 <.0001
v*p black no 1 2.4044 0.6006 1.2272 3.5816 16.03 <.0001
d*p black no 1 -0.8678 0.3671 -1.5872 -0.1483 5.59 0.0181
d. procgenmoddata=verdict;
class d v;
freq count;
modelp=v/scale=1dist=bin link=logit aggregate=(d v);
run;
procgenmoddata=verdict1;
class v d p;
model count=v|p d|v/dist = poi link = log;
run;
The G2 statistics for the reduced model with D removed is 5.45 on 2 df which has p-value 0.067. Therefore the reduced model is barely acceptable. The reduced model means D and P are conditionally independent given V. The corresponding log linear model is shown in the second genmod procedure above.
8.5
a. data accident;
input safety $ ejected $ injury $ count;
datalines;
belt yes nonfatal 1105
belt yes fatal 14
belt no nonfatal 411111
belt no fatal 483
none yes nonfatal 4624
none yes fatal 497
none no nonfatal 157342
none no fatal 1008
;
proccatmoddata=accident;
weight count;
model safety*ejected*injury= _response_/pred=freq ;
loglin safety|ejected|injury;
odsoutput predictedfreqs=tmp1 anova=temp1(where=(source='Likelihood Ratio'));
run;
procsql;
createtable d as
select sum( abs(obsfunction-predfunction) ) / (2*sum(obsfunction)) as d
from tmp1;
data temp1;
merge temp1 d;
model = 'safety|ejected|injury';
run;
proccatmoddata=accident;
weight count;
model safety*ejected*injury= _response_/pred=freq ;
loglin safety|ejected|injury@2;
odsoutput predictedfreqs=tmp2 anova=temp2(where=(source='Likelihood Ratio'));
run;
procsql;
createtable d as
select sum( abs(obsfunction-predfunction) ) / (2*sum(obsfunction)) as d
from tmp2;
data temp2;
merge temp2 d;
model = 'safety|ejected|injury@2';
run;
proccatmoddata=accident;
weight count;
model safety*ejected*injury= _response_/pred=freq ;
loglin safety|ejected ejected|injury;
odsoutput predictedfreqs=tmp3 anova=temp3(where=(source='Likelihood Ratio'));
run;
procsql;
createtable d as
select sum( abs(obsfunction-predfunction) ) / (2*sum(obsfunction)) as d
from tmp3;
data temp3;
merge temp3 d;
model = 'safety|ejected ejected|injury';
run;
proccatmoddata=accident;
weight count;
model safety*ejected*injury= _response_/pred=freq ;
loglin safety|injury ejected|injury;
odsoutput predictedfreqs=tmp4 anova=temp4(where=(source='Likelihood Ratio'));
run;
procsql;
createtable d as
select sum( abs(obsfunction-predfunction) ) / (2*sum(obsfunction)) as d
from tmp4;
data temp4;
merge temp4 d;
model = 'safety|injury ejected|injury';
run;
proccatmoddata=accident;
weight count;
model safety*ejected*injury= _response_/pred=freq ;
loglin safety|injury ejected|safety;
odsoutput predictedfreqs=tmp5 anova=temp5(where=(source='Likelihood Ratio'));
run;
procsql;
createtable d as
select sum( abs(obsfunction-predfunction) ) / (2*sum(obsfunction)) as d
from tmp5;
data temp5;
merge temp5 d;
model = 'safety|injury ejected|safety';
run;
proccatmoddata=accident;
weight count;
model safety*ejected*injury= _response_/pred=freq ;
loglin safety|injury ejected;
odsoutput predictedfreqs=tmp6 anova=temp6(where=(source='Likelihood Ratio'));
run;
procsql;
createtable d as
select sum( abs(obsfunction-predfunction) ) / (2*sum(obsfunction)) as d
from tmp6;
data temp6;
merge temp6 d;
model = 'safety|injury ejected';
run;
proccatmoddata=accident;
weight count;
model safety*ejected*injury= _response_/pred=freq ;
loglin safety injury|ejected;
odsoutput predictedfreqs=tmp7 anova=temp7(where=(source='Likelihood Ratio'));
run;
procsql;
createtable d as
select sum( abs(obsfunction-predfunction) ) / (2*sum(obsfunction)) as d
from tmp7;
data temp7;
merge temp7 d;
model = 'safety injury|ejected';
run;
proccatmoddata=accident;
weight count;
model safety*ejected*injury= _response_/pred=freq ;
loglin safety|ejected injury;
odsoutput predictedfreqs=tmp8 anova=temp8(where=(source='Likelihood Ratio'));
run;
procsql;
createtable d as
select sum( abs(obsfunction-predfunction) ) / (2*sum(obsfunction)) as d
from tmp8;
data temp8;
merge temp8 d;
model = 'safety|ejected injury';
run;
proccatmoddata=accident;
weight count;
model safety*ejected*injury= _response_/pred=freq ;
loglin safety ejected injury;
odsoutput predictedfreqs=tmp9 anova=temp9(where=(source='Likelihood Ratio'));
run;
procsql;
createtable d as
select sum( abs(obsfunction-predfunction) ) / (2*sum(obsfunction)) as d
from tmp9;
data temp9;
merge temp9 d;
model = 'safety ejected injury';
run;
data combo;
set temp1 temp2 temp3 temp4 temp5 temp6 temp7 temp8 temp9;
rename chisq=G2;
rename probchisq=P_value;
run;
procprintdata=combo noobs;
var model G2 df P_value d;
run;
Model G2 DF P_value d
safety|ejected|injury . 0 . 0.000000
safety|ejected|injury@2 2.85 1 0.0911 0.000048
safety|ejected ejected|injury 1144.64 2 <.0001 0.002330
safety|injury ejected|injury 7133.98 2 <.0001 0.010833
safety|injury ejected|safety 1680.41 2 <.0001 0.001605
safety|injury ejected 9557.06 3 <.0001 0.011660
safety injury|ejected 9021.29 3 <.0001 0.013661
safety|ejected injury 3567.72 3 <.0001 0.003288
safety ejected injury 11444.38 4 <.0001 0.014028
The above table suggests that based on P-value, the model of choice is safety|ejected|injury@2 which is a homogeneous association model. That is, given any variable, the other two variables are independent.
b. procgenmoddata=accident;
freq count;
class safety ejected injury;
model injury=safety ejected/dist=bin link=logit;
run;
c. The dissimilarity index for the homogeneous model is 0.000048, meaning that the sample data follow the model pattern very closely. Actually the mutual independent model has a dissimilarity index 0.014 which is also acceptable.