*** GLM: horseshoe crab data, Table 4.3 of Agresti, Categorical Data Analysis (CDA);
*** mixed factors and model selection;
proc format;
value colorgp 1='light medium'
2='medium'
3='dark medium'
4='dark';
value spinecdtn 1='both good'
2='one worn'
3='both worn';
run;
data crab;
input color spine width satell weight;
** rescale weight by 1/1000 and shift the color codes down by one to match the colorgp format;
weight=weight/1000; color=color-1;
** response y=1 if the crab has at least one satellite;
if satell > 0 then y=1;
else y=0;
format color colorgp.
spine spinecdtn.;
datalines;
3 3 28.3 8 3050
…
3 2 24.5 0 2000
;
run;
*** fit logistic regression for color and width;
proc genmod data=crab descending;
class color;
model y = color width color*width/ dist=bin link=logit;
run;
*** assume no interaction between color and width;
proc genmod data=crab descending;
class color;
model y = color width/ dist=bin link=logit;
run;
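*** sketch, not part of the original run: likelihood-ratio test of the color*width interaction;
*** the deviances and df below are copied from the two GENMOD fits in the listing;
*** the data set name lrtest and its variable names are illustrative;
data lrtest;
dev_main=187.4570; df_main=168;
dev_int=183.0806; df_int=165;
lr=dev_main-dev_int;
df=df_main-df_int;
p_value=1-probchi(lr,df);
run;
proc print data=lrtest noobs;
var lr df p_value;
run;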
*** treat color as a quantitative factor;
proc genmod data=crab descending;
model y = color width/dist=bin link=logit;
run;
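*** sketch, not part of the original run: the same quantitative-color fit with ESTIMATE statements added;
*** the exp option reports exp(estimate), the multiplicative change in the odds per one-unit increase;
*** the labels in quotes are illustrative;
proc genmod data=crab descending;
model y = color width/dist=bin link=logit;
estimate 'color: one category darker' color 1 / exp;
estimate 'width: one unit wider' width 1 / exp;
run;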
*** model selection: you can use forward or stepwise instead;
proc logistic data=crab descending;
class color(param=ref ref='dark') spine;
model y =color|spine|width / selection=backward;
run;
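*** sketch, not part of the original run: how AIC and SC in the selection listing follow from -2 Log L;
*** AIC = -2 Log L + 2p and SC = -2 Log L + p*log(n), with p parameters and n = 173 crabs;
*** the -2 Log L values are copied from steps 5 and 6 of the listing, and the names ic, fit, p, m2logl are illustrative;
data ic;
length fit $ 13;
n=173;
fit='color + width'; p=5; m2logl=187.457;
aic=m2logl+2*p; sc=m2logl+p*log(n); output;
fit='width only'; p=2; m2logl=194.453;
aic=m2logl+2*p; sc=m2logl+p*log(n); output;
run;
proc print data=ic noobs;
run;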
data crab2;
set crab;
if color=4 then dark=1;
else dark=0;
run;
*** model: width and dark color;
proc genmod data=crab2 descending;
model y= dark width/dist=bin link=logit p r;
output out=diag pred=fitted stdreschi=std_res;
run;
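*** sketch, not part of the original run: list crabs with large standardized Pearson residuals;
*** the cutoff of 2 is a common rule of thumb and is an assumption, not taken from the source;
proc print data=diag;
where abs(std_res) > 2;
var dark width y fitted std_res;
run;
*** fitted probabilities from the estimates reported in the listing for this model;
*** width=26 is an illustrative value, and the names pred, eta, prob are illustrative;
data pred;
width=26;
do dark=0 to 1;
eta=-11.6790-1.3005*dark+0.4782*width;
prob=exp(eta)/(1+exp(eta));
output;
end;
run;
proc print data=pred noobs;
run;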
The GENMOD Procedure
Model Information
Data Set WORK.CRAB
Distribution Binomial
Link Function Logit
Dependent Variable y
Number of Observations Read 173
Number of Observations Used 173
Number of Events 111
Number of Trials 173
Class Level Information
Class Levels Values
color 4 dark dark medium light medium medium
Response Profile
Ordered Total
Value y Frequency
1 1 111
2 0 62
PROC GENMOD is modeling the probability that y='1'.
Criteria For Assessing Goodness Of Fit
Criterion DF Value Value/DF
Deviance 165 183.0806 1.1096
Scaled Deviance 165 183.0806 1.1096
Pearson Chi-Square 165 163.2536 0.9894
Scaled Pearson X2 165 163.2536 0.9894
Log Likelihood -91.5403
Algorithm converged.
The SAS System 14:31 Sunday, February 5, 2006 53
The GENMOD Procedure
Analysis Of Parameter Estimates
                                         Standard Wald 95% Confidence    Chi-
Parameter                  DF  Estimate     Error       Limits         Square  Pr > ChiSq
Intercept                   1  -10.0400    3.5583  -17.0142   -3.0657    7.96      0.0048
color dark                  1    4.1861    7.5809  -10.6722   19.0445    0.30      0.5808
color dark medium           1  -11.4781    7.6980  -26.5658    3.6097    2.22      0.1359
color light medium          1    8.2874   12.0036  -15.2393   31.8140    0.48      0.4899
color medium                0    0.0000    0.0000    0.0000    0.0000       .           .
width                       1    0.4189    0.1367    0.1509    0.6869    9.38      0.0022
width*color dark            1   -0.2184    0.2952   -0.7971    0.3602    0.55      0.4594
width*color dark medium     1    0.4395    0.3018   -0.1521    1.0311    2.12      0.1454
width*color light medium    1   -0.3129    0.4479   -1.1908    0.5651    0.49      0.4849
width*color medium          0    0.0000    0.0000    0.0000    0.0000       .           .
Scale                       0    1.0000    0.0000    1.0000    1.0000
NOTE: The scale parameter was held fixed.
The GENMOD Procedure
Model Information
Data Set WORK.CRAB
Distribution Binomial
Link Function Logit
Dependent Variable y
Number of Observations Read 173
Number of Observations Used 173
Number of Events 111
Number of Trials 173
Class Level Information
Class Levels Values
color 4 dark dark medium light medium medium
Response Profile
Ordered Total
Value y Frequency
1 1 111
2 0 62
PROC GENMOD is modeling the probability that y='1'.
Criteria For Assessing Goodness Of Fit
Criterion DF Value Value/DF
Deviance 168 187.4570 1.1158
Scaled Deviance 168 187.4570 1.1158
Pearson Chi-Square 168 168.6590 1.0039
Scaled Pearson X2 168 168.6590 1.0039
Log Likelihood -93.7285
Algorithm converged.
Analysis Of Parameter Estimates
                                   Standard Wald 95% Confidence    Chi-
Parameter            DF  Estimate     Error       Limits         Square  Pr > ChiSq
Intercept             1  -11.3128    2.7451  -16.6931   -5.9324   16.98      <.0001
color dark            1   -1.4023    0.5484   -2.4773   -0.3274    6.54      0.0106
color dark medium     1   -0.2962    0.4176   -1.1147    0.5223    0.50      0.4781
color light medium    1   -0.0724    0.7399   -1.5226    1.3778    0.01      0.9220
color medium          0    0.0000    0.0000    0.0000    0.0000       .           .
width                 1    0.4680    0.1055    0.2611    0.6748   19.66      <.0001
Scale                 0    1.0000    0.0000    1.0000    1.0000
NOTE: The scale parameter was held fixed.
The GENMOD Procedure
Model Information
Data Set WORK.CRAB
Distribution Binomial
Link Function Logit
Dependent Variable y
Number of Observations Read 173
Number of Observations Used 173
Number of Events 111
Number of Trials 173
Response Profile
Ordered Total
Value y Frequency
1 1 111
2 0 62
PROC GENMOD is modeling the probability that y='1'.
Criteria For Assessing Goodness Of Fit
Criterion DF Value Value/DF
Deviance 170 189.1212 1.1125
Scaled Deviance 170 189.1212 1.1125
Pearson Chi-Square 170 170.0759 1.0004
Scaled Pearson X2 170 170.0759 1.0004
Log Likelihood -94.5606
Algorithm converged.
Analysis Of Parameter Estimates
                           Standard Wald 95% Confidence    Chi-
Parameter    DF  Estimate     Error       Limits         Square  Pr > ChiSq
Intercept     1  -10.0708    2.8069  -15.5722   -4.5695   12.87      0.0003
color         1   -0.5090    0.2237   -0.9475   -0.0706    5.18      0.0229
width         1    0.4583    0.1040    0.2544    0.6622   19.41      <.0001
Scale         0    1.0000    0.0000    1.0000    1.0000
NOTE: The scale parameter was held fixed.
The SAS System 13:46 Wednesday, November 4, 2009 186
The LOGISTIC Procedure
Model Information
Data Set WORK.CRAB
Response Variable y
Number of Response Levels 2
Model binary logit
Optimization Technique Fisher's scoring
Number of Observations Read 173
Number of Observations Used 173
Response Profile
Ordered Total
Value y Frequency
1 1 111
2 0 62
Probability modeled is y=1.
Backward Elimination Procedure
Class Level Information
Class   Value           Design Variables
color   dark             1    0    0
        dark medium      0    1    0
        light medium     0    0    1
        medium          -1   -1   -1
spine   both good        1    0
        both worn        0    1
        one worn        -1   -1
Step 0. The following effects were entered:
Intercept color spine color*spine width width*color width*spine width*color*spine
Model Convergence Status
Quasi-complete separation of data points detected.
WARNING: The maximum likelihood estimate may not exist.
WARNING: The LOGISTIC procedure continues in spite of the above warning. Results shown are based
on the last maximum likelihood iteration. Validity of the model fit is questionable.
Model Fit Statistics
                         Intercept
           Intercept           and
Criterion       Only    Covariates
AIC          227.759       212.446
SC           230.912       278.665
-2 Log L     225.759       170.446
Testing Global Null Hypothesis: BETA=0
Test Chi-Square DF Pr > ChiSq
Likelihood Ratio 55.3129 20 <.0001
Score 47.1111 20 0.0006
Wald 29.6574 20 0.0756
Step 1. Effect width*color*spine is removed:
Model Convergence Status
Quasi-complete separation of data points detected.
WARNING: The maximum likelihood estimate may not exist.
WARNING: The LOGISTIC procedure continues in spite of the above warning. Results shown are based
on the last maximum likelihood iteration. Validity of the model fit is questionable.
Model Fit Statistics
                         Intercept
           Intercept           and
Criterion       Only    Covariates
AIC          227.759       209.674
SC           230.912       266.433
-2 Log L     225.759       173.674
WARNING: The validity of the model fit is questionable.
Testing Global Null Hypothesis: BETA=0
Test Chi-Square DF Pr > ChiSq
Likelihood Ratio 52.0846 17 <.0001
Score 46.2789 17 0.0002
Wald 31.0713 17 0.0196
Residual Chi-Square Test
Chi-Square DF Pr > ChiSq
1.3313 3 0.7217
Step 2. Effect color*spine is removed:
Model Convergence Status
Convergence criterion (GCONV=1E-8) satisfied.
Model Fit Statistics
                         Intercept
           Intercept           and
Criterion       Only    Covariates
AIC          227.759       205.559
SC           230.912       243.398
-2 Log L     225.759       181.559
Testing Global Null Hypothesis: BETA=0
Test Chi-Square DF Pr > ChiSq
Likelihood Ratio 44.1997 11 <.0001
Score 40.4197 11 <.0001
Wald 31.9610 11 0.0008
Residual Chi-Square Test
Chi-Square DF Pr > ChiSq
9.0092 9 0.4364
WARNING: The validity of the model fit is questionable.
Step 3. Effect width*spine is removed:
Model Convergence Status
Convergence criterion (GCONV=1E-8) satisfied.
Model Fit Statistics
                         Intercept
           Intercept           and
Criterion       Only    Covariates
AIC          227.759       201.637
SC           230.912       233.170
-2 Log L     225.759       181.637
Testing Global Null Hypothesis: BETA=0
Test Chi-Square DF Pr > ChiSq
Likelihood Ratio 44.1215 9 <.0001
Score 40.3715 9 <.0001
Wald 31.9987 9 0.0002
Residual Chi-Square Test
Chi-Square DF Pr > ChiSq
8.8373 11 0.6369
Step 4. Effect spine is removed:
Model Convergence Status
Convergence criterion (GCONV=1E-8) satisfied.
WARNING: The validity of the model fit is questionable.
Model Fit Statistics
                         Intercept
           Intercept           and
Criterion       Only    Covariates
AIC          227.759       199.081
SC           230.912       224.307
-2 Log L     225.759       183.081
Testing Global Null Hypothesis: BETA=0
Test Chi-Square DF Pr > ChiSq
Likelihood Ratio 42.6779 7 <.0001
Score 38.5321 7 <.0001
Wald 30.4875 7 <.0001
Residual Chi-Square Test
Chi-Square DF Pr > ChiSq
9.6978 13 0.7184
Step 5. Effect width*color is removed:
Model Convergence Status
Convergence criterion (GCONV=1E-8) satisfied.
Model Fit Statistics
                         Intercept
           Intercept           and
Criterion       Only    Covariates
AIC          227.759       197.457
SC           230.912       213.223
-2 Log L     225.759       187.457
WARNING: The validity of the model fit is questionable.
Testing Global Null Hypothesis: BETA=0
Test Chi-Square DF Pr > ChiSq
Likelihood Ratio 38.3015 4 <.0001
Score 34.3384 4 <.0001
Wald 27.6788 4 <.0001
Residual Chi-Square Test
Chi-Square DF Pr > ChiSq
12.8933 16 0.6805
Step 6. Effect color is removed:
Model Convergence Status
Convergence criterion (GCONV=1E-8) satisfied.
Model Fit Statistics
                         Intercept
           Intercept           and
Criterion       Only    Covariates
AIC          227.759       198.453
SC           230.912       204.759
-2 Log L     225.759       194.453
Testing Global Null Hypothesis: BETA=0
Test Chi-Square DF Pr > ChiSq
Likelihood Ratio 31.3059 1 <.0001
Score 27.8752 1 <.0001
Wald 23.8872 1 <.0001
Residual Chi-Square Test
Chi-Square DF Pr > ChiSq
20.6100 19 0.3587
WARNING: The validity of the model fit is questionable.
NOTE: No (additional) effects met the 0.05 significance level for removal from the model.
Summary of Backward Elimination
      Effect                   Number        Wald
Step  Removed              DF      In  Chi-Square  Pr > ChiSq
   1  width*color*spine     3       6      0.0944      0.9925
   2  color*spine           6       5      0.0920      1.0000
   3  width*spine           2       4      0.0761      0.9627
   4  spine                 2       3      1.4323      0.4886
   5  width*color           3       2      3.8871      0.2739
   6  color                 3       1      6.6246      0.0849
Type 3 Analysis of Effects
Wald
Effect DF Chi-Square Pr > ChiSq
width 1 23.8872 <.0001
Analysis of Maximum Likelihood Estimates
                             Standard        Wald
Parameter  DF    Estimate       Error  Chi-Square  Pr > ChiSq
Intercept   1    -12.3508      2.6287     22.0749      <.0001
width       1      0.4972      0.1017     23.8872      <.0001
Odds Ratio Estimates
            Point        95% Wald
Effect   Estimate    Confidence Limits
width       1.644     1.347     2.007
Output for model 8
Analysis Of Parameter Estimates
                           Standard Wald 95% Confidence    Chi-
Parameter    DF  Estimate     Error       Limits         Square  Pr > ChiSq
Intercept     1  -11.6790    2.6925  -16.9563   -6.4017   18.81      <.0001
dark          1   -1.3005    0.5259   -2.3312   -0.2698    6.12      0.0134
width         1    0.4782    0.1041    0.2741    0.6823   21.08      <.0001
Scale         0    1.0000    0.0000    1.0000    1.0000