HRP 261 SAS LAB TWO January 21, 2009
Lab Two: Mantel-Haenszel test and Mantel-Haenszel OR/RR, introduction to McNemar’s Test
Lab Objectives
After today’s lab you should be able to:
- Distinguish between numeric and character variables in SAS.
- Use PROC FREQ to generate stratified 2x2 tables.
- Use PROC FREQ to generate Mantel-Haenszel statistics and the Mantel-Haenszel summary OR/RR and Breslow-Day test of Homogeneity.
- Interpret Mantel-Haenszel results.
- Practice using the RESULTS browser (left hand side of screen) to scroll through output.
- Input paired data directly into SAS.
- Use PROC FREQ to generate McNemar’s statistic for paired or matched data.
- Change grouped data into a dataset that contains 1 observation for each individual.
- Use a do-loop.
- Use an if-then-do loop.
- Use nested loops.
- Use the output statement to add observations to a dataset that you are modifying.
- See the connection between CMH and McNemar’s tests.
LAB EXERCISE STEPS:
Follow along with the computer in front…
- Double-click on the SAS icon to open SAS.
- Goto the class website: --> Lab 2 Data
Highlight the following data with your mouse; copy with ctrl C:
A 1 1 19
A 1 0 89
A 0 1 314
A 0 0 511
B 1 1 8
B 1 0 17
B 0 1 208
B 0 0 352
C 1 1 391
C 1 0 202
C 0 1 205
C 0 0 120
D 1 1 248
D 1 0 127
D 0 1 265
D 0 0 142
E 1 1 289
E 1 0 104
E 0 1 147
E 0 0 44
F 1 1 321
F 1 0 20
F 0 1 347
F 0 0 26
3. Return to SAS. Use ctrl V to paste the data into the SAS editor window. Then add code accordingly to recreate the code below (and enter data into a SAS dataset):
data admissions;
input program $ IsFemale Denied Count;
datalines;
A 1 1 19
A 1 0 89
A 0 1 314
A 0 0 511
B 1 1 8
B 1 0 17
B 0 1 208
B 0 0 352
C 1 1 391
C 1 0 202
C 0 1 205
C 0 0 120
D 1 1 248
D 1 0 127
D 0 1 265
D 0 0 142
E 1 1 289
E 1 0 104
E 0 1 147
E 0 0 44
F 1 1 321
F 1 0 20
F 0 1 347
F 0 0 26
;
run;
4.Using the Explorer Browser on the left hand side of your screen, double check that a dataset “admissions” has been created in the work library.
5.Test for the crude association between gender and admissions using PROC FREQ.
/**IS gender related to denial of graduate admissions at Berkeley?**/
procfreqdata=admissions order=data;
tables isfemale*denied /chisqnocolnopercentnorowmeasuresexpected ;
weight count;
run;
Table of IsFemale by Denied
IsFemale Denied
Frequency‚
Expected ‚ 1‚ 0‚ Total
ƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ
1 ‚ 1276 ‚ 559 ‚ 1835
‚ 1122.3 ‚ 712.71 ‚
ƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ
0 ‚ 1486 ‚ 1195 ‚ 2681
‚ 1639.7 ‚ 1041.3 ‚
ƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ
Total 2762 1754 4516
Statistics for Table of IsFemale by Denied
Statistic DF Value Prob
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
Chi-Square 1 73.8695 <.0001
Likelihood Ratio Chi-Square 1 75.0511 <.0001
Continuity Adj. Chi-Square 1 73.3243 <.0001
Mantel-Haenszel Chi-Square 1 73.8531 <.0001
Phi Coefficient 0.1279
Contingency Coefficient 0.1269
Cramer's V 0.1279
Fisher's Exact Test
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
Cell (1,1) Frequency (F) 1276
Left-sided Pr <= F 1.0000
Right-sided Pr >= F 4.579E-22
Table Probability (P) 3.846E-22
Two-sided Pr <= P 8.408E-22
Estimates of the Relative Risk (Row1/Row2)
Type of Study Value 95% Confidence Limits
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
Case-Control (Odds Ratio) 1.8356 1.6196 2.0805
Cohort (Col1 Risk) 1.2546 1.1988 1.3130
Cohort (Col2 Risk) 0.6834 0.6303 0.7411
Sample Size = 4516
6. Could program be a confounder of the relationship between gender and denial of admissions?
Is program related to gender?
Is program related to admissions rates?
procfreqdata=admissions order=data;
tables program*isfemale /chisqnocolnopercentnorowmeasures ;
weight count;
run;
Table of program by IsFemale
program IsFemale
Frequency‚
Row Pct ‚ 1‚ 0‚ Total
ƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ
A ‚ 108 ‚ 825 ‚ 933
‚ 11.58 ‚ 88.42 ‚
ƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ
B ‚ 25 ‚ 560 ‚ 585
‚ 4.27 ‚ 95.73 ‚
ƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ
C ‚ 593 ‚ 325 ‚ 918
‚ 64.60 ‚ 35.40 ‚
ƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ
D ‚ 375 ‚ 407 ‚ 782
‚ 47.95 ‚ 52.05 ‚
ƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ
E ‚ 393 ‚ 191 ‚ 584
‚ 67.29 ‚ 32.71 ‚
ƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ
F ‚ 341 ‚ 373 ‚ 714
‚ 47.76 ‚ 52.24 ‚
ƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ
Total 1835 2681 4516
Statistics for Table of program by IsFemale
Statistic DF Value Prob
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
Chi-Square 5 1070.2064 <.0001
procfreqdata=admissions order=data;
tables program*denied /chisqnocolnopercentnorowmeasures ;
weight count;
run;
The FREQ Procedure
Table of program by Denied
program Denied
Frequency‚
Row Pct ‚ 1‚ 0‚ Total
ƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ
A ‚ 333 ‚ 600 ‚ 933
‚ 35.69 ‚ 64.31 ‚
ƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ
B ‚ 216 ‚ 369 ‚ 585
‚ 36.92 ‚ 63.08 ‚
ƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ
C ‚ 596 ‚ 322 ‚ 918
‚ 64.92 ‚ 35.08 ‚
ƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ
D ‚ 513 ‚ 269 ‚ 782
‚ 65.60 ‚ 34.40 ‚
ƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ
E ‚ 436 ‚ 148 ‚ 584
‚ 74.66 ‚ 25.34 ‚
ƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ
F ‚ 668 ‚ 46 ‚ 714
‚ 93.56 ‚ 6.44 ‚
ƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ
Total 2762 1754 4516
Statistics for Table of program by Denied
Statistic DF Value Prob
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
Chi-Square 5 771.6742 <.0001
So, program is strongly related to both gender and admissions.
7. Now, stratify on program…
/**IS gender related to denial of graduate admissions at Berkeley after adjusting for program?**/
procfreqdata=admissions order=data;
tables program*isfemale*denied /cmhnocolnopercentnorowmeasures;
weight count;
run;
The FREQ Procedure
Table 1 of IsFemale by Denied
Controlling for program=A
IsFemale Denied
Frequency‚ 1‚ 0‚ Total
ƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ
1 ‚ 19 ‚ 89 ‚ 108
ƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ
0 ‚ 314 ‚ 511 ‚ 825
ƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ
Total 333 600 933
Type of Study Value 95% Confidence Limits
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
Case-Control (Odds Ratio) 0.3474 0.2076 0.5814
Cohort (Col1 Risk) 0.4622 0.3045 0.7016
Cohort (Col2 Risk) 1.3305 1.2011 1.4737
Sample Size = 933
Controlling for program=B
IsFemale Denied
Frequency‚ 1‚ 0‚ Total
ƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ
1 ‚ 8 ‚ 17 ‚ 25
ƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ
0 ‚ 208 ‚ 352 ‚ 560
ƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ
Total 216 369 585
Estimates of the Relative Risk (Row1/Row2)
Type of Study Value 95% Confidence Limits
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
Case-Control (Odds Ratio) 0.7964 0.3378 1.8775
Cohort (Col1 Risk) 0.8615 0.4817 1.5410
Cohort (Col2 Risk) 1.0818 0.8206 1.4262
Sample Size = 585
Controlling for program=C
IsFemale Denied
Frequency‚ 1‚ 0‚ Total
ƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ
1 ‚ 391 ‚ 202 ‚ 593
ƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ
0 ‚ 205 ‚ 120 ‚ 325
ƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ
Total 596 322 918
Type of Study Value 95% Confidence Limits
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
Case-Control (Odds Ratio) 1.1331 0.8545 1.5024
Cohort (Col1 Risk) 1.0453 0.9446 1.1568
Cohort (Col2 Risk) 0.9226 0.7699 1.1055
Sample Size = 918
ETC...
Cochran-Mantel-Haenszel Statistics (Based on Table Scores)
Statistic Alternative Hypothesis DF Value Prob
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
1 Nonzero Correlation 1 1.4972 0.2211
2 Row Mean Scores Differ 1 1.4972 0.2211
3 General Association 1 1.4972 0.2211
Estimates of the Common Relative Risk (Row1/Row2)
Type of Study Method Value 95% Confidence Limits
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
Case-Control Mantel-Haenszel 0.9049 0.7717 1.0612
(Odds Ratio) Logit 0.9284 0.7894 1.0918
Cohort Mantel-Haenszel 0.9733 0.9312 1.0173
(Col1 Risk) Logit 1.0043 0.9728 1.0368
Cohort Mantel-Haenszel 1.0581 0.9695 1.1548
(Col2 Risk) Logit 1.1558 1.0727 1.2454
Breslow-Day Test for
Homogeneity of the Odds Ratios
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
Chi-Square 18.5989
DF 5
Pr > ChiSq 0.0023
Total Sample Size = 4516
8. Just to show the parallels (and as a preview of what’s to come), let’s also run a multivariate regression (logistic regression) with and without adjustment for confounding by program to see what effect that has on the point estimate for gender. Don’t worry too much about the coding right now—we’ll get back to this later in the term.
This is a simple logistic model with gender as the only predictor:
proclogisticdata=admissions;
model denied (event="1") = isfemale;
weight count;
run;
Here is the parameter estimate (beta coefficient) for gender:
The LOGISTIC Procedure
Analysis of Maximum Likelihood Estimates
Standard Wald
Parameter DF Estimate Error Chi-Square Pr > ChiSq
Intercept 1 0.2179 0.0389 31.4605 <.0001
IsFemale 1 0.6073 0.0639 90.3529 <.0001
To adjust for confounding by program, simply add program to the model. Now we have a logistic model with two predictors, gender and program. The resulting beta coefficients are each “adjusted for” each other:
proclogisticdata=admissions;
class program;
model denied (event="1") = isfemale program;
weight count;
run;
Analysis of Maximum Likelihood Estimates
Standard Wald
Parameter DF Estimate Error Chi-Square Pr > ChiSq
Intercept 1 0.6893 0.0513 180.2711 <.0001
IsFemale 1 -0.0990 0.0809 1.4986 0.2209
9. We can also check for an interaction (effect modification) between gender and program by adding an interaction term into the model:
proclogisticdata=admissions;
class program;
model denied (event="1") = isfemale program isfemale*program;
weight count;
run;
Analysis of Maximum Likelihood Estimates
Standard Wald
Parameter DF Estimate Error Chi-Square Pr > ChiSq
Intercept 1 0.6573 0.0547 144.4267 <.0001
IsFemale 1 -0.1858 0.1107 2.8175 0.0932
program A 1 -1.1443 0.0801 203.9757 <.0001
program B 1 -1.1834 0.0899 173.0928 <.0001
program C 1 -0.1218 0.1086 1.2567 0.2623
program D 1 -0.0334 0.1010 0.1092 0.7410
program E 1 0.5490 0.1506 13.2884 0.0003
IsFemale*program A 1 -0.8714 0.2414 13.0361 0.0003
IsFemale*program B 1 -0.0419 0.3740 0.0126 0.9108
IsFemale*program C 1 0.3107 0.1614 3.7039 0.0543
IsFemale*program D 1 0.2311 0.1655 1.9496 0.1626
IsFemale*program E 1 0.00157 0.2016 0.0001 0.9938
PartII: McNemar’s Test is used when you have paired or matched data, such as in a matched case-control study. The idea is: you analyze the data by pairs rather than by individuals. Only pairs that are discordant (i.e., cases and controls differ on exposure) provide useful information. Under the null hypothesis, there should be just as many discordant pairs where the case is exposed as pairs where the control is exposed. (We will discuss McNemar’s formally in class next Monday).
10. Input the following matched dataset* and run McNemar’s Test, using the option “agree” in PROC FREQ.
data the_pill;
input caseuse $ contruse $ n;
datalines;
Y Y 10
Y N 57
N Y 13
N N 95
;
procfreqdata=the_pill order=data;
tables caseuse*contruse /agree;
weight n;
run;
caseuse contruse
Frequency‚
Percent ‚
Row Pct ‚
Col Pct ‚Y ‚N ‚ Total
ƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ
Y ‚ 10 ‚ 57 ‚ 67
‚ 5.71 ‚ 32.57 ‚ 38.29
‚ 14.93 ‚ 85.07 ‚
‚ 43.48 ‚ 37.50 ‚
ƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ
N ‚ 13 ‚ 95 ‚ 108
‚ 7.43 ‚ 54.29 ‚ 61.71
‚ 12.04 ‚ 87.96 ‚
‚ 56.52 ‚ 62.50 ‚
ƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ
Total 23 152 175
13.14 86.86 100.00
There are 57 discordant pairs where the case was exposed; and only 13 where the control was exposed. Under the null hypothesis, there should be 70/2=35 pairs of each. Is 57/13 significantly different from 35/35? YES.
Statistics for Table of caseuse by contruse
McNemar's Test
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
Statistic (S) 27.6571
DF 1
Pr > S <.0001
*Example from “A Handbook of Statistical Analyses using SAS” Der and Everitt.
11. Next, we will explore the connection between Cochran-Mantel-Haenszel and McNemar’s Tests. To turn the paired, grouped data into a dataset that contains 1 observation for each individual, use the following code:
data the_pill2;
set the_pill;
if caseuse='Y' and contruse='Y'thendo;
do id=1to n;
isCase=1;
UsePill=1;
output;
isCase=0;
UsePill=1;
output;
end;
end;
if caseuse='Y' and contruse='N'thendo;
do id=101to (n+100);
isCase=1;
UsePill=1;
output;
isCase=0;
UsePill=0;
output;
end;
end;
if caseuse='N' and contruse='Y'thendo;
do id=201to (n+200);
isCase=1;
UsePill=0;
output;
isCase=0;
UsePill=1;
output;
end;
end;
if caseuse='N' and contruse='N'thendo;
do id=301to (n+300);
isCase=1;
UsePill=0;
output;
isCase=0;
UsePill=0;
output;
end;
end;
run;
- To better understand what we’ve done here, look at the dataset. Use the Explorer Browser. Double-click on the Work Library. Double click on the_pill2 to open the dataset in ViewTable mode.
- Type in the following code to run the CMH test on the stratified data, where each pair forms its own unique stratum.
procfreqdata=the_pill2;
tables id*isCase*UsePill/nocolnorownopercentcmhmeasures;
run;
- In the Results Browser (left-hand window of your screen), scroll down to any of the 175 2x2 tables. Click on anyone to see what it looks like. Then Scroll to the very bottom to the “Summary for isCase*UsePill” to see the CMH statistics.
For example, table 98:
Table 98 of isCase by UsePill
Controlling for id=318
isCase UsePill
Frequency‚ 0‚ 1‚ Total
ƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ
0 ‚ 1 ‚ 0 ‚ 1
ƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ
1 ‚ 1 ‚ 0 ‚ 1
ƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ
Total 2 0 2
- Compare the CMH chi-square statistic to the McNemar’s chi-square statistic (from step10):
Cochran-Mantel-Haenszel Statistics (Based on Table Scores)
Statistic Alternative Hypothesis DF Value Prob
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
1 Nonzero Correlation 1 27.6571 <.0001
2 Row Mean Scores Differ 1 27.6571 <.0001
3 General Association 1 27.6571 <.0001
1