HRP 261 SAS LAB TWO January 21, 2009

Lab Two: Mantel-Haenszel test and Mantel-Haenszel OR/RR, introduction to McNemar’s Test

Lab Objectives

After today’s lab you should be able to:

  1. Distinguish between numeric and character variables in SAS.
  2. Use PROC FREQ to generate stratified 2x2 tables.
  3. Use PROC FREQ to generate Mantel-Haenszel statistics and the Mantel-Haenszel summary OR/RR and Breslow-Day test of Homogeneity.
  4. Interpret Mantel-Haenszel results.
  5. Practice using the RESULTS browser (left hand side of screen) to scroll through output.
  6. Input paired data directly into SAS.
  7. Use PROC FREQ to generate McNemar’s statistic for paired or matched data.
  8. Change grouped data into a dataset that contains 1 observation for each individual.
  9. Use a do-loop.
  10. Use an if-then-do loop.
  11. Use nested loops.
  12. Use the output statement to add observations to a dataset that you are modifying.
  13. See the connection between CMH and McNemar’s tests.

LAB EXERCISE STEPS:

Follow along with the computer in front…

  1. Double-click on the SAS icon to open SAS.
  1. Goto the class website: --> Lab 2 Data

Highlight the following data with your mouse; copy with ctrl C:

A 1 1 19

A 1 0 89

A 0 1 314

A 0 0 511

B 1 1 8

B 1 0 17

B 0 1 208

B 0 0 352

C 1 1 391

C 1 0 202

C 0 1 205

C 0 0 120

D 1 1 248

D 1 0 127

D 0 1 265

D 0 0 142

E 1 1 289

E 1 0 104

E 0 1 147

E 0 0 44

F 1 1 321

F 1 0 20

F 0 1 347

F 0 0 26

3. Return to SAS. Use ctrl V to paste the data into the SAS editor window. Then add code accordingly to recreate the code below (and enter data into a SAS dataset):

data admissions;

input program $ IsFemale Denied Count;

datalines;

A 1 1 19

A 1 0 89

A 0 1 314

A 0 0 511

B 1 1 8

B 1 0 17

B 0 1 208

B 0 0 352

C 1 1 391

C 1 0 202

C 0 1 205

C 0 0 120

D 1 1 248

D 1 0 127

D 0 1 265

D 0 0 142

E 1 1 289

E 1 0 104

E 0 1 147

E 0 0 44

F 1 1 321

F 1 0 20

F 0 1 347

F 0 0 26

;

run;

4.Using the Explorer Browser on the left hand side of your screen, double check that a dataset “admissions” has been created in the work library.

5.Test for the crude association between gender and admissions using PROC FREQ.

/**IS gender related to denial of graduate admissions at Berkeley?**/

procfreqdata=admissions order=data;

tables isfemale*denied /chisqnocolnopercentnorowmeasuresexpected ;

weight count;

run;

Table of IsFemale by Denied

IsFemale Denied

Frequency‚

Expected ‚ 1‚ 0‚ Total

ƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ

1 ‚ 1276 ‚ 559 ‚ 1835

‚ 1122.3 ‚ 712.71 ‚

ƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ

0 ‚ 1486 ‚ 1195 ‚ 2681

‚ 1639.7 ‚ 1041.3 ‚

ƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ

Total 2762 1754 4516

Statistics for Table of IsFemale by Denied

Statistic DF Value Prob

ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ

Chi-Square 1 73.8695 <.0001

Likelihood Ratio Chi-Square 1 75.0511 <.0001

Continuity Adj. Chi-Square 1 73.3243 <.0001

Mantel-Haenszel Chi-Square 1 73.8531 <.0001

Phi Coefficient 0.1279

Contingency Coefficient 0.1269

Cramer's V 0.1279

Fisher's Exact Test

ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ

Cell (1,1) Frequency (F) 1276

Left-sided Pr <= F 1.0000

Right-sided Pr >= F 4.579E-22

Table Probability (P) 3.846E-22

Two-sided Pr <= P 8.408E-22

Estimates of the Relative Risk (Row1/Row2)

Type of Study Value 95% Confidence Limits

ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ

Case-Control (Odds Ratio) 1.8356 1.6196 2.0805

Cohort (Col1 Risk) 1.2546 1.1988 1.3130

Cohort (Col2 Risk) 0.6834 0.6303 0.7411

Sample Size = 4516


6. Could program be a confounder of the relationship between gender and denial of admissions?

Is program related to gender?

Is program related to admissions rates?

procfreqdata=admissions order=data;

tables program*isfemale /chisqnocolnopercentnorowmeasures ;

weight count;

run;

Table of program by IsFemale

program IsFemale

Frequency‚

Row Pct ‚ 1‚ 0‚ Total

ƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ

A ‚ 108 ‚ 825 ‚ 933

‚ 11.58 ‚ 88.42 ‚

ƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ

B ‚ 25 ‚ 560 ‚ 585

‚ 4.27 ‚ 95.73 ‚

ƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ

C ‚ 593 ‚ 325 ‚ 918

‚ 64.60 ‚ 35.40 ‚

ƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ

D ‚ 375 ‚ 407 ‚ 782

‚ 47.95 ‚ 52.05 ‚

ƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ

E ‚ 393 ‚ 191 ‚ 584

‚ 67.29 ‚ 32.71 ‚

ƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ

F ‚ 341 ‚ 373 ‚ 714

‚ 47.76 ‚ 52.24 ‚

ƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ

Total 1835 2681 4516

Statistics for Table of program by IsFemale

Statistic DF Value Prob

ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ

Chi-Square 5 1070.2064 <.0001

procfreqdata=admissions order=data;

tables program*denied /chisqnocolnopercentnorowmeasures ;

weight count;

run;

The FREQ Procedure

Table of program by Denied

program Denied

Frequency‚

Row Pct ‚ 1‚ 0‚ Total

ƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ

A ‚ 333 ‚ 600 ‚ 933

‚ 35.69 ‚ 64.31 ‚

ƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ

B ‚ 216 ‚ 369 ‚ 585

‚ 36.92 ‚ 63.08 ‚

ƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ

C ‚ 596 ‚ 322 ‚ 918

‚ 64.92 ‚ 35.08 ‚

ƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ

D ‚ 513 ‚ 269 ‚ 782

‚ 65.60 ‚ 34.40 ‚

ƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ

E ‚ 436 ‚ 148 ‚ 584

‚ 74.66 ‚ 25.34 ‚

ƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ

F ‚ 668 ‚ 46 ‚ 714

‚ 93.56 ‚ 6.44 ‚

ƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ

Total 2762 1754 4516

Statistics for Table of program by Denied

Statistic DF Value Prob

ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ

Chi-Square 5 771.6742 <.0001

So, program is strongly related to both gender and admissions.
7. Now, stratify on program…

/**IS gender related to denial of graduate admissions at Berkeley after adjusting for program?**/

procfreqdata=admissions order=data;

tables program*isfemale*denied /cmhnocolnopercentnorowmeasures;

weight count;

run;

The FREQ Procedure

Table 1 of IsFemale by Denied

Controlling for program=A

IsFemale Denied

Frequency‚ 1‚ 0‚ Total

ƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ

1 ‚ 19 ‚ 89 ‚ 108

ƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ

0 ‚ 314 ‚ 511 ‚ 825

ƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ

Total 333 600 933

Type of Study Value 95% Confidence Limits

ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ

Case-Control (Odds Ratio) 0.3474 0.2076 0.5814

Cohort (Col1 Risk) 0.4622 0.3045 0.7016

Cohort (Col2 Risk) 1.3305 1.2011 1.4737

Sample Size = 933

Controlling for program=B

IsFemale Denied

Frequency‚ 1‚ 0‚ Total

ƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ

1 ‚ 8 ‚ 17 ‚ 25

ƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ

0 ‚ 208 ‚ 352 ‚ 560

ƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ

Total 216 369 585

Estimates of the Relative Risk (Row1/Row2)

Type of Study Value 95% Confidence Limits

ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ

Case-Control (Odds Ratio) 0.7964 0.3378 1.8775

Cohort (Col1 Risk) 0.8615 0.4817 1.5410

Cohort (Col2 Risk) 1.0818 0.8206 1.4262

Sample Size = 585

Controlling for program=C

IsFemale Denied

Frequency‚ 1‚ 0‚ Total

ƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ

1 ‚ 391 ‚ 202 ‚ 593

ƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ

0 ‚ 205 ‚ 120 ‚ 325

ƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ

Total 596 322 918

Type of Study Value 95% Confidence Limits

ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ

Case-Control (Odds Ratio) 1.1331 0.8545 1.5024

Cohort (Col1 Risk) 1.0453 0.9446 1.1568

Cohort (Col2 Risk) 0.9226 0.7699 1.1055

Sample Size = 918

ETC...

Cochran-Mantel-Haenszel Statistics (Based on Table Scores)

Statistic Alternative Hypothesis DF Value Prob

ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ

1 Nonzero Correlation 1 1.4972 0.2211

2 Row Mean Scores Differ 1 1.4972 0.2211

3 General Association 1 1.4972 0.2211

Estimates of the Common Relative Risk (Row1/Row2)

Type of Study Method Value 95% Confidence Limits

ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ

Case-Control Mantel-Haenszel 0.9049 0.7717 1.0612

(Odds Ratio) Logit 0.9284 0.7894 1.0918

Cohort Mantel-Haenszel 0.9733 0.9312 1.0173

(Col1 Risk) Logit 1.0043 0.9728 1.0368

Cohort Mantel-Haenszel 1.0581 0.9695 1.1548

(Col2 Risk) Logit 1.1558 1.0727 1.2454

Breslow-Day Test for

Homogeneity of the Odds Ratios

ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ

Chi-Square 18.5989

DF 5

Pr > ChiSq 0.0023

Total Sample Size = 4516

8. Just to show the parallels (and as a preview of what’s to come), let’s also run a multivariate regression (logistic regression) with and without adjustment for confounding by program to see what effect that has on the point estimate for gender. Don’t worry too much about the coding right now—we’ll get back to this later in the term.

This is a simple logistic model with gender as the only predictor:

proclogisticdata=admissions;

model denied (event="1") = isfemale;

weight count;

run;

Here is the parameter estimate (beta coefficient) for gender:

The LOGISTIC Procedure

Analysis of Maximum Likelihood Estimates

Standard Wald

Parameter DF Estimate Error Chi-Square Pr > ChiSq

Intercept 1 0.2179 0.0389 31.4605 <.0001

IsFemale 1 0.6073 0.0639 90.3529 <.0001

To adjust for confounding by program, simply add program to the model. Now we have a logistic model with two predictors, gender and program. The resulting beta coefficients are each “adjusted for” each other:

proclogisticdata=admissions;

class program;

model denied (event="1") = isfemale program;

weight count;

run;

Analysis of Maximum Likelihood Estimates

Standard Wald

Parameter DF Estimate Error Chi-Square Pr > ChiSq

Intercept 1 0.6893 0.0513 180.2711 <.0001

IsFemale 1 -0.0990 0.0809 1.4986 0.2209

9. We can also check for an interaction (effect modification) between gender and program by adding an interaction term into the model:

proclogisticdata=admissions;

class program;

model denied (event="1") = isfemale program isfemale*program;

weight count;

run;

Analysis of Maximum Likelihood Estimates

Standard Wald

Parameter DF Estimate Error Chi-Square Pr > ChiSq

Intercept 1 0.6573 0.0547 144.4267 <.0001

IsFemale 1 -0.1858 0.1107 2.8175 0.0932

program A 1 -1.1443 0.0801 203.9757 <.0001

program B 1 -1.1834 0.0899 173.0928 <.0001

program C 1 -0.1218 0.1086 1.2567 0.2623

program D 1 -0.0334 0.1010 0.1092 0.7410

program E 1 0.5490 0.1506 13.2884 0.0003

IsFemale*program A 1 -0.8714 0.2414 13.0361 0.0003

IsFemale*program B 1 -0.0419 0.3740 0.0126 0.9108

IsFemale*program C 1 0.3107 0.1614 3.7039 0.0543

IsFemale*program D 1 0.2311 0.1655 1.9496 0.1626

IsFemale*program E 1 0.00157 0.2016 0.0001 0.9938
PartII: McNemar’s Test is used when you have paired or matched data, such as in a matched case-control study. The idea is: you analyze the data by pairs rather than by individuals. Only pairs that are discordant (i.e., cases and controls differ on exposure) provide useful information. Under the null hypothesis, there should be just as many discordant pairs where the case is exposed as pairs where the control is exposed. (We will discuss McNemar’s formally in class next Monday).

10. Input the following matched dataset* and run McNemar’s Test, using the option “agree” in PROC FREQ.

data the_pill;

input caseuse $ contruse $ n;

datalines;

Y Y 10

Y N 57

N Y 13

N N 95

;

procfreqdata=the_pill order=data;

tables caseuse*contruse /agree;

weight n;

run;

caseuse contruse

Frequency‚

Percent ‚

Row Pct ‚

Col Pct ‚Y ‚N ‚ Total

ƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ

Y ‚ 10 ‚ 57 ‚ 67

‚ 5.71 ‚ 32.57 ‚ 38.29

‚ 14.93 ‚ 85.07 ‚

‚ 43.48 ‚ 37.50 ‚

ƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ

N ‚ 13 ‚ 95 ‚ 108

‚ 7.43 ‚ 54.29 ‚ 61.71

‚ 12.04 ‚ 87.96 ‚

‚ 56.52 ‚ 62.50 ‚

ƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ

Total 23 152 175

13.14 86.86 100.00

There are 57 discordant pairs where the case was exposed; and only 13 where the control was exposed. Under the null hypothesis, there should be 70/2=35 pairs of each. Is 57/13 significantly different from 35/35? YES.

Statistics for Table of caseuse by contruse

McNemar's Test

ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ

Statistic (S) 27.6571

DF 1

Pr > S <.0001

*Example from “A Handbook of Statistical Analyses using SAS” Der and Everitt.

11. Next, we will explore the connection between Cochran-Mantel-Haenszel and McNemar’s Tests. To turn the paired, grouped data into a dataset that contains 1 observation for each individual, use the following code:

data the_pill2;

set the_pill;

if caseuse='Y' and contruse='Y'thendo;

do id=1to n;

isCase=1;

UsePill=1;

output;

isCase=0;

UsePill=1;

output;

end;

end;

if caseuse='Y' and contruse='N'thendo;

do id=101to (n+100);

isCase=1;

UsePill=1;

output;

isCase=0;

UsePill=0;

output;

end;

end;

if caseuse='N' and contruse='Y'thendo;

do id=201to (n+200);

isCase=1;

UsePill=0;

output;

isCase=0;

UsePill=1;

output;

end;

end;

if caseuse='N' and contruse='N'thendo;

do id=301to (n+300);

isCase=1;

UsePill=0;

output;

isCase=0;

UsePill=0;

output;

end;

end;

run;

  1. To better understand what we’ve done here, look at the dataset. Use the Explorer Browser. Double-click on the Work Library. Double click on the_pill2 to open the dataset in ViewTable mode.
  1. Type in the following code to run the CMH test on the stratified data, where each pair forms its own unique stratum.

procfreqdata=the_pill2;

tables id*isCase*UsePill/nocolnorownopercentcmhmeasures;

run;

  1. In the Results Browser (left-hand window of your screen), scroll down to any of the 175 2x2 tables. Click on anyone to see what it looks like. Then Scroll to the very bottom to the “Summary for isCase*UsePill” to see the CMH statistics.

For example, table 98:

Table 98 of isCase by UsePill

Controlling for id=318

isCase UsePill

Frequency‚ 0‚ 1‚ Total

ƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ

0 ‚ 1 ‚ 0 ‚ 1

ƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ

1 ‚ 1 ‚ 0 ‚ 1

ƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ

Total 2 0 2

  1. Compare the CMH chi-square statistic to the McNemar’s chi-square statistic (from step10):

Cochran-Mantel-Haenszel Statistics (Based on Table Scores)

Statistic Alternative Hypothesis DF Value Prob

ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ

1 Nonzero Correlation 1 27.6571 <.0001

2 Row Mean Scores Differ 1 27.6571 <.0001

3 General Association 1 27.6571 <.0001

1