STAT3900/4950
SAS for Basic Categorical Data Analysis
Dr. Fan
Reading assignment: SAS textbook, Sections 3.A – 3.I, 3.L, 3.M, 3.Q
Testing a hypothesized distribution (One-way Chi-square test)
Example: Children’s Hair Color Data
Hair Color / Fair / Red / Medium / Dark / Black
Frequency / 76 / 19 / 83 / 65 / 3
% / 30.89 / 7.72 / 33.74 / 26.42 / 1.22
Hypothesized % / 30 / 12 / 30 / 25 / 3
Question:
Test whether the observed distribution of children’s hair color is consistent with the hypothesized distribution.
SAS Syntax:
PROC FREQ DATA=dataname ORDER=DATA;
WEIGHT frequency_var; /* omit this line if each row is a single observation rather than a count */
TABLES var_of_test / TESTP=(hypothesized %); /* one percentage per level, in level order, summing to 100 */
RUN;
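When each row of the data set is one observation rather than a count, the WEIGHT statement is left out and PROC FREQ tallies the rows itself. A minimal sketch, assuming a made-up data set pets_raw and illustrative hypothesized percentages (neither comes from the course data):

```sas
/* Hypothetical raw data: one row per observation, no count variable */
data pets_raw;
  input pet $ @@;
  datalines;
cat dog dog cat fish dog cat dog
;
run;

/* No WEIGHT statement: PROC FREQ counts the rows for each level.    */
/* ORDER=DATA keeps the levels in the order cat, dog, fish, so the   */
/* TESTP= percentages (assumed values) line up with those levels.    */
proc freq data=pets_raw order=data;
  tables pet / testp=(40 40 20);
run;
```

With so few observations the chi-square approximation is poor, so this only illustrates the syntax.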
SAS Code:
data haircolor;
  input color $ count @@;   /* @@ reads several observations from one line */
  datalines;
f 76 r 19 m 83 d 65 bk 3
;
run;
proc print data=haircolor;
run;
proc freq data=haircolor order=data;   /* data order matches the TESTP= list */
  title 'Hair Color of Children';
  weight count;
  tables color / testp=(30 12 30 25 3);
run;
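For reference, the chi-square statistic for this test can be worked out by hand from the hair color table. Here n = 246 is the total count, O_i are the observed frequencies, and each expected count E_i is n times the hypothesized proportion p_i:

```latex
\[
\chi^2 = \sum_{i=1}^{5} \frac{(O_i - E_i)^2}{E_i}, \qquad E_i = n\,p_i, \quad n = 246
\]
\[
E = (73.8,\ 29.52,\ 73.8,\ 61.5,\ 7.38)
\]
\[
\chi^2 = \frac{(76-73.8)^2}{73.8} + \frac{(19-29.52)^2}{29.52} + \frac{(83-73.8)^2}{73.8}
       + \frac{(65-61.5)^2}{61.5} + \frac{(3-7.38)^2}{7.38} \approx 7.76
\]
```

The statistic has 5 - 1 = 4 degrees of freedom. Since 7.76 is below the 5% critical value of 9.49, the hypothesized distribution is not rejected at that level.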
Test for the Independence between 2 Categorical Variables
Example: 1991 General Social Survey
Frequency counts: Race (rows) by Party Identification (columns)
Race / Democrat / Independent / Republican
White / 341 / 105 / 405
Black / 103 / 15 / 11
Question: Does the distribution of party identification differ between white and black respondents?
SAS Syntax:
PROC FREQ DATA=dataname;
WEIGHT frequency_var; /* omit this line if each row is a single observation rather than a count */
TABLES var1*var2/ options;
RUN;
SAS Code:
data survey;
input race $ party $ count @@;
datalines;
w d 341 w i 105 w r 405 b d 103 b i 15 b r 11
;
run;
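proc freq data=survey;
  title 'independence between race and party';
  weight count;
  table race*party;                           /* default cell statistics */
  table race*party / nocol norow nopercent;   /* frequencies only */
  table race*party / all;                     /* chi-square tests and measures of association */
  table race*party / chisq;                   /* chi-square tests */
  table race*party / exact;                   /* Fisher's exact test */
run;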
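To see where the chi-square statistic comes from, PROC FREQ can also print each cell's expected count and its contribution to the statistic. A sketch using the same counts as the survey example; the data set name survey_check is hypothetical, while EXPECTED and CELLCHI2 are standard TABLES options:

```sas
/* Same counts as the survey example, under a hypothetical data set name */
data survey_check;
  input race $ party $ count @@;
  datalines;
w d 341 w i 105 w r 405 b d 103 b i 15 b r 11
;
run;

proc freq data=survey_check;
  weight count;
  /* EXPECTED prints expected counts under independence;          */
  /* CELLCHI2 prints each cell's contribution to the chi-square.  */
  tables race*party / chisq expected cellchi2 norow nocol nopercent;
run;
```

Under independence the expected count for a cell is (row total × column total) / grand total; for White Democrats that is 851 × 444 / 980 ≈ 385.6. The test has (2 - 1)(3 - 1) = 2 degrees of freedom.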
Test for Agreement
Example: Rating for Prime Minister
First Survey (rows) by Second Survey (columns)
First Survey / Approve / Disapprove
Approve / 794 / 150
Disapprove / 86 / 570
Question: Do the results from the first survey agree with the results from the second survey?
SAS Syntax:
PROC FREQ DATA=dataname;
EXACT MCNEM; /* request the exact p-value for McNemar's test */
WEIGHT frequency_var; /* omit this line if each row is a single observation rather than a count */
TABLE var1*var2/ AGREE;
RUN;
SAS Code:
proc format;
  value $opinion 'a'='approval'
                 'd'='disapproval';
run;
data prime_minister;
  input first $ second $ count @@;
  format first $opinion. second $opinion.;
  datalines;
a a 794 a d 150
d a 86 d d 570
;
run;
/* McNemar's test with exact p-value and kappa coefficient */
proc freq data=prime_minister;
  exact mcnem;
  weight count;
  table first*second / agree;
run;
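The two main quantities that the AGREE option reports can be checked by hand from the survey table. McNemar's test uses only the discordant counts, b = 150 (approve then disapprove) and c = 86 (disapprove then approve); kappa compares the observed agreement p_o with the agreement p_e expected by chance from the margins:

```latex
\[
\chi^2_{\text{McNemar}} = \frac{(b-c)^2}{b+c} = \frac{(150-86)^2}{150+86}
                        = \frac{4096}{236} \approx 17.36 \quad (\text{df} = 1)
\]
\[
p_o = \frac{794+570}{1600} = 0.8525, \qquad
p_e = \frac{944 \cdot 880 + 656 \cdot 720}{1600^2} \approx 0.5090
\]
\[
\hat{\kappa} = \frac{p_o - p_e}{1 - p_e} \approx \frac{0.8525 - 0.5090}{1 - 0.5090} \approx 0.70
\]
```

McNemar's test compares the two marginal approval rates (944/1600 in the first survey against 880/1600 in the second), while kappa measures how far the agreement between the two surveys exceeds chance.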
Meta (or Stratified) Analysis
Example: Respiratory Improvement
Center / Treatment / Yes / No / Total
1 / Test / 29 / 16 / 45
1 / Placebo / 14 / 31 / 45
1 / Total / 43 / 47 / 90
2 / Test / 37 / 8 / 45
2 / Placebo / 24 / 21 / 45
2 / Total / 61 / 29 / 90
Question: Does the effect of treatment on respiratory improvement differ between the two centers?
SAS Syntax:
PROC FREQ ORDER=DATA DATA=dataname;
WEIGHT frequency_var; /* omit this line if each row is a single observation rather than a count */
TABLES var1*var2*var3/ ALL; /* strata*treatment*response */
RUN;
SAS Code:
data respire;
  input center trtmnt $ response $ count @@;
  datalines;
1 test y 29 1 test n 16
1 placebo y 14 1 placebo n 31
2 test y 37 2 test n 8
2 placebo y 24 2 placebo n 21
;
run;
proc freq order=data data=respire;
  weight count;
  tables center*trtmnt*response / all;   /* strata*treatment*response */
run;
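The question of whether the treatment effect differs between centers is addressed by the Breslow-Day test for homogeneity of the odds ratios, which PROC FREQ prints with the Cochran-Mantel-Haenszel output for stratified 2×2 tables. A sketch requesting only that output; the data set name respire_check is hypothetical and the counts are those of the respiratory example:

```sas
/* Same counts as the respiratory example, under a hypothetical data set name */
data respire_check;
  input center trtmnt $ response $ count @@;
  datalines;
1 test y 29 1 test n 16
1 placebo y 14 1 placebo n 31
2 test y 37 2 test n 8
2 placebo y 24 2 placebo n 21
;
run;

proc freq order=data data=respire_check;
  weight count;
  /* Table request is strata*treatment*response.                      */
  /* CMH gives the Cochran-Mantel-Haenszel statistics, the common     */
  /* odds ratio, and the Breslow-Day test for the 2x2 strata.         */
  tables center*trtmnt*response / cmh nocol nopercent;
run;
```

As a rough check, the sample odds ratios are 29 × 31 / (16 × 14) ≈ 4.01 for center 1 and 37 × 21 / (8 × 24) ≈ 4.05 for center 2. These are very close, so the Breslow-Day test would not be expected to show a difference between centers.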