******* CHISQ.sas *******;
options formdlim='-' pageno=min nodate;
title 'Chi-Square via PROC FREQ on SAS';
title2 'Escalator Use and Obesity';
proc format; value wgt 1='Obese' 2='Overweight' 3='Normal';
value dir 1='Ascending' 2='Descending'; value dev 1='stairs' 2='escalate'; run;
data Lotus; input weight direct device freq;
format weight wgt. direct dir. device dev. ;
cards;
1 1 1 10
1 1 2 205
1 2 1 14
1 2 2 81
2 1 1 22
2 1 2 538
2 2 1 143
2 2 2 372
3 1 1 82
3 1 2 998
3 2 1 174
3 2 2 578
proc freq; tables direct*device weight*direct*device / chisq relrisk nopercent nocol;
weight freq; run;
Chi-Square via PROC FREQ on SAS
Escalator Use and Obesity

The FREQ Procedure

Table of direct by device

Frequency
Row Pct

direct        device
              stairs    escalate       Total
Ascending        114        1741        1855
                6.15       93.85
Descending       331        1031        1362
               24.30       75.70
Total            445        2772        3217

Statistics for Table of direct by device

Statistic            DF       Value      Prob
Chi-Square            1    217.2223    <.0001
Phi Coefficient             -0.2599
The phi coefficient is the Pearson r computed between two dichotomous variables. It can be used as a measure of the strength of the association between two dichotomous variables. Alternatively, one can use an odds ratio. The correspondence between phi and the odds ratio is not good when the marginal counts are not uniform.
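As a check on the phi coefficient in the listing, it can be reproduced from the four cell counts of the direct by device table. The following is a minimal sketch only; the data set name PhiCheck and the variable names a, b, c, and d are illustrative and are not part of the original program.

```sas
/* Sketch: phi for a 2 x 2 table from its cell counts (names are hypothetical) */
data PhiCheck;
  a = 114;  b = 1741;   /* Ascending:  stairs, escalate */
  c = 331;  d = 1031;   /* Descending: stairs, escalate */
  n = a + b + c + d;
  /* cross-product difference over the square root of the product of the marginals */
  phi = (a*d - b*c) / sqrt((a+b)*(c+d)*(a+c)*(b+d));
  /* for a 2 x 2 table the Pearson chi-square equals N times phi squared */
  chisq = n * phi**2;
  put phi= 7.4 chisq= 9.4;
run;
```

The values written to the log should agree with the Phi Coefficient and Chi-Square lines of the PROC FREQ output.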
Contingency Table Analysis

Size of effect     phi     odds ratio*
small              .1      1.49
medium             .3      3.45
large              .5      9
*For a 2 x 2 table with both marginals distributed uniformly.
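The odds ratios in this table follow from phi once both marginals are uniform. In that case each diagonal cell holds a proportion (1 + phi)/4 and each off-diagonal cell holds (1 - phi)/4, so the odds ratio is the square of (1 + phi)/(1 - phi). A small sketch of that conversion, with a hypothetical data set name:

```sas
/* Sketch: odds ratio implied by phi in a 2 x 2 table with uniform marginals */
data Benchmarks;
  do phi = .1, .3, .5;
    odds_ratio = ((1 + phi) / (1 - phi))**2;
    put phi= 3.1 odds_ratio= 5.2;
  end;
run;
```

The conversion holds only under the uniform-marginals assumption stated in the footnote; with unequal marginals the same phi can correspond to quite different odds ratios.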
Estimates of the Relative Risk (Row1/Row2)

Type of Study                   Value      95% Confidence Limits
Case-Control (Odds Ratio)      0.2040      0.1626      0.2558
Odds ratios are usually easier to digest when presented as greater than one rather than as less than one. The inverted odds ratio here is 4.90, with a CI that runs from 3.91 to 6.15.
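The inversion is simple arithmetic on the values PROC FREQ reports, with one point that is easy to miss: the confidence limits trade places when reciprocals are taken. A minimal sketch, using made-up variable names:

```sas
/* Sketch: invert an odds ratio and its confidence limits (names are hypothetical) */
data Invert;
  odds = 0.2040;  lower = 0.1626;  upper = 0.2558;
  inv_odds  = 1 / odds;
  inv_lower = 1 / upper;   /* the old upper limit becomes the new lower limit */
  inv_upper = 1 / lower;   /* the old lower limit becomes the new upper limit */
  put inv_odds= 6.2 inv_lower= 6.2 inv_upper= 6.2;
run;
```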
Individuals were significantly more likely to use the stairs rather than the escalator when descending (24.30%) than when ascending (6.15%), χ²(1, N = 3,217) = 217.22, p < .001, OR = 4.90, 95% CI [3.91, 6.15].
Sample Size = 3217
Table 1 of direct by device
Controlling for weight=Obese

Frequency
Row Pct

direct        device
              stairs    escalate       Total
Ascending         10         205         215
                4.65       95.35
Descending        14          81          95
               14.74       85.26
Total             24         286         310

Statistics for Table 1 of direct by device
Controlling for weight=Obese

Statistic            DF       Value      Prob
Chi-Square            1      9.3833    0.0022
Phi Coefficient             -0.1740

Estimates of the Relative Risk (Row1/Row2)

Type of Study                   Value      95% Confidence Limits
Case-Control (Odds Ratio)      0.2822      0.1205      0.6612
Among the obese, the odds of taking the stairs are 1/.2822 = 3.54 times higher when descending than when ascending.
Sample Size = 310
Table 2 of direct by device
Controlling for weight=Overweight

Frequency
Row Pct

direct        device
              stairs    escalate       Total
Ascending         22         538         560
                3.93       96.07
Descending       143         372         515
               27.77       72.23
Total            165         910        1075

Statistics for Table 2 of direct by device
Controlling for weight=Overweight

Statistic            DF       Value      Prob
Chi-Square            1    117.3365    <.0001
Phi Coefficient             -0.3304

Estimates of the Relative Risk (Row1/Row2)

Type of Study                   Value      95% Confidence Limits
Case-Control (Odds Ratio)      0.1064      0.0666      0.1698
Among the overweight, the odds ratio is 9.40.
Sample Size = 1075
Table 3 of direct by device
Controlling for weight=Normal

Frequency
Row Pct

direct        device
              stairs    escalate       Total
Ascending         82         998        1080
                7.59       92.41
Descending       174         578         752
               23.14       76.86
Total            256        1576        1832

Statistics for Table 3 of direct by device
Controlling for weight=Normal

Statistic            DF       Value      Prob
Chi-Square            1     89.1234    <.0001
Phi Coefficient             -0.2206

Estimates of the Relative Risk (Row1/Row2)

Type of Study                   Value      95% Confidence Limits
Case-Control (Odds Ratio)      0.2729      0.2059      0.3618
Among those normal in weight, the odds ratio is 3.66.
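The three within-group odds ratios can be traced back to the cell counts in the Lotus data. The sketch below recomputes each Row1/Row2 odds ratio as a cross-product ratio and then inverts it; the data set and variable names are invented for illustration.

```sas
/* Sketch: per-stratum odds ratios from the cell counts (names are hypothetical) */
data StrataOR;
  length group $ 10;
  input group $ up_stairs up_esc down_stairs down_esc;
  /* Row1/Row2 odds ratio, as PROC FREQ reports it */
  odds_ratio = (up_stairs * down_esc) / (up_esc * down_stairs);
  /* inverted so that the value is greater than one */
  inverted = 1 / odds_ratio;
  put group= odds_ratio= 6.4 inverted= 5.2;
  datalines;
Obese      10 205  14  81
Overweight 22 538 143 372
Normal     82 998 174 578
;
run;
```

The inverted values should match the 3.54, 9.40, and 3.66 quoted in the commentary.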
******************************** Create a new format ***************************;
data Format;
title 'Contingency Table Analysis After Categorizing a Continuous Variable.'; run;
proc format;
value add 0='low/medium add' 1='low add' 2='medium add' 3='high add';
value rep 0='promoted' 1='repeated';
******************************* Read in Howell.dat and do CTA ********************;
data Sol; infile 'C:\Users\Vati\Documents\StatData\howell.dat';
input addsc sex repeat iq engl engg gpa socprob dropout;
if 0 < addsc <= 47 then add_cat = 1;
else if 47 < addsc <= 56 then add_cat = 2;
else if addsc > 56 then add_cat = 3;
format add_cat add. repeat rep. ;
proc freq; tables add_cat*repeat / expected nocol nopercent chisq; run;
Contingency Table Analysis After Categorizing a Continuous Variable.

The FREQ Procedure

Table of add_cat by repeat

Frequency
Expected
Row Pct

add_cat       repeat
              promoted    repeated       Total
low add             29           0          29
                25.045      3.9545
                100.00        0.00
medium add          27           2          29
                25.045      3.9545
                 93.10        6.90
high add            20          10          30
                25.909      4.0909
                 66.67       33.33
Total               76          12          88

Statistics for Table of add_cat by repeat

Statistic                  DF       Value      Prob
Chi-Square                  2     15.5806    0.0004
Phi Coefficient                    0.4208
Contingency Coefficient            0.3878
Cramer's V                         0.4208
WARNING: 50% of the cells have expected counts less
than 5. Chi-Square may not be a valid test.
Phi here is actually Cramer’s phi, a strength of association measure that can be computed for tables that have more than two rows and/or more than two columns. It can be interpreted using the same benchmarks as for phi.
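Both Cramer's statistic and the contingency coefficient are simple functions of the chi-square value and the sample size, which is why Phi and Cramer's V coincide in this 3 x 2 table. A minimal sketch of the arithmetic, with hypothetical names:

```sas
/* Sketch: Cramer's V and the contingency coefficient from chi-square and N */
data CramerCheck;
  chisq = 15.5806;  n = 88;
  rows = 3;  cols = 2;
  /* V divides chi-square by N times the smaller of (rows - 1) and (cols - 1) */
  v  = sqrt(chisq / (n * min(rows - 1, cols - 1)));
  cc = sqrt(chisq / (chisq + n));
  put v= 6.4 cc= 6.4;
run;
```

Because the smaller table dimension minus one is 1 here, V reduces to the square root of chi-square over N, the same formula that gives phi.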
Notice that the percentage of students repeating a grade does not differ much between the low add and the medium add groups. In fact, when I compared these two groups I found that they did not differ significantly. Accordingly, I decided to combine them and then compare the combined groups to the high add group.
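The comparison of the low and medium groups is not shown in the program. One way it could be run, offered as an assumption about the approach rather than as the code actually used, is to restrict PROC FREQ to those two categories with a WHERE statement. The FISHER option is included because the expected counts in the repeated column are small.

```sas
/* Sketch: low add versus medium add only (not part of the original program) */
proc freq data=Sol;
  where add_cat in (1, 2);
  tables add_cat*repeat / chisq fisher nocol nopercent;
run;
```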
****************** Compare High Group with Other Groups Combined *****************;
Data LoMed; set Sol;
If add_cat < 3 then add_cat = 0;
proc freq; tables add_cat*repeat / relrisk nocol nopercent chisq; run;
Table of add_cat by repeat

Frequency
Row Pct

add_cat           repeat
                  promoted    repeated       Total
low/medium add          56           2          58
                     96.55        3.45
high add                20          10          30
                     66.67       33.33
Total                   76          12          88

Statistics for Table of add_cat by repeat

Statistic            DF       Value      Prob
Chi-Square            1     14.9950    0.0001

Estimates of the Relative Risk (Row1/Row2)

Type of Study                   Value      95% Confidence Limits
Case-Control (Odds Ratio)     14.0000      2.8217     69.4627
The odds of repeating a grade are 14 times higher for those students high in ADD than for those not high in ADD.
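The odds ratio of 14 and its wide confidence interval can be reconstructed from the four cell counts. The sketch below uses the usual large-sample interval built on the log odds ratio, which appears to be what the listing reports; the data set and variable names are illustrative.

```sas
/* Sketch: odds ratio and 95% limits from cell counts (names are hypothetical) */
data ORCheck;
  a = 56;  b = 2;     /* low/medium add: promoted, repeated */
  c = 20;  d = 10;    /* high add:       promoted, repeated */
  odds_ratio = (a * d) / (b * c);
  /* standard error of the natural log of the odds ratio */
  se_log = sqrt(1/a + 1/b + 1/c + 1/d);
  /* two-sided 95% critical value from the standard normal */
  z = probit(0.975);
  lower = exp(log(odds_ratio) - z * se_log);
  upper = exp(log(odds_ratio) + z * se_log);
  put odds_ratio= 7.4 lower= 7.4 upper= 8.4;
run;
```

The interval is so wide mainly because of the cell with only 2 observations, which dominates the standard error.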