******* CHISQ.sas *******;

options formdlim='-' pageno=min nodate;

title 'Chi-Square via PROC FREQ on SAS';

title2 'Escalator Use and Obesity';

proc format; value wgt 1='Obese' 2='Overweight' 3='Normal';

value dir 1='Ascending' 2='Descending'; value dev 1='stairs' 2='escalate'; run;

data Lotus; input weight direct device freq;

format weight wgt. direct dir. device dev. ;

cards;

1 1 1 10

1 1 2 205

1 2 1 14

1 2 2 81

2 1 1 22

2 1 2 538

2 2 1 143

2 2 2 372

3 1 1 82

3 1 2 998

3 2 1 174

3 2 2 578

proc freq; tables direct*device weight*direct*device / chisq relrisk nopercent nocol;

weight freq; run;

Chi-Square via PROC FREQ on SAS
Escalator Use and Obesity

The FREQ Procedure

Table of direct by device
Rows: direct; columns: device. Each cell shows the frequency with the row percent in parentheses.

direct / stairs / escalate / Total
Ascending / 114 (6.15) / 1741 (93.85) / 1855
Descending / 331 (24.30) / 1031 (75.70) / 1362
Total / 445 / 2772 / 3217

Statistics for Table of direct by device

Statistic / DF / Value / Prob
Chi-Square / 1 / 217.2223 / <.0001
Phi Coefficient / -0.2599

The phi coefficient is the Pearson r computed between two dichotomous variables. It can be used as a measure of the strength of the association between two dichotomous variables. Alternatively, one can use an odds ratio. The correspondence between phi and the odds ratio is not good when the marginal counts are not uniform.
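For a 2 x 2 table, the magnitude of phi equals the square root of the chi-square statistic divided by N. The following is a minimal sketch of that check using the values printed above; the variable names are illustrative and not part of the original program. The sign of phi depends on how the rows and columns are ordered and is not recovered this way.

```sas
/* Hypothetical check: magnitude of phi from the Pearson chi-square and N */
data _null_;
  chisq = 217.2223;          /* Pearson chi-square from the output above */
  n     = 3217;              /* total sample size */
  phi   = sqrt(chisq / n);   /* absolute value of phi for a 2 x 2 table */
  put phi= 6.4;              /* writes the result to the SAS log */
run;
```

The value written to the log should agree, apart from sign and rounding, with the phi coefficient of -0.2599 reported by PROC FREQ.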

Contingency Table Analysis

Size of effect / phi / odds ratio*
small / .1 / 1.49
medium / .3 / 3.45
large / .5 / 9

*For a 2 x 2 table with both marginals distributed uniformly.
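The odds ratios in this table can be reproduced from phi. When both marginals of a 2 x 2 table are uniform, the two diagonal cells each hold a proportion (1 + phi)/4 and the two off-diagonal cells each hold (1 - phi)/4, so the odds ratio is ((1 + phi)/(1 - phi)) squared. A short sketch, with a hypothetical data set name:

```sas
/* Odds ratio implied by phi when both marginals are uniform (50/50) */
data benchmarks;                             /* hypothetical data set name */
  do phi = 0.1, 0.3, 0.5;
    odds_ratio = ((1 + phi) / (1 - phi))**2;
    output;                                  /* one row per benchmark */
  end;
run;

proc print data=benchmarks noobs;
  format odds_ratio 6.2;
run;
```

This relationship holds only under the uniform-marginals condition stated in the footnote; with uneven marginals the same phi can correspond to a very different odds ratio.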

Estimates of the Relative Risk (Row1/Row2)
Type of Study / Value / Lower 95% CL / Upper 95% CL
Case-Control (Odds Ratio) / 0.2040 / 0.1626 / 0.2558

Odds ratios are usually easier to interpret when presented as greater than one rather than less than one. The inverted odds ratio here is 1/0.2040 = 4.90, with a CI that runs from 1/0.2558 = 3.91 to 1/0.1626 = 6.15.
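Inverting an odds ratio means taking the reciprocal of the estimate and of each confidence limit, with the limits trading places (the reciprocal of the upper limit becomes the new lower limit). A minimal sketch using the values from the output above; the variable names are illustrative:

```sas
/* Invert an odds ratio and its 95% confidence limits */
data _null_;
  or_est = 0.2040;       /* odds ratio as printed by PROC FREQ */
  lcl    = 0.1626;       /* lower 95% confidence limit */
  ucl    = 0.2558;       /* upper 95% confidence limit */
  inv_or  = 1 / or_est;
  inv_lcl = 1 / ucl;     /* the old upper limit gives the new lower limit */
  inv_ucl = 1 / lcl;     /* the old lower limit gives the new upper limit */
  put inv_or= 6.2 inv_lcl= 6.2 inv_ucl= 6.2;
run;
```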

Individuals were significantly more likely to use the stairs rather than the escalator when descending (24.30%) than when ascending (6.15%), χ²(1, N = 3,217) = 217.22, p < .001, OR = 4.90, 95% CI [3.91, 6.15].

Sample Size = 3217

Table 1 of direct by device
Controlling for weight=Obese
Rows: direct; columns: device. Each cell shows the frequency with the row percent in parentheses.

direct / stairs / escalate / Total
Ascending / 10 (4.65) / 205 (95.35) / 215
Descending / 14 (14.74) / 81 (85.26) / 95
Total / 24 / 286 / 310

Statistics for Table 1 of direct by device
Controlling for weight=Obese

Statistic / DF / Value / Prob
Chi-Square / 1 / 9.3833 / 0.0022
Phi Coefficient / -0.1740

Estimates of the Relative Risk (Row1/Row2)
Type of Study / Value / Lower 95% CL / Upper 95% CL
Case-Control (Odds Ratio) / 0.2822 / 0.1205 / 0.6612

Among the obese, the odds of taking the stairs are 1/.2822 = 3.54 times higher when descending than when ascending.

Sample Size = 310

Table 2 of direct by device
Controlling for weight=Overweight
Rows: direct; columns: device. Each cell shows the frequency with the row percent in parentheses.

direct / stairs / escalate / Total
Ascending / 22 (3.93) / 538 (96.07) / 560
Descending / 143 (27.77) / 372 (72.23) / 515
Total / 165 / 910 / 1075

Statistics for Table 2 of direct by device
Controlling for weight=Overweight

Statistic / DF / Value / Prob
Chi-Square / 1 / 117.3365 / <.0001
Phi Coefficient / -0.3304

Estimates of the Relative Risk (Row1/Row2)
Type of Study / Value / Lower 95% CL / Upper 95% CL
Case-Control (Odds Ratio) / 0.1064 / 0.0666 / 0.1698

Among the overweight, the odds ratio is 9.40.

Sample Size = 1075

Table 3 of direct by device
Controlling for weight=Normal
Rows: direct; columns: device. Each cell shows the frequency with the row percent in parentheses.

direct / stairs / escalate / Total
Ascending / 82 (7.59) / 998 (92.41) / 1080
Descending / 174 (23.14) / 578 (76.86) / 752
Total / 256 / 1576 / 1832

Statistics for Table 3 of direct by device
Controlling for weight=Normal

Statistic / DF / Value / Prob
Chi-Square / 1 / 89.1234 / <.0001
Phi Coefficient / -0.2206

Estimates of the Relative Risk (Row1/Row2)
Type of Study / Value / Lower 95% CL / Upper 95% CL
Case-Control (Odds Ratio) / 0.2729 / 0.2059 / 0.3618

Among those normal in weight, the odds ratio is 3.66.
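The inverted odds ratios quoted for the three weight groups can also be computed directly from the cell counts, as the odds of taking the stairs when descending divided by the odds of taking the stairs when ascending. A sketch with the counts copied from the three tables above; the data set and variable names are hypothetical:

```sas
/* Odds ratio (descending vs. ascending) for stair use within each weight group */
data strata_or;                      /* hypothetical data set name */
  length group $ 10;
  input group $ asc_stairs asc_esc desc_stairs desc_esc;
  /* odds of stairs when descending divided by odds of stairs when ascending */
  odds_ratio = (desc_stairs / desc_esc) / (asc_stairs / asc_esc);
  datalines;
Obese       10 205  14  81
Overweight  22 538 143 372
Normal      82 998 174 578
;
run;

proc print data=strata_or noobs;
  format odds_ratio 6.2;
run;
```

The printed values should match the 3.54, 9.40, and 3.66 given above, up to rounding.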

******************************** Create a new format ***************************;

data Format;

title 'Contingency Table Analysis After Categorizing a Continuous Variable.'; run;

proc format;

value add 1='low add' 2='medium add' 3='high add';

value rep 0='promoted' 1='repeated';

******************************* Read in Howell.dat and do CTA ********************;

data Sol; infile 'C:\Users\Vati\Documents\StatData\howell.dat';

input addsc sex repeat iq engl engg gpa socprob dropout;

if 0 < addsc <= 47 then add_cat = 1;

else if 47 < addsc <= 56 then add_cat = 2;

else if addsc > 56 then add_cat = 3;

format add_cat add. repeat rep. ;

proc freq; tables add_cat*repeat / expected nocol nopercent chisq; run;

Contingency Table Analysis After Categorizing a Continuous Variable.

The FREQ Procedure

Table of add_cat by repeat
Rows: add_cat; columns: repeat. Each cell shows the frequency, the expected frequency in brackets, and the row percent in parentheses.

add_cat / promoted / repeated / Total
low add / 29 [25.045] (100.00) / 0 [3.9545] (0.00) / 29
medium add / 27 [25.045] (93.10) / 2 [3.9545] (6.90) / 29
high add / 20 [25.909] (66.67) / 10 [4.0909] (33.33) / 30
Total / 76 / 12 / 88

Statistics for Table of add_cat by repeat

Statistic / DF / Value / Prob
Chi-Square / 2 / 15.5806 / 0.0004
Phi Coefficient / 0.4208
Contingency Coefficient / 0.3878
Cramer's V / 0.4208
WARNING: 50% of the cells have expected counts less
than 5. Chi-Square may not be a valid test.

Phi here is actually Cramer’s phi, a strength of association measure that can be computed for tables that have more than two rows and/or more than two columns. It can be interpreted using the same benchmarks as for phi.
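Both Cramer's V and the contingency coefficient can be recovered from the chi-square statistic and N. V is the square root of chi-square divided by N times (k - 1), where k is the smaller of the number of rows and columns; the contingency coefficient is the square root of chi-square divided by (chi-square + N). A minimal sketch using the output above, with illustrative variable names:

```sas
/* Cramer's V and the contingency coefficient from chi-square and N */
data _null_;
  chisq = 15.5806;                /* Pearson chi-square from the output above */
  n     = 88;                     /* total sample size */
  rows  = 3;
  cols  = 2;
  k     = min(rows, cols);        /* smaller table dimension */
  cramers_v     = sqrt(chisq / (n * (k - 1)));
  contingency_c = sqrt(chisq / (chisq + n));
  put cramers_v= 6.4 contingency_c= 6.4;
run;
```

Because the smaller dimension here is 2, k - 1 equals 1 and V reduces to the same formula as phi, which is why the two statistics are identical in this output.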

Notice that the percentage of students repeating a grade does not differ much between the low add and the medium add groups. In fact, when I compared these two groups I found that they did not differ significantly. Accordingly, I decided to combine them and then compare the combined groups to the high add group.
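The handout does not show how the low and medium groups were compared. One way it might be done, assuming the data set Sol created above, is to restrict PROC FREQ to those two groups with a WHERE statement:

```sas
/* Sketch: compare only the low and medium add groups */
proc freq data=Sol;
  where add_cat in (1, 2);     /* drop the high add group */
  tables add_cat*repeat / chisq nocol nopercent;
run;
```

For a 2 x 2 table the CHISQ option also prints Fisher's exact test, which is worth consulting here because the number of students who repeated a grade is very small in both groups.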

****************** Compare High Group with Other Groups Combined *****************;

data LoMed; set Sol;

if add_cat < 3 then add_cat = 0;

proc freq; tables add_cat*repeat / relrisk nocol nopercent chisq; run;
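The output that follows labels the combined group "low/medium add", but the add. format defined earlier has no label for the value 0. The handout does not show how that label was produced. One possibility, sketched here with a hypothetical format name (addtwo), is to define a second format and apply it in the PROC FREQ step:

```sas
/* Sketch: label the recoded categories (the format name addtwo is hypothetical) */
proc format;
  value addtwo 0='low/medium add' 3='high add';
run;

proc freq data=LoMed;
  format add_cat addtwo.;      /* overrides the add. format for this step only */
  tables add_cat*repeat / relrisk nocol nopercent chisq;
run;
```

Because PROC FREQ orders rows by the internal value by default, the combined group (0) is listed before the high add group (3), which matches the row order in the output.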

Table of add_cat by repeat
Rows: add_cat; columns: repeat. Each cell shows the frequency with the row percent in parentheses.

add_cat / promoted / repeated / Total
low/medium add / 56 (96.55) / 2 (3.45) / 58
high add / 20 (66.67) / 10 (33.33) / 30
Total / 76 / 12 / 88

Statistics for Table of add_cat by repeat

Statistic / DF / Value / Prob
Chi-Square / 1 / 14.9950 / 0.0001

Estimates of the Relative Risk (Row1/Row2)
Type of Study / Value / Lower 95% CL / Upper 95% CL
Case-Control (Odds Ratio) / 14.0000 / 2.8217 / 69.4627

The odds of repeating a grade are 14 times higher for those students high in ADD than for those not high in ADD.