
REGWQ and Power: A Mini Monte Carlo

Here is the SAS code I used to generate the data for your REGWQ assignment:

DATA PULSE; FILE "C:\D\SimData\%name.dat";
* Male, sexual image: M = 4.76, SD = 6.00 ;
group=1; gender=1; image=1; DO S=1 TO 96; pulse=INT(476 + 600*NORMAL(0))/100;
OUTPUT; PUT group gender image pulse; END;
* Male, emotional image: M = 3.00, SD = 5.24 ;
group=2; gender=1; image=2; DO S=1 TO 96; pulse=INT(300 + 524*NORMAL(0))/100;
OUTPUT; PUT group gender image pulse; END;
* Female, sexual image: M = 2.25, SD = 4.68 ;
group=3; gender=2; image=1; DO S=1 TO 96; pulse=INT(225 + 468*NORMAL(0))/100;
OUTPUT; PUT group gender image pulse; END;
* Female, emotional image: M = 2.57, SD = 4.37 ;
group=4; gender=2; image=2; DO S=1 TO 96; pulse=INT(257 + 437*NORMAL(0))/100;
OUTPUT; PUT group gender image pulse; END;
RUN;

From the code, you can see that the parameters of the populations from which your data were randomly sampled were:

Population           M      SD
Male, sexual        4.76   6.00
Male, emotional     3.00   5.24
Female, emotional   2.57   4.37
Female, sexual      2.25   4.68

Because the group sizes are equal, the pooled within-group variance is simply the mean of the four population variances, 26.11, and the pooled standard deviation is 5.11. The standardized pairwise differences between means range from d = .063 to d = .491.

Cohen’s f (effect size) is computed as f = √(η² / (1 − η²)). For these groups, η² = .9402/(.9402 + 26.11) = .0347, so f = √(.0347/.9653) = .19. This can also be computed as f = σ_m/σ, where the numerator is the standard deviation of the population means (the sum of the squared deviations from the grand mean divided by the number of means, k, not by k − 1) and the denominator is the within-group standard deviation. For our groups that is .9698/5.11 = .190.
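If you want to verify these values yourself, here is a minimal SAS sketch (a hypothetical DATA _NULL_ step, not part of the assignment code; the variable names are mine) that recomputes the pooled variance, the range of the standardized differences, and Cohen's f from the four population parameters in the table above:

DATA _NULL_;
  ARRAY m{4} m1-m4 (4.76 3.00 2.57 2.25);   * population means ;
  ARRAY s{4} s1-s4 (6.00 5.24 4.37 4.68);   * population SDs ;
  * With equal group sizes, the pooled variance is the mean of the variances ;
  poolvar = (s{1}**2 + s{2}**2 + s{3}**2 + s{4}**2) / 4;
  poolsd  = SQRT(poolvar);
  * Standardized pairwise differences, d = |Mi - Mj| / pooled SD ;
  dmin = .; dmax = .;
  DO i = 1 TO 3;
    DO j = i + 1 TO 4;
      d = ABS(m{i} - m{j}) / poolsd;
      dmin = MIN(dmin, d);   * SAS MIN/MAX ignore the initial missing value ;
      dmax = MAX(dmax, d);
    END;
  END;
  * Cohen's f = sigma_m / sigma; SS of the means is divided by k, not k - 1 ;
  grand   = MEAN(OF m1-m4);
  sigma_m = SQRT(((m{1}-grand)**2 + (m{2}-grand)**2
                + (m{3}-grand)**2 + (m{4}-grand)**2) / 4);
  f = sigma_m / poolsd;
  PUT poolvar= poolsd= dmin= dmax= sigma_m= f=;
RUN;

The log should show approximately poolvar = 26.11, dmin = .063, dmax = .491, sigma_m = .9698, and f = .19, matching the values above.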

Cohen considered an f of .10 to be a small effect, .25 a medium effect, and .40 a large effect.

For our effect size, G*Power tells us power is 89% for the ANOVA. In a previous semester, 15 of 17 students obtained significant results, so observed power = 15/17 = 88%. Two students made Type II errors.

F tests - ANOVA: Fixed effects, omnibus, one-way
Analysis:  Post hoc: Compute achieved power
Input:     Effect size f = 0.19
           α err prob = 0.05
           Total sample size = 384
           Number of groups = 4
Output:    Noncentrality parameter λ = 13.8624000
           Critical F = 2.6283946
           Numerator df = 3
           Denominator df = 380
           Power (1-β err prob) = 0.8895040
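G*Power's arithmetic is easy to check: λ = f²·N = .19² × 384 = 13.8624, and power is the area beyond the critical F under the noncentral F(3, 380) distribution. A minimal SAS sketch (hypothetical, using the base FINV and PROBF functions):

DATA _NULL_;
  f = 0.19; N = 384; k = 4;
  lambda = f**2 * N;                           * noncentrality parameter ;
  fcrit  = FINV(0.95, k-1, N-k);               * critical F(3, 380) at alpha = .05 ;
  power  = 1 - PROBF(fcrit, k-1, N-k, lambda); * noncentral F upper tail area ;
  PUT lambda= fcrit= power=;
RUN;

This reproduces λ = 13.8624, critical F = 2.6284, and power = .8895.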

Each student conducted six pairwise comparisons, and in every case the two population means truly differed. Across 17 students that makes 6 × 17 = 102 pairwise comparisons. Of those, 35 were significant, so observed power was 35/102 = 34%; that is, the per-comparison probability of a Type II error was 66%. As I and others have often noted, procedures that cap familywise alpha greatly increase beta. It would have been even worse had we used a more conservative procedure, such as Tukey’s HSD.
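For reference, the REGWQ comparisons can be requested in SAS like this (a sketch, assuming the simulated scores have been read into the data set PULSE created by the code above):

PROC ANOVA DATA=PULSE;
  CLASS group;
  MODEL pulse = group;
  MEANS group / REGWQ;  * Ryan-Einot-Gabriel-Welsch range test, familywise alpha held at .05 ;
RUN;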

If you go back over the terse reports students provided for the results of the pairwise comparisons, you will notice considerable variability in the pattern of results obtained. Psychologists often write lots of garbage in the discussion sections of their manuscripts, attempting to explain why their pattern of results differs from that of others who have reported results on the same or similar research. This is not only a waste of time, it is counterproductive. The simple truth, as you have now observed, is that such variability in results is the product of Type II errors, plain and simple.