Chapter 14: Introduction to Factorial Design:

Two-Way Analysis of Variance

  1. Computations for the Completely Crossed Factorial Design

The two-way ANOVA allows you to add an important qualifier to your original question. For instance, in the last exercise of the previous chapter, we asked: do blonds have more fun? That question is usually asked with females in mind, but if we added gender as a factor, we could then ask whether blonds having more fun is just as true for men as it is for women (or just as false). To analyze the effects of both gender and hair color in one study requires a two-way ANOVA.

In a one-way ANOVA, variation was split into two pieces: between groups and within groups.

In a two-way ANOVA, the between-groups variation is then split into even more pieces.

Total Variation = Between-Group Variation + Within-Group Variation (Error)
Between-Group Variation = Variation due to Factor 1 + Variation due to Factor 2 + Variation due to Interaction of the factors

You can see the Between-Groups variation splitting into three components: Factor 1, Factor 2, and their Interaction. This same breakdown is used in the summary table for a two-way ANOVA, where these three pieces add up to the Between-Groups subtotal (as long as the design is balanced; that is, there are the same number of participants in every cell). Also note that some spaces in the summary table do not need to be filled in, because these values aren’t used; for the Between-Groups row, we do not need MSBet, F, or p. Instead, we are looking for only three final F values: one to determine whether there are significant differences overall (a main effect) for Factor 1 (e.g., hair color), one for differences in Factor 2 (e.g., a main effect of gender), and one to test the interaction of the two factors.

Source / SS / df / MS / F / p
Between Groups / / / ------ / ------ / ------
Factor 1
Factor 2
Interaction
Within Groups / / / / ------ / ------
Total / / / ------ / ------ / ------

Let’s continue with the hair-color example from the end of the previous chapter.

/ Redhead / Brunette / Blonde / Row Means
Male / 3, 2 (M = 2.50, s = 0.71) / 7, 5 (M = 6.00, s = 1.41) / 3, 5 (M = 4.00, s = 1.41) / 4.167
Female / 6, 8 (M = 7.00, s = 1.41) / 5, 3 (M = 4.00, s = 1.41) / 10, 7 (M = 8.50, s = 2.12) / 6.50
Column Means / 4.75 / 5.00 / 6.25 / Grand Mean = 5.333
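As a quick check on the table above, the cell means, standard deviations, and marginal means can be recomputed from the raw scores. This is only a sketch using Python's standard library; the variable names (scores, cell_mean, and so on) are our own and do not come from the chapter.

```python
from statistics import mean, stdev

# Raw fun ratings from the table above: scores[gender][hair_color]
scores = {
    "Male":   {"Redhead": [3, 2], "Brunette": [7, 5], "Blonde": [3, 5]},
    "Female": {"Redhead": [6, 8], "Brunette": [5, 3], "Blonde": [10, 7]},
}
hair_colors = ["Redhead", "Brunette", "Blonde"]

# Cell means and sample standard deviations (n - 1 in the denominator)
cell_mean = {g: {h: mean(scores[g][h]) for h in hair_colors} for g in scores}
cell_sd = {g: {h: stdev(scores[g][h]) for h in hair_colors} for g in scores}

# With equal cell sizes, marginal means are simple averages of cell means
row_mean = {g: mean(cell_mean[g].values()) for g in scores}
col_mean = {h: mean(cell_mean[g][h] for g in scores) for h in hair_colors}
grand_mean = mean(x for g in scores for h in hair_colors for x in scores[g][h])

for g in scores:
    print(g, [round(cell_mean[g][h], 2) for h in hair_colors], round(row_mean[g], 3))
print("Columns", [round(col_mean[h], 2) for h in hair_colors], round(grand_mean, 3))
```

Averaging cell means to get the marginal means works here only because the design is balanced; with unequal cell sizes the two kinds of average would differ.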

For this example, we placed hair color in the columns and gender in the rows, so r = 2 and c = 3. Given these values, and that n = 2, so NT = nrc = 2*2*3 = 12, we can now determine our degrees of freedom.

STEP 1: Calculate Degrees of Freedom

A logical way to split up the degrees of freedom in the two-way ANOVA is the scheme we used to divide up the variation:

df total = NT - 1 = 12 - 1 = 11
df between cells = rc - 1 = 3*2 - 1 = 5 / df within cells = NT - rc = 12 - 3*2 = 6
df hair color = c - 1 = 3 - 1 = 2 / df gender = r - 1 = 2 - 1 = 1 / df interaction = (c - 1)(r - 1) = 2*1 = 2
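The same bookkeeping can be written as a small function. This is a sketch for a balanced design only; the function name two_way_df and the dictionary keys are our own labels, not the chapter's.

```python
def two_way_df(r, c, n):
    """Degrees of freedom for a balanced r x c design with n scores per cell."""
    n_total = n * r * c
    df = {
        "total": n_total - 1,
        "between": r * c - 1,
        "within": n_total - r * c,
        "rows": r - 1,
        "columns": c - 1,
        "interaction": (r - 1) * (c - 1),
    }
    # The pieces must add up the same way the variation does
    assert df["rows"] + df["columns"] + df["interaction"] == df["between"]
    assert df["between"] + df["within"] == df["total"]
    return df

# Hair-color example: 2 genders (rows), 3 hair colors (columns), 2 per cell
df = two_way_df(r=2, c=3, n=2)
print(df)
```

The two internal checks mirror the diagram of the variation: the three effect dfs sum to df between, and df between plus df within gives df total.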

STEP 2: Calculate the Sums of Squares

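Sum of Squares Total: SStotal = Σ(X - MG)² = 62.667, where the sum is taken over all 12 scores and MG is the grand mean.

Sum of Squares Between Groups: SSbet = n Σ(Mcell - MG)² = 2(24.833) = 49.667

Sum of Squares Within Groups: SSW = SStotal - SSbet = 62.667 - 49.667 = 13.00

Sum of Squares for Factor 1 (Hair Color): SScol = nr Σ(Mcol - MG)² = 4(1.292) = 5.167

Sum of Squares for Factor 2 (Gender): SSrow = nc Σ(Mrow - MG)² = 6(2.722) = 16.333

Sum of Squares for the Interaction: SSinter = SSbet - SScol - SSrow = 49.667 - 5.167 - 16.333 = 28.167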

Source / SS / df / MS / F / p
Between Groups / 49.667 / 5 / ------/ ------/ ------
Hair Color / 5.167 / 2
Gender / 16.333 / 1
Interaction / 28.167 / 2
Within Groups / 13.00 / 6 / ------/ ------
Total / 62.667 / 11 / ------/ ------/ ------
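The SS column of the table above can be reproduced from the raw scores. The sketch below uses the deviation (definitional) form of each sum of squares, which is one standard way to compute them; the chapter's own computational formulas may be written differently, and the variable names here are ours.

```python
# Raw scores: one list per cell, rows = gender, columns = hair color
data = [
    [[3, 2], [7, 5], [3, 5]],    # Male: Redhead, Brunette, Blonde
    [[6, 8], [5, 3], [10, 7]],   # Female: Redhead, Brunette, Blonde
]
r, c, n = len(data), len(data[0]), len(data[0][0])

all_scores = [x for row in data for cell in row for x in cell]
grand = sum(all_scores) / len(all_scores)
cell_means = [[sum(cell) / n for cell in row] for row in data]
row_means = [sum(m) / c for m in cell_means]
col_means = [sum(cell_means[i][j] for i in range(r)) / r for j in range(c)]

ss_total = sum((x - grand) ** 2 for x in all_scores)
ss_between = n * sum((m - grand) ** 2 for row in cell_means for m in row)
ss_within = sum((x - cell_means[i][j]) ** 2
                for i in range(r) for j in range(c) for x in data[i][j])
ss_rows = n * c * sum((m - grand) ** 2 for m in row_means)   # Gender
ss_cols = n * r * sum((m - grand) ** 2 for m in col_means)   # Hair color
ss_inter = ss_between - ss_rows - ss_cols                    # what is left over

for name, ss in [("Between", ss_between), ("Hair Color", ss_cols),
                 ("Gender", ss_rows), ("Interaction", ss_inter),
                 ("Within", ss_within), ("Total", ss_total)]:
    print(f"{name:12s} {ss:7.3f}")
```

Finding the interaction by subtraction relies on the design being balanced, as noted earlier in the chapter.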

STEP 3: Calculate the Mean Square Values

Now that we have found the SSs, we just divide each SS component by its appropriate degrees of freedom to get the MS values (see table at the end of this section).

STEP 4: Calculate the F Ratios

The next step is to divide each MS component of the Between-Groups variation by the MSW term. So, for the main effect of Factor 1 (Hair Color), F = 2.583/2.167 = 1.19; for Factor 2 (Gender), F = 16.333/2.167 = 7.54; and for the Interaction term, F = 14.083/2.167 = 6.50.
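Steps 3 and 4 amount to two divisions per effect. Here is a minimal sketch that starts from the SS and df values in the summary table; the dictionary names are ours.

```python
# SS and df values taken from the summary table for the hair-color example
ss = {"Hair Color": 5.167, "Gender": 16.333, "Interaction": 28.167, "Within": 13.0}
df = {"Hair Color": 2, "Gender": 1, "Interaction": 2, "Within": 6}

# Step 3: each mean square is its SS divided by its df
ms = {source: ss[source] / df[source] for source in ss}

# Step 4: each effect's F ratio uses MS-within as the error term
f_ratio = {source: ms[source] / ms["Within"] for source in ss if source != "Within"}

for source in f_ratio:
    print(f"{source:12s} MS = {ms[source]:6.3f}  F = {f_ratio[source]:5.3f}")
```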

STEP 5: Make a statistical decision concerning the null hypothesis for each F ratio

The final step requires looking up the critical F for each effect. Because the table makes it easy, we will look up the critical values for both the .05 and .01 levels. For the main effect of Hair Color, you would look up a critical F based on the degrees of freedom for the Hair Color factor (df1 = 2), and the degrees of freedom for its error term (dfW = 6), so F.05 (2, 6) = 5.14, and F.01 (2, 6) = 10.92, but, of course, F = 1.19 is not even close to significance at the .05 level. The critical values for Gender are: F.05 (1, 6) = 5.99, and F.01 (1, 6) = 13.74, so the main effect of gender, F = 7.54, is statistically significant at the .05, but not the .01 level. Finally, for the interaction, the df are the same as for the hair color main effect, so you can use those critical values; Finter = 6.50, so, like Gender, it is statistically significant at the .05, but not the .01 level.
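The decision rule in Step 5 is a simple comparison against tabled values. The sketch below hard-codes only the critical values quoted in the paragraph above; the function name decide and the lookup dictionary are our own, and a statistics package could supply exact p values instead.

```python
# Critical F values quoted in the text, keyed by (df_effect, df_within)
critical_f = {
    (2, 6): {".05": 5.14, ".01": 10.92},
    (1, 6): {".05": 5.99, ".01": 13.74},
}

def decide(f_obs, df_effect, df_within):
    """Return the entry for the p column of the summary table."""
    crit = critical_f[(df_effect, df_within)]
    if f_obs >= crit[".01"]:
        return "< .01"
    if f_obs >= crit[".05"]:
        return "< .05"
    return "> .05 (n.s.)"

results = {
    "Hair Color": decide(1.192, 2, 6),
    "Gender": decide(7.538, 1, 6),
    "Interaction": decide(6.500, 2, 6),
}
print(results)
```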

Source / SS / df / MS / F / p
Between Groups / 49.667 / 5 / 9.933 / ------/ ------
Hair Color / 5.167 / 2 / 2.583 / 1.192 / > .05 (n.s.)
Gender / 16.333 / 1 / 16.333 / 7.538 / < .05
Interaction / 28.167 / 2 / 14.083 / 6.500 / < .05
Within Groups / 13.00 / 6 / 2.167 / ------/ ------
Total / 62.667 / 11 / ------/ ------/ ------
  1. Interpreting Graphs of Cell Means

A graph of the cell means from the hair-color example is shown here. In this example, the separate lines represent gender: the (mostly) top line (green) is for women, and the (mostly) bottom line (blue) is for men. Hair color is on the horizontal axis, with redheads first (1), then brunettes (2), and then blonds (3) on the right.

Looking at the graphs of cell means can be a quick way to detect interactions. Lines that are mostly, if not entirely, parallel indicate very little (and probably not significant) interaction. In this example, however, we did have a significant interaction, so it makes sense that the lines crisscross and are far from parallel. The lines don’t have to cross for the interaction to be large or significant, but lines that slope in opposite ways are more likely to indicate an interesting interaction (and to mess up the main effects)!
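Without drawing the graph, one rough numeric check of "parallel lines" is to look at the vertical gap between the two lines at each hair color: parallel lines would show the same gap everywhere. This sketch uses the cell means from the example; the variable names are ours, and it describes the pattern only, without testing significance.

```python
# Cell means from the hair-color example
male = {"Redhead": 2.5, "Brunette": 6.0, "Blonde": 4.0}
female = {"Redhead": 7.0, "Brunette": 4.0, "Blonde": 8.5}

# Vertical gap between the female and male lines at each hair color
gap = {h: female[h] - male[h] for h in male}
print(gap)

# Perfectly parallel lines would give one constant gap
lines_parallel = len(set(gap.values())) == 1
# A gap that changes sign means the lines cross somewhere
lines_cross = min(gap.values()) < 0 < max(gap.values())
print("parallel:", lines_parallel, "cross:", lines_cross)
```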

  1. Follow-Up Tests for Significant Main Effects

If the interaction is not significant, nor even close to significance, it is appropriate to follow up on any main effects that are significant. The follow-up tests for significant main effects are very similar to the follow-up tests for a one-way ANOVA. Just be aware that the n used in the LSD or HSD formula is the n for a whole column if comparing column means, or the n for a whole row if comparing row means.

Hair Color: The main effect of hair color was not significant in our example, but this factor has three levels, so if it were significant (and the interaction were not so large), we would be following it up with Fisher’s LSD test, using MSW from the two-way ANOVA, and therefore using a critical t value based on dfW, also from the two-way ANOVA. The n we would use in the formula would be 4, because that is how many scores are averaged together to obtain each hair-color mean.

Gender: The main effect of gender was significant; however, because this effect has only two levels, we don’t need to follow it up. We know where the entire significant difference is located: between males and females. (Duh!)
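Purely to illustrate which n goes into the formula, here is what the LSD follow-up for hair color would look like if it were justified (it is not in this example, because the main effect was not significant). The helper name lsd is ours, and the critical value t.05(6) = 2.447 is the standard tabled value for 6 df.

```python
from itertools import combinations
from math import sqrt

def lsd(t_crit, ms_within, n):
    """Fisher's LSD: the smallest difference between two means that is significant."""
    return t_crit * sqrt(2 * ms_within / n)

# MSW = 2.167 and its 6 df come from the two-way ANOVA;
# n = 4 because each hair-color (column) mean averages four scores.
lsd_hair = lsd(t_crit=2.447, ms_within=2.167, n=4)

col_means = {"Redhead": 4.75, "Brunette": 5.00, "Blonde": 6.25}
exceeds = {}
for a, b in combinations(col_means, 2):
    diff = abs(col_means[a] - col_means[b])
    exceeds[(a, b)] = diff > lsd_hair
    print(f"{a} vs {b}: diff = {diff:.2f}, LSD = {lsd_hair:.2f}, exceeds: {diff > lsd_hair}")
```

None of the three column differences comes close to LSD, which agrees with the nonsignificant F for hair color.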

  1. Follow-Up Tests for Significant Interactions: Simple Main Effects

If the interaction is significant, as it was in our example, then it is usually not appropriate to do follow-up tests on the main effects, even the ones that are significant. This is especially true if the lines on the graph of cell means slope in different directions, or there is otherwise a very large difference between the slopes. The most common approach in this case is to test the (not so) Simple Main Effects. This entails looking for differences in one factor at just one specific level (at a time) of the other factor. In our present example, it would make sense to test the simple main effect of hair color just for males, and then test it again just for females.

To test the simple main effect for the males, just perform a one-way ANOVA on the hair-color means of the males (the top row of the data table for this example), but use MSW from the two-way ANOVA as your error term (i.e., bottom of your F ratio). Then do the same for the females. If you do that, you’ll find that for the males, F = 6.167/2.167 = 2.85, p > .05, and for the females, F = 10.5/2.167 = 4.85, p < .06. Neither of the simple main effects is significant at the .05 level—a significant interaction is no guarantee that any of them will be—but the simple effect for females came really close. If it were significant, you would then need to follow up that effect, because, for females, you still wouldn’t know just which hair color differed significantly in the fun rating from which others.
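The two simple main effects just described can be computed from the cell means alone. This is a sketch under the same assumptions as the text (n = 2 per cell, MSW = 2.167 as the error term); the function name simple_effect_f is ours.

```python
def simple_effect_f(cell_means, n, ms_within):
    """One-way ANOVA on one row of cell means, with MSW from the two-way ANOVA as error."""
    k = len(cell_means)
    row_mean = sum(cell_means) / k
    ss_between = n * sum((m - row_mean) ** 2 for m in cell_means)
    ms_between = ss_between / (k - 1)
    return ms_between, ms_between / ms_within

ms_w = 2.167  # MSW from the two-way summary table (df = 6)
ms_male, f_male = simple_effect_f([2.5, 6.0, 4.0], n=2, ms_within=ms_w)
ms_female, f_female = simple_effect_f([7.0, 4.0, 8.5], n=2, ms_within=ms_w)

f_crit_05 = 5.14  # F.05(2, 6), as quoted in Step 5
print(f"Males:   MS = {ms_male:.3f}, F = {f_male:.2f}, significant: {f_male > f_crit_05}")
print(f"Females: MS = {ms_female:.3f}, F = {f_female:.2f}, significant: {f_female > f_crit_05}")
```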

Just for the exercise, we calculated LSD for the females: LSD = t.05(6) √(2*2.167/2) = 2.447 * 1.472 = 3.60. Note that the difference between brunettes (M = 4.0) and blonds (M = 8.5), which is 4.5, is greater than 3.6, so if the LSD test were justified, we could conclude that blond females have more fun than brunette females. Unfortunately, if we are to be very honest with these procedures, we cannot perform follow-up tests on the simple main effect of hair color for females, because it fell just short of significance at the .05 level. Note that in the LSD formula, we based our critical t on 6 df, because MSW has 6 df. Also, n = 2 in this case, because we are really comparing cells here, and there are only two scores in each cell.

There is another way we could legitimately test simple main effects for the hair-color study: we could perform pairwise comparisons between the two genders, separately for each hair color. Because there are only two levels of gender, we can do these comparisons either as t tests or as one-way ANOVAs. Even easier, we can use the LSD value we already calculated to follow up the hair-color simple effect, because we are still using the same MSW, with the same df, and the same n per cell. In this example, the gender difference for redheads is the same as for blonds, 4.5, which is greater than LSD (3.6), so we can conclude that there are significant gender differences in fun ratings for both blonds and redheads, but not for brunettes (the latter gender difference is only 2.0).
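The cell-level comparisons described in the last two paragraphs can be checked with a few lines. The values t.05(6) = 2.447, MSW = 2.167, and n = 2 are the ones used in the text; the variable names are ours.

```python
from math import sqrt

# LSD for comparing two cells: t.05(6) = 2.447, MSW = 2.167, n = 2 per cell
lsd_cells = 2.447 * sqrt(2 * 2.167 / 2)

male = {"Redhead": 2.5, "Brunette": 6.0, "Blonde": 4.0}
female = {"Redhead": 7.0, "Brunette": 4.0, "Blonde": 8.5}

# Gender difference within each hair color, compared with LSD
gender_sig = {h: abs(female[h] - male[h]) > lsd_cells for h in male}
print(round(lsd_cells, 2), gender_sig)

# Blond vs. brunette females (only justified if the simple effect were significant)
blonde_vs_brunette = female["Blonde"] - female["Brunette"]
print(blonde_vs_brunette, blonde_vs_brunette > lsd_cells)
```

Because every cell has the same n and all comparisons share the same MSW and df, a single LSD value serves for every pair of cells.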

  1. Measuring Effect Size for a Two-Way ANOVA

It is easy to get partial eta squared (ηp2) for each effect in a two-way ANOVA by simply plugging in the appropriate F ratios and dfs in the following formula:

ηp2 = (dfeffect × F) / (dfeffect × F + dfW)

There’s no point in getting eta squared for an effect that is very far from significance, so for the hair-color example, we will find ηp2 for just the Gender and Interaction effects. For Gender:

ηp2 = (1 × 7.538) / (1 × 7.538 + 6) = 7.538/13.538 = .56

And for the interaction:

ηp2 = (2 × 6.50) / (2 × 6.50 + 6) = 13.00/19.00 = .68

These are very large effect sizes. In fact, partial eta squared is probably inappropriate for this example, because both factors involve preexisting groups; neither factor involves an experimental manipulation or the random assignment of participants. So, ordinary eta squared, which is generally smaller, makes more sense:

For Gender, ηord2 = 16.33/62.67 = .26; and for the interaction, ηord2 = 28.167/62.67 = .45. These effects are still quite large, but a bit more realistic.
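Both effect-size measures are one-line calculations. The sketch below assumes the usual definitions: partial eta squared from F and the dfs, which is algebraically the same as SSeffect / (SSeffect + SSW), and ordinary eta squared as SSeffect / SStotal. The function names are ours.

```python
def partial_eta_squared(f, df_effect, df_within):
    """Partial eta squared from an F ratio and its degrees of freedom."""
    return (df_effect * f) / (df_effect * f + df_within)

def ordinary_eta_squared(ss_effect, ss_total):
    """Ordinary eta squared: the effect's share of the total variation."""
    return ss_effect / ss_total

# F ratios, dfs, and SS values from the hair-color summary table
gender_partial = partial_eta_squared(7.538, 1, 6)
inter_partial = partial_eta_squared(6.500, 2, 6)
gender_ordinary = ordinary_eta_squared(16.333, 62.667)
inter_ordinary = ordinary_eta_squared(28.167, 62.667)

print(f"Gender:      partial = {gender_partial:.2f}, ordinary = {gender_ordinary:.2f}")
print(f"Interaction: partial = {inter_partial:.2f}, ordinary = {inter_ordinary:.2f}")
```

The ordinary version is smaller because its denominator also includes the variation due to the other effects.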

Now an exercise for you to try . . .

1. A food researcher measures the preferences of 36 individuals for different types of pizza crust: thin crust, regular crust, and deep dish, as well as the preference for plain cheese or pepperoni. The 36 people are equally divided among the six possible combinations of crust and topping, and then each person is invited to eat as many 1-inch square pieces of pizza as they’d like of the type of pizza they were assigned to. The mean number of pieces of each type of pizza consumed is presented in the table:

/ Cheese Only / Pepperoni / Row Means
Thin Crust / M = 7.67, SD = 1.21 / M = 4.17, SD = 1.17 /
Regular Crust / M = 5.83, SD = 1.67 / M = 9.83, SD = 1.17 /
Deep Dish / M = 4.00, SD = 1.17 / M = 6.83, SD = 1.47 /
Column Means / / /

Report your results in a Summary Table like this one:

Source / SS / df / MS / F / p
Between Conditions / / / ------ / ------ / ------
Crust Type
Topping
Crust × Topping Interaction
Within Conditions / / / / ------ / ------
Total / / / ------ / ------ / ------

2. Draw a graph of the cell means in the previous exercise. Explain any obvious interaction that you see. For any significant findings in the previous exercise, follow up with appropriate post hoc tests, and provide a measure of the effect size.

Answers to Exercises

1. Here, the marginal means have been filled in:

/ Cheese Only / Pepperoni / Row Means
Thin Crust / M = 7.67, SD = 1.21 / M = 4.17, SD = 1.17 / M = 5.917
Regular Crust / M = 5.83, SD = 1.67 / M = 9.83, SD = 1.17 / M = 7.833
Deep Dish / M = 4.00, SD = 1.17 / M = 6.83, SD = 1.47 / M = 5.417
Column Means / M = 5.833 / M = 6.944 / M = 6.389

And the Summary Table has been filled in:

Source / SS / df / MS / F / p
Between Conditions / 147.889 / 5 / ------/ ------/ ------
Crust Type / 39.055 / 2 / 19.53 / 11.10 / < .01
Topping / 11.111 / 1 / 11.11 / 6.31 / < .05
Crust × Topping Interaction / 97.723 / 2 / 48.86 / 27.76 / < .01
Within Conditions / 52.667 / 30 / 1.76 / ------/ ------
Total / 200.56 / 35 / ------/ ------/ ------
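Because the exercise gives only cell means and standard deviations, the summary table has to be built from those rather than from raw scores. The sketch below shows one way to do that for a balanced design; variable names are ours. Since the means and SDs are rounded to two decimals, the results agree with the table above only to within rounding error.

```python
# Cell means and SDs from the answer table: rows = crust, columns = (cheese, pepperoni)
means = [[7.67, 4.17], [5.83, 9.83], [4.00, 6.83]]
sds = [[1.21, 1.17], [1.67, 1.17], [1.17, 1.47]]
n = 6                                  # 36 people divided equally among 6 cells
r, c = len(means), len(means[0])

grand = sum(sum(row) for row in means) / (r * c)
row_means = [sum(row) / c for row in means]
col_means = [sum(means[i][j] for i in range(r)) / r for j in range(c)]

ss_between = n * sum((m - grand) ** 2 for row in means for m in row)
ss_crust = n * c * sum((m - grand) ** 2 for m in row_means)
ss_topping = n * r * sum((m - grand) ** 2 for m in col_means)
ss_inter = ss_between - ss_crust - ss_topping
# Each cell contributes (n - 1) * variance to the within-cells SS
ss_within = (n - 1) * sum(s ** 2 for row in sds for s in row)

df_within = r * c * (n - 1)
ms_within = ss_within / df_within
f_crust = (ss_crust / (r - 1)) / ms_within
f_topping = (ss_topping / (c - 1)) / ms_within
f_inter = (ss_inter / ((r - 1) * (c - 1))) / ms_within
print(f"F crust = {f_crust:.2f}, F topping = {f_topping:.2f}, F interaction = {f_inter:.2f}")
```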

2. You should be able to see from your graph that the strong preference for pepperoni versus plain cheese on regular-crust pizza diminishes for deep dish, and actually reverses direction for thin-crust pizza. There is no possible follow-up for the significant effect of topping, because it has only two levels, so we will proceed with the LSD test to follow up on the significant main effect of crust type. Each crust-type mean is based on n = 12 scores, so LSD = t.05(30) √(2*1.76/12) = 2.042 * 0.542 = 1.11.

The effect size associated with the main effect of crust type is: ηp2 = (2 × 11.10) / (2 × 11.10 + 30) = 22.20/52.20 = .43

This is a very large effect. According to the LSD test, regular crust beats both thin crust and deep dish significantly, while the latter two don’t differ from each other significantly. However, this result is misleading; regular crust comes out on top only in the pepperoni condition. We should have suspected the possibility of misleading main effects when we saw that the interaction of the two factors was very large, easily significant, and involved crisscrossing lines on our graph. This pattern suggests that we ignore the follow-up of the crust-type main effect and focus on following up the significant interaction. We can begin that follow-up by testing the simple main effects of crust type separately for both the pepperoni and cheese only conditions.

The Test for a Simple Main Effect of Crust Type for Cheese Only:

MSbet = 40.33/2 = 20.17; MSW = 1.76 (from two-way ANOVA); F = 20.17/1.76 = 11.46, p < .01.

Given the significance of this effect, we can use LSD to test for differences between pairs of pizza crust types within the cheese only category: LSD = t.05(30) √(2*1.76/6) = 2.042 * 0.766 = 1.56.

Note that the only difference between this LSD calculation and the one for the main effect of crust type is that n is now only 6. This time all three differences exceed LSD, so we can say that for the cheese only condition each crust type differs significantly from the other two, with thin crust as the best, regular crust second, and deep dish third.

The Test for a Simple Main Effect of Crust Type for the Pepperoni Category:

MSbet = 96.44/2 = 48.22; MSW = 1.76 (again); F = 48.22/1.76 = 27.40, p < .01.

LSD = 1.56 (same as for cheese only), and again, all three differences exceed LSD. However, for the pepperoni condition, regular crust is the winner (M = 9.83), with deep dish second (M = 6.83), and thin crust bringing up the rear (M = 4.17).
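The two simple-effect F tests for crust type can be recomputed from the cell means, using MSW = 1.76 from the two-way ANOVA as the error term. This is a sketch with our own function name; because the cell means are rounded to two decimals, the SS and F values come out close to, but not exactly equal to, the ones reported above.

```python
ms_w = 1.76   # MSW from the two-way summary table (df = 30)
n = 6         # scores per cell

def simple_effect(cell_means):
    """Return (SS between, F) for a one-way ANOVA on one column of cell means."""
    k = len(cell_means)
    mean = sum(cell_means) / k
    ss = n * sum((m - mean) ** 2 for m in cell_means)
    ms = ss / (k - 1)
    return ss, ms / ms_w

# Crust-type means (thin, regular, deep dish) within each topping
ss_cheese, f_cheese = simple_effect([7.67, 5.83, 4.00])
ss_pep, f_pep = simple_effect([4.17, 9.83, 6.83])

print(f"Cheese only: SS = {ss_cheese:.2f}, F = {f_cheese:.2f}")
print(f"Pepperoni:   SS = {ss_pep:.2f}, F = {f_pep:.2f}")
```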

Simple Main Effects of Topping for Each Pizza Crust Type:

If we test the simple main effects the other way—comparing cheese only to pepperoni for each crust type—we can still use the same value for LSD, which is 1.56, because we are using the same error term (MSW), the same critical t value, and the same n. We would find that topping makes a significant difference for each crust type, because all three differences exceed LSD.
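Since one LSD value serves for every pair of cells, all of the cell comparisons in both directions can be checked at once. The sketch below compares only cells that share a crust type or a topping; the helper name same_row_or_column is ours, and t.05(30) = 2.042 is the standard tabled value.

```python
from itertools import combinations
from math import sqrt

# Cell means from the answer table, keyed by (crust, topping)
means = {
    ("Thin", "Cheese"): 7.67, ("Thin", "Pepperoni"): 4.17,
    ("Regular", "Cheese"): 5.83, ("Regular", "Pepperoni"): 9.83,
    ("Deep", "Cheese"): 4.00, ("Deep", "Pepperoni"): 6.83,
}

# MSW = 1.76 and n = 6 per cell come from the text; t.05(30) = 2.042
lsd = 2.042 * sqrt(2 * 1.76 / 6)

def same_row_or_column(a, b):
    """Two cells are compared only if they share a crust or a topping."""
    return a[0] == b[0] or a[1] == b[1]

comparisons = {
    (a, b): abs(means[a] - means[b])
    for a, b in combinations(means, 2) if same_row_or_column(a, b)
}
for (a, b), diff in comparisons.items():
    print(a, "vs", b, f"diff = {diff:.2f}", "significant" if diff > lsd else "n.s.")
```

Six of the nine comparisons are between crust types within a topping, and three are between toppings within a crust type.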

Finally, the effect size associated with the interaction is: ηp2 = (2 × 27.76) / (2 × 27.76 + 30) = 55.52/85.52 = .65