STAT 211

Handout 10 (Chapter 10): The Analysis of Variance

When two or more populations or treatments are being compared, the characteristic that distinguishes the populations or treatments from one another is called the factor under investigation.

Single Factor ANOVA :

Model: $X_{ij} = \mu_i + \epsilon_{ij}$ where i=1,...,I (number of treatments), j=1,...,J (number of observations in each treatment).

$X_{ij}$: observations

$\mu_i$: ith treatment mean.

$\epsilon_{ij}$: errors, which are normally distributed with mean 0 and constant variance $\sigma^2$.

Alternative model:

$X_{ij} = \mu + \alpha_i + \epsilon_{ij}$ where i=1,...,I (number of treatments), j=1,...,J (number of observations in each treatment).

$X_{ij}$: observations

$\mu$: overall mean

$\alpha_i$: ith treatment effect.

$\epsilon_{ij}$: errors, which are normally distributed with mean 0 and constant variance $\sigma^2$.
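The two forms describe the same model. A sketch of the link, assuming the usual convention that the overall mean is the average of the treatment means (the handout does not state this convention explicitly):

```latex
\begin{align*}
\mu &= \frac{1}{I}\sum_{i=1}^{I}\mu_i, \\
\alpha_i &= \mu_i - \mu, \\
\sum_{i=1}^{I}\alpha_i &= 0 .
\end{align*}
```

Under this convention, saying that all treatment means are equal is the same as saying that every treatment effect is zero.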

Assumptions:

(i) $\epsilon_{ij}$'s are independent ($X_{ij}$'s are independent)

(ii) $\epsilon_{ij}$'s are normally distributed with mean 0 and constant variance $\sigma^2$.

(iii) $X_{ij}$'s are normally distributed with mean $\mu_i$ and constant variance $\sigma^2$.

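Hypothesis: $H_0: \mu_1 = \mu_2 = \cdots = \mu_I$ versus $H_a$: at least one $\mu_i \neq \mu_j$ for $i \neq j$, where the $\mu_i$'s are treatment means. Or $H_0: \alpha_i = 0$ for all i versus $H_a: \alpha_i \neq 0$ for at least one i, where $\alpha_i$ is the ith treatment effect.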

Analysis of Variance Table:

Source / df / SS / MS / F / Prob > F
Treatments / I-1 / SSTr / MSTr = SSTr/(I-1) / F = MSTr/MSE / P-value
Error / I(J-1) / SSE / MSE = SSE/[I(J-1)]
Total / IJ-1 / SSTotal

where df is the degrees of freedom, SS is the sum of squares, MS is the mean square.
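For reference, the sums of squares in the table are usually defined as below. This is a sketch of the standard single-factor definitions; the notation $\bar{x}_{i\cdot}$ for the ith sample mean and $\bar{x}_{\cdot\cdot}$ for the grand mean is assumed here.

```latex
\begin{align*}
SSTr &= J\sum_{i=1}^{I}\left(\bar{x}_{i\cdot}-\bar{x}_{\cdot\cdot}\right)^2, \\
SSE &= \sum_{i=1}^{I}\sum_{j=1}^{J}\left(x_{ij}-\bar{x}_{i\cdot}\right)^2, \\
SSTotal &= \sum_{i=1}^{I}\sum_{j=1}^{J}\left(x_{ij}-\bar{x}_{\cdot\cdot}\right)^2 = SSTr + SSE .
\end{align*}
```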

Reject $H_0$ if the P-value ≤ α or if the test statistic $F > F_{\alpha;I-1,I(J-1)}$.

If you reject the null hypothesis, you need to use a multiple comparison test such as Tukey-Kramer (page 414) to see which means are different.

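100(1-α)% simultaneous confidence interval for $\mu_i - \mu_j$: $\bar{x}_{i\cdot} - \bar{x}_{j\cdot} \pm Q_{\alpha;I,I(J-1)}\sqrt{MSE/J}$.

Or write the sample means in increasing order and look at their pairwise differences. Reject $H_0: \mu_i = \mu_j$ if $|\bar{x}_{i\cdot} - \bar{x}_{j\cdot}| > Q_{\alpha;I,I(J-1)}\sqrt{MSE/J}$ in the multiple comparison test.

$Q_{\alpha;I,I(J-1)}$ is the critical value of the studentized range distribution (Table A10).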

Example 1: Data on the calcium content of wheat were recorded for four different storage times.

Storage Period / Observations
0 months / 58.75 57.94 58.91 56.85 55.21 57.30
1 month / 58.87 56.43 56.51 57.67 59.75 58.48
2 months / 59.13 60.38 58.01 59.95 59.51 60.34
4 months / 62.32 58.76 60.03 59.36 59.61 61.95

Is there sufficient evidence to conclude that the mean calcium content is not the same for the four different storage times? Use α=0.05.

You are testing $H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$ versus $H_a$: at least one $\mu_i \neq \mu_j$ for $i \neq j$, i,j=1,2,3,4.

Assumptions:

(i) Each month's distribution is normal.

(ii) The distributions for all months have the same standard deviation.

(iii) The observations selected for each month are independent from one another.

(iv) The samples selected for each month are independent from one another.

Analysis of Variance

Source DF SS MS F P

Factor 3 32.1381669 10.7127223 6.51 0.0030

Error 20 32.9010529 1.64505264

Total 23 65.0392198

Level N Mean StDev

Month0 6 57.493333 1.3748118

Month1 6 57.951666 1.3288857

Month2 6 59.553333 0.89694374

Month4 6 60.338333 1.4559044

Total 24 58.834166 1.681604

Bartlett's test for equal variances (normal distribution)

Test Statistic: 1.1633

P-Value : 0.762
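The ANOVA table for Example 1 can be reproduced from the raw data. The following is a minimal sketch using only the Python standard library; the function name `one_way_anova` is illustrative, and the critical value 3.10 for F(0.05; 3, 20) is assumed to come from an F table rather than being computed.

```python
# Single-factor ANOVA quantities computed from raw data (standard library only).

def one_way_anova(groups):
    """Return df, SS, MS and F for a list of samples, one list per treatment."""
    n_total = sum(len(g) for g in groups)
    num_groups = len(groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Treatment sum of squares: group means around the grand mean.
    sstr = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Error sum of squares: observations around their own group mean.
    sse = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_tr = num_groups - 1
    df_err = n_total - num_groups
    mstr = sstr / df_tr
    mse = sse / df_err
    return {"df_tr": df_tr, "df_err": df_err, "SSTr": sstr, "SSE": sse,
            "MSTr": mstr, "MSE": mse, "F": mstr / mse}

# Calcium content by storage period (Example 1).
calcium = [
    [58.75, 57.94, 58.91, 56.85, 55.21, 57.30],  # 0 months
    [58.87, 56.43, 56.51, 57.67, 59.75, 58.48],  # 1 month
    [59.13, 60.38, 58.01, 59.95, 59.51, 60.34],  # 2 months
    [62.32, 58.76, 60.03, 59.36, 59.61, 61.95],  # 4 months
]

table = one_way_anova(calcium)
F_CRIT = 3.10  # assumed table value of F(0.05; 3, 20)
reject_h0 = table["F"] > F_CRIT

print(f"SSTr = {table['SSTr']:.4f}, SSE = {table['SSE']:.4f}")
print(f"F = {table['F']:.2f}, reject H0 at alpha = 0.05: {reject_h0}")
```

The same function works for unequal sample sizes, since the error degrees of freedom are computed as n minus the number of groups. If SciPy is available, `scipy.stats.f.sf(table["F"], 3, 20)` would give the P-value instead of the table comparison.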

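Tukey-Kramer multiple comparison test gives 95% simultaneous confidence intervals (subscripts 1, 2, 3, 4 denote storage periods of 0, 1, 2, and 4 months) for

$\mu_1 - \mu_2$: (-2.5319, 1.6152)

$\mu_1 - \mu_3$: (-4.1335, 0.0135)

$\mu_1 - \mu_4$: (-4.9185, -0.7715)

$\mu_2 - \mu_3$: (-3.6752, 0.4719)

$\mu_2 - \mu_4$: (-4.4602, -0.3131)

$\mu_3 - \mu_4$: (-2.8585, 1.2885)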
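The six intervals above can be recomputed from the summary output of Example 1. This is a sketch under two assumptions: the critical value Q(0.05; 4, 20) = 3.96 is read from a studentized range table, and each interval is taken as (earlier storage period) minus (later storage period).

```python
import math
from itertools import combinations

# Sample means from the Example 1 output.
means = {
    "0 months": 57.493333,
    "1 month": 57.951666,
    "2 months": 59.553333,
    "4 months": 60.338333,
}
MSE = 1.64505264   # from the ANOVA table
J = 6              # observations per storage period
Q_CRIT = 3.96      # assumed table value of Q(0.05; 4, 20)

# With equal sample sizes, every pair shares the same half-width.
half_width = Q_CRIT * math.sqrt(MSE / J)

intervals = {}
for (a, mean_a), (b, mean_b) in combinations(means.items(), 2):
    diff = mean_a - mean_b
    intervals[(a, b)] = (diff - half_width, diff + half_width)

# A pair differs significantly when its interval excludes zero.
significant = [pair for pair, (lo, hi) in intervals.items() if lo > 0 or hi < 0]

for pair, (lo, hi) in intervals.items():
    print(f"{pair[0]} - {pair[1]}: ({lo:.4f}, {hi:.4f})")
print("Significantly different pairs:", significant)
```

Only the intervals comparing 4 months with 0 months and with 1 month exclude zero, so those are the pairs the data separate at the 5% family error rate.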

Example 2: An engineer conducted a study of the factors influencing the lengths of steel bars. Twelve bars were taken from a screw machine and their lengths measured, 4 being subjected to W heat treatment, 4 to L heat treatment, and 4 to D heat treatment. The lengths (less 438) were as follows:

Heat Treatment

W / L / D
6 / 4 / 7
7 / 6 / 9
1 / -1 / 10
6 / 4 / 6


Analysis of Variance

Source DF SS MS F P

Factor 2 46.17 23.08 3.54 0.074

Error 9 58.75 6.53

Total 11 104.92

Level N Mean StDev

W 4 5.000 2.708

L 4 3.250 2.986

D 4 8.000 1.826

Bartlett's Test (normal distribution)

Test Statistic: 0.637

P-Value : 0.727

Levene's Test (any continuous distribution)

Test Statistic: 0.022

P-Value : 0.979

We will answer the question using the output above.
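The Bartlett statistic reported in the output can be checked by hand. The sketch below uses the textbook form of Bartlett's statistic, which the handout does not write out, so the formula is an assumption that happens to reproduce the printed value. The P-value line relies on the fact that a chi-square variable with 2 degrees of freedom (three groups) has survival function exp(-x/2); it does not generalize to other numbers of groups.

```python
import math
import statistics

def bartlett_statistic(sizes, variances):
    """Bartlett's statistic for equal variances from group sizes and sample variances."""
    k = len(sizes)
    n_total = sum(sizes)
    pooled = sum((n - 1) * v for n, v in zip(sizes, variances)) / (n_total - k)
    numerator = (n_total - k) * math.log(pooled) - sum(
        (n - 1) * math.log(v) for n, v in zip(sizes, variances)
    )
    correction = 1 + (sum(1 / (n - 1) for n in sizes) - 1 / (n_total - k)) / (3 * (k - 1))
    return numerator / correction

# Bar lengths (less 438) by heat treatment (Example 2).
heat = {"W": [6, 7, 1, 6], "L": [4, 6, -1, 4], "D": [7, 9, 10, 6]}

sizes = [len(values) for values in heat.values()]
variances = [statistics.variance(values) for values in heat.values()]

stat = bartlett_statistic(sizes, variances)
# Chi-square with k - 1 = 2 degrees of freedom: P(X > x) = exp(-x / 2).
p_value = math.exp(-stat / 2)

print(f"Bartlett statistic = {stat:.3f}, P-value = {p_value:.3f}")
```

A large P-value here gives no evidence against the equal-variance assumption for the three heat treatments.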

Example 3: A tire manufacturer wants to test whether the mean diameters of tires produced at its three plants (New York, Illinois, and California) are equal. Last month, he took a random sample of tires at each plant, and their diameters (in inches) were as follows:

Plant / Observations
New York / 24.2 24.2 24.1 24.1 24.2 24.4 24.3 24.3 24.4
Illinois / 24.4 24.2 24.3 24.3 24.1 24.4 24.3 24.4 24.4
California / 24.4 24.3 24.2 24.5 24.4 24.2 24.4 24.5 24.4


Analysis of Variance

Source DF SS MS F P

Factor 2 0.0674 0.0337 2.78 0.082

Error 24 0.2911 0.0121

Total 26 0.3585

Level N Mean StDev

NY 9 24.244 0.113

IL 9 24.311 0.105

CA 9 24.367 0.112

Bartlett's Test (normal distribution)

Test Statistic: 0.042

P-Value : 0.979

Levene's Test (any continuous distribution)

Test Statistic: 0.063

P-Value : 0.939

Confidence interval for $\sum c_i\mu_i$: $\sum c_i\bar{x}_{i\cdot} \pm t_{\alpha/2;I(J-1)}\sqrt{MSE\sum c_i^2/J}$ with equal sample sizes will be discussed in class.

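Going back to Example 2, suppose I do not approve of the L treatment and would like to see the difference between the mean of L and the combined average of D and W.

Let $\theta = \mu_L - (\mu_D + \mu_W)/2$; then $\hat{\theta} = \bar{x}_L - (\bar{x}_D + \bar{x}_W)/2 = 3.25 - (8.00 + 5.00)/2 = -3.25$.

$\hat{\theta} \pm t_{0.025;9}\sqrt{MSE\sum c_i^2/J} = -3.25 \pm 2.262\sqrt{6.53(1.5)/4}$ = (-6.79, 0.29) is the 95% C.I. for $\theta$ where $t_{0.025;9}$ = 2.262.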
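The interval (-6.79, 0.29) can be reproduced from the Example 2 output. This is a sketch assuming contrast coefficients of 1 for L and -1/2 for each of W and D, with the critical value 2.262 taken as given rather than computed.

```python
import math

# Sample means from the Example 2 output.
means = {"W": 5.000, "L": 3.250, "D": 8.000}
# Contrast: mean of L minus the average of W and D (assumed coefficients).
coeffs = {"W": -0.5, "L": 1.0, "D": -0.5}

MSE = 58.75 / 9   # SSE / error df from the ANOVA table
J = 4             # observations per treatment
T_CRIT = 2.262    # t critical value with 9 df, as quoted in the handout

estimate = sum(coeffs[name] * means[name] for name in means)
std_err = math.sqrt(MSE * sum(c ** 2 for c in coeffs.values()) / J)

lower = estimate - T_CRIT * std_err
upper = estimate + T_CRIT * std_err

print(f"estimate = {estimate:.2f}")
print(f"95% CI = ({lower:.2f}, {upper:.2f})")
```

Because the interval contains zero, the data do not show a difference between L and the average of the other two treatments at the 5% level.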

For the case of unequal sample sizes, let $n = \sum_{i=1}^{I} J_i$ and j=1,...,$J_i$. Then the analysis of variance table and the multiple comparison test change as follows.

Source / df / SS / MS / F / Prob > F
Treatments / I-1 / SSTr / MSTr = SSTr/(I-1) / F = MSTr/MSE / P-value
Error / n-I / SSE / MSE = SSE/(n-I)
Total / n-1 / SSTotal

where df is the degrees of freedom, SS is the sum of squares, MS is the mean square.

Reject $H_0$ if the P-value ≤ α or if the test statistic $F > F_{\alpha;I-1,n-I}$.

If you reject the null hypothesis, you need to use a multiple comparison test such as Tukey-Kramer to see which means are different.

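100(1-α)% simultaneous confidence interval for $\mu_i - \mu_j$: $\bar{x}_{i\cdot} - \bar{x}_{j\cdot} \pm Q_{\alpha;I,n-I}\sqrt{\frac{MSE}{2}\left(\frac{1}{J_i}+\frac{1}{J_j}\right)}$.

Or write the sample means in increasing order and look at their pairwise differences.

Reject $H_0: \mu_i = \mu_j$ if $|\bar{x}_{i\cdot} - \bar{x}_{j\cdot}| > Q_{\alpha;I,n-I}\sqrt{\frac{MSE}{2}\left(\frac{1}{J_i}+\frac{1}{J_j}\right)}$ for the Tukey-Kramer.

$Q_{\alpha;I,n-I}$ is the critical value of the studentized range distribution (Table A10).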

Example 4 (Exercise 10.22): The data are yields of tomatoes for four different levels of salinity. Using α=0.05, test for any differences in true average yield due to the different salinity levels.

Analysis of Variance

Source DF SS MS F P

Factor 3 456.50 152.17 17.11 0.000

Error 14 124.50 8.89

Total 17 581.00

Level N Mean StDev

level 1.6 5 58.280 3.602

level 3.8 4 55.400 2.665

level 6.0 4 50.850 2.426

level 10.2 5 45.500 2.901

Pooled StDev = 2.982

[Character plot: individual 95% CIs for mean, based on pooled StDev; axis ticks at 48.0, 54.0, 60.0]

Bartlett's Test (normal distribution)

Test Statistic: 0.558

P-Value : 0.906

Levene's Test (any continuous distribution)

Test Statistic: 0.130

P-Value : 0.940
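Since the raw yields are not listed here, the ANOVA table for Example 4 can be rebuilt from the summary statistics alone (N, Mean, StDev for each level). The following is a sketch of that calculation; the critical value 3.34 for F(0.05; 3, 14) is assumed from an F table, and small differences from the printed table come from rounding in the summary values.

```python
import math

# (N, Mean, StDev) for each salinity level, from the Example 4 output.
levels = {
    "1.6": (5, 58.280, 3.602),
    "3.8": (4, 55.400, 2.665),
    "6.0": (4, 50.850, 2.426),
    "10.2": (5, 45.500, 2.901),
}

n_total = sum(n for n, _, _ in levels.values())
num_levels = len(levels)
grand_mean = sum(n * mean for n, mean, _ in levels.values()) / n_total

# SSTr weights each squared deviation by its own sample size.
sstr = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in levels.values())
# SSE pools the within-level variation: (n - 1) * s^2 for each level.
sse = sum((n - 1) * sd ** 2 for n, _, sd in levels.values())

mstr = sstr / (num_levels - 1)
mse = sse / (n_total - num_levels)
f_stat = mstr / mse
pooled_sd = math.sqrt(mse)

F_CRIT = 3.34  # assumed table value of F(0.05; 3, 14)
reject_h0 = f_stat > F_CRIT

print(f"SSTr = {sstr:.2f}, SSE = {sse:.2f}, F = {f_stat:.2f}")
print(f"Pooled StDev = {pooled_sd:.3f}, reject H0: {reject_h0}")
```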

Tukey's pairwise comparisons

Family error rate = 0.0500

Individual error rate = 0.0115

Critical value = 4.11

Intervals for (column level mean) - (row level mean)

Row level / 1.6 / 3.8 / 6.0
3.8 / (-2.934, 8.694)
6.0 / (1.616, 13.244) / (-1.578, 10.678)
10.2 / (7.299, 18.261) / (4.086, 15.714) / (-0.464, 11.164)
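The Tukey-Kramer intervals for Example 4 can be recomputed from the output, where the half-width now depends on the two sample sizes being compared. This is a sketch using the critical value 4.11 printed in the output and MSE = SSE/14 from the ANOVA table.

```python
import math
from itertools import combinations

# (N, Mean) for each salinity level, from the Example 4 output.
levels = {
    "1.6": (5, 58.280),
    "3.8": (4, 55.400),
    "6.0": (4, 50.850),
    "10.2": (5, 45.500),
}
MSE = 124.50 / 14  # SSE / error df
Q_CRIT = 4.11      # critical value printed in the output

intervals = {}
for (a, (n_a, mean_a)), (b, (n_b, mean_b)) in combinations(levels.items(), 2):
    # Tukey-Kramer half-width for unequal sample sizes.
    half_width = Q_CRIT * math.sqrt(MSE / 2 * (1 / n_a + 1 / n_b))
    diff = mean_a - mean_b
    intervals[(a, b)] = (diff - half_width, diff + half_width)

# Pairs whose interval contains zero are not significantly different.
not_significant = [pair for pair, (lo, hi) in intervals.items() if lo <= 0 <= hi]

for pair, (lo, hi) in intervals.items():
    print(f"{pair[0]} - {pair[1]}: ({lo:.3f}, {hi:.3f})")
print("Not significantly different:", not_significant)
```

Only adjacent salinity levels fail to separate; every pair that is two or more levels apart has an interval that excludes zero.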

Differences between fixed effect and random effect models will be discussed in class.