Analysis of Variance (ANOVA)
Introduction
Once the data for a particular experimental design has been collected, we want to use the sample information to make inferences about the population means associated with the various treatments.
A powerful method for comparing treatment means is known as analysis of variance.
The Logic Underlying ANOVA
Consider an experiment with a single factor at two levels, i.e., two treatments.
Suppose we want to decide whether the two treatment means differ based on the means of two independent random samples, each containing n1 = n2 = 5 measurements, and that the y values appear as in Figure 1 below.
Note that the five circles on the left are plots of the y values for sample 1 and the five solid dots on the right are plots of the y values for sample 2.
Also observe the horizontal lines that pass through the means for the two samples, ȳ1 and ȳ2.
Do the plots in the figure provide sufficient evidence to indicate a difference between the two underlying population means?
Now examine the situation for two different sample means in Figure 2 (below) where it appears as though the population means differ.
Now, consider a third case in Figure 3 (below).
For these data it appears that there is little or no difference between the population means.
What elements of the figures did we intuitively use to decide whether the data indicate a difference between the population means?
The answer is that we visually compared the distance (variation) between the sample means to the variation within the y values for each of the two samples.
Since the difference between the sample means in Figure 2 is large relative to the within-sample variation, we inferred that the population means differ.
Conversely, in Figure 3, the variation between the sample means is small relative to the within-sample variation and therefore there is little evidence to infer that the means are significantly different.
The variation within samples is measured by the pooled s² that we computed for the independent random-samples t test previously.
Within-sample variation:

s² = [Σ(yi1 − ȳ1)² + Σ(yi2 − ȳ2)²] / (n1 + n2 − 2) = SSE / (n1 + n2 − 2)

where yi1 is the ith observation in sample 1, yi2 is the ith observation in sample 2, and each sum runs over the observations in its sample.
The quantity in the numerator of s² is often denoted SSE, the sum of the squared errors.
SSE measures unexplained variability, specifically the variability left unexplained by the differences between the sample means.
A measure of the between-sample variation is given by the weighted sum of squared deviations of the individual sample means about the mean ȳ of all 10 observations, divided by the number of samples minus 1, i.e.,
Between-sample variation:

[n1(ȳ1 − ȳ)² + n2(ȳ2 − ȳ)²] / (2 − 1) = SST / (2 − 1)

The quantity in the numerator is often denoted SST, the sum of squares for treatments, since it measures the variability explained by the differences between the sample means of the two treatments.
For this experimental design, SSE and SST sum to a known total, namely,

SS(Total) = SST + SSE = Σ(y − ȳ)²

where the sum runs over all 10 observations.
Also, the ratio

F = (SST / 1) / (SSE / (n1 + n2 − 2)) = (SST / 1) / s²

has an F distribution with ν1 = 1 and ν2 = (n1 + n2 − 2) degrees of freedom and therefore can be used to test the null hypothesis of no difference between the treatment means.
This additivity of the sums of squares led to viewing the analysis as a partitioning of SS(Total) into sources corresponding to the factors included in the experiment and to SSE.
Consequently, this became known as an analysis of variance (ANOVA).
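As a quick numerical check of this logic, the following sketch computes SSE, SST, and the F ratio for two small samples and compares the result with scipy's built-in one-way ANOVA routine. The sample values here are hypothetical, chosen only to mimic the two-treatment setting above.

```python
import numpy as np
from scipy import stats

# Two hypothetical samples of n1 = n2 = 5 measurements each
y1 = np.array([10.1, 9.8, 10.5, 9.9, 10.3])
y2 = np.array([11.2, 11.0, 11.5, 10.8, 11.4])

n1, n2 = len(y1), len(y2)
ybar1, ybar2 = y1.mean(), y2.mean()
ybar = np.concatenate([y1, y2]).mean()   # mean of all 10 observations

# Within-sample variation: SSE and the pooled variance s^2
sse = ((y1 - ybar1) ** 2).sum() + ((y2 - ybar2) ** 2).sum()
s2 = sse / (n1 + n2 - 2)

# Between-sample variation: SST, the weighted squared deviations
# of the sample means about the overall mean
sst = n1 * (ybar1 - ybar) ** 2 + n2 * (ybar2 - ybar) ** 2

# F ratio with nu1 = 1 and nu2 = n1 + n2 - 2 degrees of freedom
F = (sst / 1) / s2
print(F, stats.f.sf(F, 1, n1 + n2 - 2))

# scipy's one-way ANOVA reproduces the same F and p-value
print(stats.f_oneway(y1, y2))
```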
ANOVA can easily be applied to the experimental design schemes that we previously considered, including complete factorial, partial factorial, and nested designs.
In addition, the principles of randomization and blocking also lend themselves to an ANOVA treatment.
ANOVA: SINGLE-FACTOR COMPLETELY RANDOMIZED DESIGN
A completely randomized design to compare p treatment means is one in which the treatments are randomly assigned to the experimental units, or in which independent random samples are drawn from each of the p populations.
An ANOVA provides an easy way to analyze the data from a completely randomized design.
This analysis partitions SS(Total) into two components, SSE and SST.
The quantity SST denotes the sum of squares for treatments and measures the variation explained by the differences between the treatment means.
The sum of squares for error, SSE, is a measure of the unexplained variability, obtained by calculating a pooled measure of the variability within the p samples.
If the treatment means truly differ, the variation between the treatment means should be large relative to the variation within treatments.
We compare the two sources of variability by forming an F statistic:

F = MST / MSE = [SST / (p − 1)] / [SSE / (n − p)]

where n is the total number of measurements.
The numerator of the F statistic, the mean square for treatments (MST), is based on (p − 1) degrees of freedom: one for each of the p treatments, minus 1 for the estimation of the overall mean.
The denominator, the mean square for error (MSE), is based on (n − p) degrees of freedom: one for each of the n measurements, minus 1 for each of the p treatment means being estimated.
Thus, F is based on ν1 = (p − 1) and ν2 = (n − p) degrees of freedom.
If the computed value of F exceeds the upper critical value Fα, we reject H0 and conclude that at least two of the treatment means differ.
The notation used in an analysis of variance of a completely randomized design is summarized below, along with the key elements of the corresponding ANOVA F test.

Notation:
p = number of treatments
n1, n2, ..., np = number of measurements for treatments 1, 2, ..., p
n = n1 + n2 + ... + np = total number of measurements
ȳ1, ȳ2, ..., ȳp = sample means for treatments 1, 2, ..., p
ȳ = overall mean of all n measurements

ANOVA F test for a completely randomized design:
H0: μ1 = μ2 = ... = μp
H1: At least two treatment means differ
Test statistic: F = MST / MSE
Rejection region: F > Fα, where Fα is based on ν1 = (p − 1) and ν2 = (n − p) degrees of freedom
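The calculation is easy to script. The following is a minimal sketch of the completely randomized design ANOVA as defined above; the function name is ours, and any numerical library would serve equally well.

```python
import numpy as np
from scipy import stats

def one_way_anova(samples):
    """Compute the ANOVA quantities for a completely randomized design.

    samples: a list of 1-D arrays, one per treatment.
    Returns (SST, SSE, MST, MSE, F, p_value).
    """
    all_y = np.concatenate(samples)
    n, p = len(all_y), len(samples)
    ybar = all_y.mean()                  # overall mean of all n measurements

    # SST: variation explained by differences between treatment means
    sst = sum(len(s) * (s.mean() - ybar) ** 2 for s in samples)
    # SSE: pooled (unexplained) variation within the p samples
    sse = sum(((s - s.mean()) ** 2).sum() for s in samples)

    mst = sst / (p - 1)                  # mean square for treatments
    mse = sse / (n - p)                  # mean square for error
    F = mst / mse
    p_value = stats.f.sf(F, p - 1, n - p)
    return sst, sse, mst, mse, F, p_value
```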
ANOVA Example 1
An experiment was conducted to compare the wear qualities of three types of paint when subjected to the abrasive action of a slowly rotating cloth-surfaced wheel.
Ten paint specimens were tested for each paint type, and the number of hours until visible abrasion was apparent was recorded for each specimen.
The data with totals are shown below.
Table 1. Wear Data in Hours for Three Paint Types

            A       B       C
          148     513     335
           76     264     643
          393     433     216
          520      94     536
          236     535     128
          134     327     723
           55     214     258
          166     135     380
          415     280     594
          153     304     465
Total   2,296   3,099   4,278
Is there sufficient evidence to indicate a difference in the mean time until abrasion is visibly evident for the three paint types?
The experiment involves a single factor, paint type, which is at three levels.
We utilize an approach with a completely randomized design with p = 3 treatments.
Let μ1, μ2, and μ3 represent the mean abrasion times for paint types A, B, and C, respectively.
H0: μ1 = μ2 = μ3
H1: At least two of the three means differ.
The null hypothesis is tested using α = 0.05.
A statistical software package can be used to perform the ANOVA calculations.
Excel Spreadsheet: Single-Factor ANOVA

SUMMARY
Groups      Count     Sum      Mean    Variance
A              10    2296     229.6     25026.0
B              10    3099     309.9     21866.7
C              10    4278     427.8     38737.2

ANOVA
Source of Variation          SS    df      MS      F    P-value    F crit
Between Groups
(Paint Types)            198772     2   99386   3.48     0.0452      3.35
Within Groups
(Error)                  770670    27   28543
Total                    969443    29
Interpretation of the ANOVA table.
The value of the test statistic is F = 3.48 and the rejection region for the test is F > F0.05.
The value of F0.05 based on ν1 = (p − 1) = 2 and ν2 = (n − p) = 27 degrees of freedom is 3.35.
Since the computed value of the test statistic exceeds the critical value, there is evidence to indicate a difference in the mean time to visible abrasion for the three paint types, i.e., a treatment effect.
We can also arrive at the same decision by observing that the p-value of the test is 0.0452.
Since α = 0.05 exceeds this p-value, we have sufficient evidence to reject H0.
The p-value is the smallest level of significance at which H0 would be rejected when a specified test procedure is used on a given data set.
Once the p-value has been determined, the conclusion at any particular level α results from comparing the p-value to α.
· p-value ≤ α ⇒ reject H0 at level α
· p-value > α ⇒ do not reject H0 at level α
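The spreadsheet results are easy to verify in code. The sketch below reproduces the F statistic, p-value, and critical value for the paint data using scipy's standard one-way ANOVA and F-distribution routines.

```python
import numpy as np
from scipy import stats

# Wear data in hours for the three paint types (Table 1)
a = np.array([148, 76, 393, 520, 236, 134, 55, 166, 415, 153])
b = np.array([513, 264, 433, 94, 535, 327, 214, 135, 280, 304])
c = np.array([335, 643, 216, 536, 128, 723, 258, 380, 594, 465])

F, p = stats.f_oneway(a, b, c)
print(F, p)                        # F ≈ 3.48, p ≈ 0.0452

# Upper critical value F_0.05 with nu1 = 2 and nu2 = 27 degrees of freedom
print(stats.f.ppf(0.95, 2, 27))    # ≈ 3.35
```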
ANOVA: SINGLE-FACTOR RANDOMIZED BLOCK DESIGN
Noise reduction in an experimental design, i.e., the removal of extraneous experimental variation, can be accomplished by an appropriate assignment of treatments to the experimental units.
The most common design of this type, termed a randomized block design, often provides more information per observation than the amount contained in a completely randomized design.
In general, a randomized block design to compare p treatments will contain b blocks, with each block containing p experimental units.
Each treatment appears once in every block with the p treatments randomly assigned to the experimental units within each block.
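As a concrete illustration of the randomization step, the following sketch assigns the p treatments to random experimental units within each block; the treatment labels and block count are hypothetical.

```python
import random

treatments = ["A", "B", "C", "D"]   # p = 4 hypothetical treatments
n_blocks = 10                       # b = 10 blocks

# Each treatment appears exactly once per block, in random order
# (i.e., assigned to a randomly chosen experimental unit).
for block in range(1, n_blocks + 1):
    order = treatments.copy()
    random.shuffle(order)
    print(f"Block {block}: {order}")
```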
Suppose we want to compare the abilities of four construction cost engineers, A, B, C, and D.
One way to make the comparison would be to select 40 different construction jobs and randomly allocate ten jobs to each of the four engineers.
Each engineer would estimate the cost and we would record y, the difference between the estimated and actual costs expressed as a percentage of the actual cost.
The problem with using a completely randomized design for the cost engineering experiment is that comparison of mean percentage errors will be influenced by the nature of the jobs.
Some jobs will be easier to estimate than others, and the variation in percentage errors that can be attributed to this fact will make it more difficult to compare the treatment means.
To eliminate the effect of job-to-job variability in comparing mean engineering estimates, we could select only ten jobs and require each engineer to estimate the cost of each of the ten jobs.
Although in this case there is probably no need for randomization, it might be desirable to randomly assign the order (in time) of the estimates.
Thus, this randomized block design would consist of p = 4 treatments and b = 10 blocks.
The SS(Total) for the randomized block design is now partitioned into three parts:
SS(Total) = SSB + SST + SSE
The notation that we will employ in the analysis of variance formulas is summarized below:

p = number of treatments
b = number of blocks
n = bp = total number of measurements
ȳT1, ȳT2, ..., ȳTp = treatment means
ȳB1, ȳB2, ..., ȳBb = block means
ȳ = overall mean of all n measurements

The formulas for calculating SST and SSB take the same pattern as the formula for calculating SST for the completely randomized design:

SST = b[(ȳT1 − ȳ)² + (ȳT2 − ȳ)² + ... + (ȳTp − ȳ)²]
SSB = p[(ȳB1 − ȳ)² + (ȳB2 − ȳ)² + ... + (ȳBb − ȳ)²]
Once these quantities have been calculated, we find SSE by subtraction:

SSE = SS(Total) − SSB − SST
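A minimal sketch of this partition in code, assuming the data are arranged as a p × b array with one row per treatment and one column per block:

```python
import numpy as np

def randomized_block_partition(y):
    """Partition SS(Total) for a randomized block design.

    y: p x b array (rows = treatments, columns = blocks).
    Returns (SST, SSB, SSE, SS_total).
    """
    p, b = y.shape
    ybar = y.mean()                                    # overall mean

    sst = b * ((y.mean(axis=1) - ybar) ** 2).sum()     # treatments
    ssb = p * ((y.mean(axis=0) - ybar) ** 2).sum()     # blocks
    ss_total = ((y - ybar) ** 2).sum()
    sse = ss_total - ssb - sst                         # error, by subtraction
    return sst, ssb, sse, ss_total
```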
We are interested in using the randomized block design to test the same null and alternative hypotheses that we tested using the completely randomized design:
H0: μ1 = μ2 = ... = μp
H1: At least two treatment means differ
The test statistic, identical in form to that used for the completely randomized design, is

F = MST / MSE

where now MSE = SSE / (n − p − b + 1).
Since the sum of squares for block, SSB, measures the variation explained by the differences among the block means, we can also test
H0: The b block means are equal
H1: At least two block means differ
using the test statistic

F = MSB / MSE

where MSB = SSB / (b − 1).
Although a test of a hypothesis concerning a difference among treatment means is our primary objective, this second test enables us to determine whether there is evidence of a difference among block means, that is, whether blocking is really effective.
If there are no differences among block means, then blocking will not reduce variability in the experiment, and consequently, will be ineffective.
Indeed, if there are no differences among block means, you will lose information by blocking, because blocking reduces the number of degrees of freedom associated with s².
The F tests for the randomized block design are summarized below; the corresponding ANOVA table is illustrated in Example 2.

Treatments: F = MST / MSE, with ν1 = (p − 1) and ν2 = (n − p − b + 1) degrees of freedom
Blocks: F = MSB / MSE, with ν1 = (b − 1) and ν2 = (n − p − b + 1) degrees of freedom
Rejection region: F > Fα in each case
ANOVA Example 2
Prior to submitting a bid for a construction job, cost engineers prepare a detailed analysis of the estimated labor and materials costs required to complete the job.
This estimate will depend on the engineer who performs the analysis.
An estimate, if too high, will reduce the chance of acceptance of a company’s bid price and, if too low, will reduce the profit or even cause the company to lose money on the job.
A company that employs three job cost engineers wanted to compare the mean levels of the engineers' estimates.
This was done by having each engineer estimate the cost of the same four jobs.
For the resulting data in the table below, perform an analysis of variance to determine whether there is sufficient evidence to indicate differences among treatment and block means.
The significance level should be α = 0.05.
Table 2. Data* for Randomized Block Design

                       Job
Engineer      1      2      3      4
A           4.6    6.2    5.0    6.6
B           4.9    6.3    5.4    6.8
C           4.4    5.9    5.4    6.3

*Data in hundreds of thousands of dollars.
The data for this experiment were collected according to a randomized block design because we would expect estimates of the same job to be more nearly alike than estimates between jobs.
Thus, the experiment involves p = 3 treatments (engineers) and b = 4 blocks (jobs).
The results of ANOVA calculations are shown in the following Excel spreadsheet.
ANOVA: Single-Factor Randomized Block Design

SUMMARY          Count     Sum    Average    Variance
Engineer A           4    22.4      5.600       0.907
Engineer B           4    23.4      5.850       0.737
Engineer C           4    22.0      5.500       0.673
Job 1                3    13.9      4.633       0.063
Job 2                3    18.4      6.133       0.043
Job 3                3    15.8      5.267       0.053
Job 4                3    19.7      6.567       0.063

ANOVA
Source of Variation       SS    df      MS       F     P-value    F crit
Engineer               0.260     2   0.130    4.18       0.073      5.14
Job                    6.763     3   2.254   72.46    0.000042      4.76
Error                  0.187     6   0.031
Total                  7.210    11
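Interpreting the table: for the engineers (treatments), F = 4.18 does not exceed the critical value F0.05 = 5.14 (p-value = 0.073 > 0.05), so there is insufficient evidence of a difference among the engineers' mean estimates at α = 0.05. For the jobs (blocks), F = 72.46 far exceeds 4.76, indicating that blocking on jobs was effective. As a check, the sketch below reproduces these F statistics and p-values directly from the data, following the same partition used earlier.

```python
import numpy as np
from scipy import stats

# Cost estimates in hundreds of thousands of dollars (Table 2):
# rows = engineers A, B, C; columns = jobs 1-4
y = np.array([[4.6, 6.2, 5.0, 6.6],
              [4.9, 6.3, 5.4, 6.8],
              [4.4, 5.9, 5.4, 6.3]])

p, b = y.shape
n = p * b
ybar = y.mean()

sst = b * ((y.mean(axis=1) - ybar) ** 2).sum()   # engineers (treatments)
ssb = p * ((y.mean(axis=0) - ybar) ** 2).sum()   # jobs (blocks)
sse = ((y - ybar) ** 2).sum() - sst - ssb        # error, by subtraction

df_t, df_b, df_e = p - 1, b - 1, n - p - b + 1
mst, msb, mse = sst / df_t, ssb / df_b, sse / df_e

F_treat = mst / mse   # ≈ 4.18
F_block = msb / mse   # ≈ 72.46
print(F_treat, stats.f.sf(F_treat, df_t, df_e))  # p ≈ 0.073
print(F_block, stats.f.sf(F_block, df_b, df_e))  # p ≈ 0.000042
```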