Script
ANOVA – Single Factor
Slide 1
· Welcome back. In the previous modules we discussed how to compare two population means. In this module we begin our extension of this concept to the comparison of three or more population means.
Slide 2
· This analysis of the comparison of three or more population means is done by an approach known as ANOVA.
· ANOVA stands for Analysis of Variance.
· The basic idea behind ANOVA goes like this: When we take samples from, say, 4 populations, even if their true means are equal, pure randomness means we cannot expect the sample means to be equal. So when we do get 4 different sample means, the question is:
o How much of the variability in these means is simply due to randomness or unexplained factors, and
o How much is due to the fact that we did sample from different populations with potentially 4 different true means? -- We analyze this variability in the sample means – hence the term Analysis of Variance.
Slide 3
· ANOVA was actually developed in the 1920s by statistical researchers – most notably Ronald Fisher – who were investigating factors affecting crop growth at agricultural experiment stations. Thus much of the terminology used in ANOVA has an agricultural flavor.
· A response variable, y,
o is the thing we are trying to measure – let’s say that it is bushels per acre of corn
· An experimental unit
o Is a unit on which we will measure the response variable – like one particular acre of corn
· The experiment we will be looking at has one or more factors.
o Factors are the independent variables we think may be affecting the response variable y – for instance fertilizer type and amount of water given to an acre of land.
· The levels of the factor
o are the discrete values of the factors – We may be testing fertilizers A, B, and C – these would be the levels of the factor fertilizer; and we may only be testing using 1 acre-foot, 3 acre-feet, or 5 acre-feet of water – thus 1, 3, and 5 would be the levels of the factor, water.
· A treatment
o Is the combination of the levels of the factors that an experimental unit receives
Slide 4
· Thus for our example
· Where we wished to know how various combinations of water and fertilizer combine to affect crop growth
· Again the response variable is
o The crop yield in bushels per acre
· The experimental unit is
o Each acre in our experiment that receives a treatment
· The two factors are
o Water and fertilizer
· The levels for water
o are 1, 3, and 5, and the levels for fertilizer are A, B, and C
· And the 9 possible treatments are
o 1 acre-foot of water and fertilizer A, 3 acre-feet of water and fertilizer A, and so forth down to 5 acre-feet of water and fertilizer C
Slide 5
· We begin our discussion of ANOVA by concentrating on only one factor
· This is called single factor ANOVA
o Now since there is only one factor, the treatments are the various levels of this one factor – that is, for single factor ANOVA, when there is only one factor, Levels and Treatments are the same thing
· The analyses in single factor ANOVA are predicated on three assumptions:
o That the distribution of outcomes for each treatment has a normal distribution
o That, although the standard deviations of each of the treatments is unknown, we assume that they are equal
o And finally that our samples will be
§ Random and
§ Independent
Slide 6
· Let’s look at a specific example.
· Suppose a university offers the introductory statistics course in four formats and it wishes to know if student performance on the final exam is affected by the delivery mode.
· The four different levels or treatments of the single factor – delivery mode – are:
o The traditional lecture format
o A text reading format, where the professor does not lecture – students just read the text and the professor responds to questions in class
o A videotape, such as Annenberg’s “Against All Odds” that is broadcast periodically on PBS stations and can be purchased by universities
o An on-line internet format
· Random samples of students were selected from students taking the course in each of the four formats and their final exam scores were recorded.
Slide 7
· Suppose there were 26 students selected among the various formats and here are the scores they received on the final exam.
Slide 8
· Let’s summarize the results using the terminology and notation that we will use throughout our discussion of ANOVA.
· There is a single factor – teaching format or delivery mode
· There are k equals 4 possible levels or treatments – lecture, text, videotape, and internet
· In this experiment a total of n equals 26 students were surveyed. Of these, n1 equals 7 students took the course in the lecture format, n2 equals 8 took it by text reading, n3 equals 6 took it using the videotape format, and n4 equals 5 took the course over the internet.
· The average of the final exam scores of the 7 students who took the course in the lecture format is x1-bar equals 76; of the 8 students who took it in the text reading format, the average is x2-bar equals 65; the average of the 6 who took the course by videotape is x3-bar equals 75; and the average of the 5 students who took the course over the internet is x4-bar equals 74. The overall average of all 26 students, regardless of which format they took the course in, is called the grand mean; it is designated by x-bar without a subscript and was 72.
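As a quick check, the grand mean is just the sample-size-weighted average of the four treatment means. A minimal sketch in Python (the variable names are ours, not from the slides):

```python
# Treatment sample sizes and sample means from the slide:
# lecture, text, videotape, internet
n = [7, 8, 6, 5]
xbar = [76, 65, 75, 74]

# Grand mean: the total of all 26 scores divided by 26,
# i.e. the weighted average of the four treatment means.
grand_mean = sum(ni * xi for ni, xi in zip(n, xbar)) / sum(n)
print(grand_mean)  # 72.0
```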
Slide 9
· As we said before, we did not expect all four x-bars or sample means to be the same. But why are they different?
· Well, there is some variability because they took the course in different formats – this is the Between Treatment Variability which we simply call TREATMENT variability
· And there is variability within each treatment (all the lecture student scores are not the same, all the text scores are not the same and so forth). This variability due to randomness is called ERROR variability.
· Now here comes the basic concept behind ANOVA.
· If the average variability that we measure due to TREATMENT is "a lot greater" than what we call "noise" – the average variability due to ERROR – we can reasonably conclude that there are differences in the overall average scores (the mu's) of the different populations, or teaching formats.
Slide 10
· That concept is actually straightforward
· But the basic questions are
· What do we mean by variability due to treatment and due to error in general?
· And average variability due to treatment and due to error in particular?
· And what do we mean by “a lot greater”
o That is, how much larger does the average treatment variability have to be before we are convinced that there are differences in the performance of those who take the course in different formats?
Slide 11
· We’ll first look at total variability – why does one observation differ from another? – and we say this total variability is made up of two parts – variability due to the fact that there are different treatments (or Treatment Variability) and Variability due to randomness (or Error Variability).
· We measure variability as the total squared deviations from a mean value. So if we look at all 26 observations, the total variability
· Which we call SST (Sum of Squares Total) is
· Found by taking each observation subtracting off the grand mean (72 in this case), squaring the differences, and summing them up.
· SSTr is called the Sum of Squares due to Treatment or the Between Treatment Sum of Squares and
o Is the sum of the squared deviations due to different treatments
· And SSE, called the Sum of Squares due to Error or the Within Treatment Sum of Squares
o Is the sum of the squared deviations that are not explained by the treatment, and thus we say they occur due to ERROR.
· The key point is that the total sum of squares equals the sum of squares due to treatment plus the sum of squares due to error. We’ll show how to calculate each of these momentarily.
Slide 12
· We’ve talked briefly about Total Variability, but what is “Average Variability”?
· Average variability due to treatment and error are expressed in terms of average or mean square values – they are called MSTr – mean square due to treatment and MSE – mean square due to error respectively.
· Recall that the general form of a sample variance, s-squared, is the sum of the squared deviations from the mean divided by n-1, where n-1 is the number of degrees of freedom. The same approach holds here. MSTr is found by dividing the sum of squares due to treatment, SSTr, by the degrees of freedom associated with the treatment term, and MSE is found by dividing the sum of squares due to error, SSE, by the degrees of freedom associated with the error term.
· Here is what is called a standard ANOVA table that gives the breakdown of the sums of squares, the degrees of freedom and hence the mean square values due to Treatment and due to Error. We’ll give formulas for SST, SSTr, and SSE beginning on the next slide, but given that these values have been calculated, let us analyze the rest of the ANOVA table.
o The total degrees of freedom is always the total sample size n minus 1 (26 minus 1 or 25 for our example)
o The treatment degrees of freedom is always the number of treatments minus 1 (4 minus 1 or 3 for our example.
o And since the total degrees of freedom must equal the degrees of freedom due to Treatment plus the Degrees of freedom due to error, we simply subtract the degrees of freedom due to treatment from the total degrees of freedom to get the degrees of freedom due to error. For our example this is 25 minus 3 or 22. Once these have been determined, the mean square treatment is found by dividing the sum of squares due to treatment by the degrees of freedom due to treatment and the mean square due to error is found by dividing the sum of squares due to error by the degrees of freedom due to error.
Slide 13
· We’ll begin our calculations by
· Calculating the total sum of squares or SST
· Once again, here are the observations
· The total sum of squares says "forget which treatment each observation comes from": calculate the numerator of the formula for s-squared, in which we take each observation, subtract off the mean (the grand mean in this case), square these differences, and add them all up.
· For our example this is 82 minus 72 squared, plus 64 minus 72 squared, plus 95 minus 72 squared, and so forth down to the 26-th entry, 81 minus 72 squared. That gives a value for SST of 4394.
Slide 14
· Now we’ll calculate SSTr
· Or the Between Treatment Variability
· Again here is our data
· To isolate the variability due to treatment, imagine there were NO within-treatment variability – that is, if there were no randomness, every observation within a treatment would be the same number. We take for this "same" number the sample mean of each treatment. That is, we consider replacing all entries in lecture by the mean of lecture, 76; all the entries in text by the mean of text, 65; all the entries in videotape by the mean of videotape, 75; and all the entries in internet by the mean of internet, 74. Then we calculate the sum of squared deviations from the grand mean again.
· This would be 76 minus 72 squared, plus 76 minus 72 squared, plus 76 minus 72 squared, and so forth – that is, 76 minus 72 squared 7 times, plus 65 minus 72 squared 8 times, plus 75 minus 72 squared 6 times, plus 74 minus 72 squared 5 times. In other words, SSTr is found by subtracting the grand mean from each sample mean, squaring the result, multiplying each squared value by the number of observations in that sample, and adding. Doing this gives 578.
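The SSTr calculation just described can be verified with a few lines of Python (variable names are ours):

```python
n = [7, 8, 6, 5]         # sample sizes: lecture, text, videotape, internet
xbar = [76, 65, 75, 74]  # treatment sample means
grand_mean = 72

# For each treatment: n_i * (xbar_i - grand mean)^2, then sum over treatments.
SSTr = sum(ni * (xi - grand_mean) ** 2 for ni, xi in zip(n, xbar))
print(SSTr)  # 578
```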
Slide 15
· Now that we have SST and SSTr
· SSE, the within treatment variability is found
· Simply by subtracting the Treatment sum of squares from the Total sum of squares
· That is SSE equals SST minus SSTr which in this case is 4394 minus 578 or 3816.
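With SST and SSTr in hand, SSE and the two mean squares from the ANOVA table follow directly; a minimal sketch:

```python
SST = 4394   # total sum of squares (slide 13)
SSTr = 578   # sum of squares due to treatment (slide 14)

SSE = SST - SSTr  # 3816, the within-treatment (error) sum of squares

MSTr = SSTr / 3   # mean square due to treatment: 578 / 3, about 192.67
MSE = SSE / 22    # mean square due to error:   3816 / 22, about 173.45
```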
Slide 16
· So now let’s return to the reason we are doing all this. We want to know whether we can conclude that there is a difference in the average performance on the final exam among the different teaching modes.
· Remember we said we would conclude that there are differences if the average variability due to treatment was “a lot greater” than the average variability due to error, that is, if the ratio of MSTr to MSE is “large”.
· The ratio of these two measures of variability – assuming the data for each treatment come from normal distributions – is an F-statistic; that is, the value of the F-statistic is MSTr over MSE.
o An F-distribution has two degrees of freedom associated with it – numerator degrees of freedom and denominator degrees of freedom – and since Treatment is in the numerator and Error is in the denominator of the F-statistic we will use Treatment degrees of freedom for the numerator
o and error degrees of freedom for the denominator.
· Our test will be: if this ratio of the mean square due to treatment to the mean square due to error – the F-statistic – is large compared to a critical F-value that we look up in an F-distribution table, we will conclude that there are differences resulting from the different teaching modes; that is, at least one of the mu’s differs from the others.
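Putting the pieces together for this example, the F-statistic is a one-line calculation; the comparison below uses the 5% significance level as our own illustrative choice, with the critical value read from a standard F table:

```python
MSTr = 578 / 3    # mean square due to treatment, about 192.67
MSE = 3816 / 22   # mean square due to error, about 173.45

F_stat = MSTr / MSE
print(round(F_stat, 3))  # 1.111

# From a standard F table, the critical value F(0.05; 3, 22) is
# roughly 3.05 (check your own table). Since 1.111 is well below
# that, these data would not let us reject H0 at the 5% level.
```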
Slide 17
· So let’s formalize this statistically. Can we conclude there are differences in the population means, that is the mu’s, of the different teaching formats?
· “Can you conclude” means we are doing a hypothesis test. So we first set up our hypotheses. We assume that there are no differences unless we get strong evidence to the contrary – that is, H0 is all the mu’s are equal, and HA is that at least one of the mu’s differs from the others.