NOTES ON USING SAS FOR ANOVA
(See the previous notes for how to run SAS programs.)
Imagine we wish to know whether drinking coke, pepsi, or orange juice increases the weight of rats. You randomly assign 10 rats to each of three treatment groups and feed them coke, pepsi, or orange juice for some period of weeks, then weigh them at the end of the experiment. You want to compare mean weight among the treatments and to know which means (if any) differ. The factor (drink type) is a fixed effect, since we only want to know the effects of these three particular beverages. Naturally, I made these data up.
So here's the SAS program code to carry out this single factor (or 1-way) ANOVA.
As before I've put SAS keywords in bold.
In the first line you simply assign a name to the data set.
Don't use spaces or keywords in the name and keep it short!
The second line (the INPUT statement) assigns names to the variables and tells SAS their order and the kind of variables they are.
So in this case, I decided to make DRINK a character (nominal) variable by putting a $ after it.
I could have just numbered the treatments, but you don't have to. You can use meaningful names if you like.
Then the WEIGHT of each rat is a numeric variable.
Then you just put in the word CARDS (DATALINES is the newer name for the same statement).
Note that a semicolon must follow each of these statements or SAS will become unhappy (i.e., you'll likely get some kind of error message).
Then what follows is a separate line of data for each rat.
Each line first specifies what the rat was drinking and then the rat's weight.
After the data set I normally put a line in with just a semicolon.
DATA RATS;
INPUT DRINK $ WEIGHT;
CARDS;
PEPSI 24
PEPSI 34
PEPSI 33
PEPSI 32
PEPSI 31
PEPSI 35
PEPSI 36
PEPSI 37
PEPSI 33
PEPSI 32
COKE 43
COKE 39
COKE 39
COKE 41
COKE 40
COKE 42
COKE 43
COKE 40
COKE 39
COKE 38
ORANGE 22
ORANGE 23
ORANGE 21
ORANGE 23
ORANGE 24
ORANGE 21
ORANGE 20
ORANGE 25
ORANGE 25
ORANGE 21
;
PROC SORT;
BY DRINK;
/* THIS IS A COMMENT I STUCK IN HERE. THE STATEMENT ABOVE SORTS THE
DATA BY DRINK. SORTING IS REQUIRED BEFORE ANY STEP THAT USES A BY
STATEMENT; PROC ANOVA WITH A CLASS STATEMENT DOES NOT NEED IT,
BUT IT DOES NO HARM */
PROC ANOVA;
CLASS DRINK;
MODEL WEIGHT = DRINK;
/* SO IN THE ABOVE I'VE INVOKED THE ANOVA PROCEDURE AND TOLD SAS
THAT THE EXPLANATORY OR TREATMENT VARIABLE IS DRINK AND
THAT THE RESPONSE VARIABLE IS WEIGHT, ESSENTIALLY SPECIFYING
THE LINEAR MODEL FOR A SINGLE FACTOR ANOVA */
MEANS DRINK / HOVTEST TUKEY;
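/* THE ABOVE STATEMENT TELLS SAS TO COMPUTE THE MEAN OF EACH TREATMENT
AND TO USE AN A POSTERIORI TEST CALLED "TUKEY'S" TEST TO COMPARE EACH
PAIR OF MEANS. HOVTEST ASKS FOR A TEST OF HOMOGENEITY OF VARIANCE
AMONG THE TREATMENTS (LEVENE'S TEST BY DEFAULT) */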
RUN;
OUTPUT FROM THE ANOVA
The ANOVA Procedure
Class Level Information
Class Levels Values
DRINK 3 COKE ORANGE PEPSI
Number of Observations Read 30
Number of Observations Used 30
The ANOVA Procedure
Dependent Variable: WEIGHT
                              Sum of
Source             DF        Squares    Mean Square   F Value   Pr > F
Model               2    1612.466667     806.233333    125.83   <.0001
Error              27     173.000000       6.407407
Corrected Total    29    1785.466667

R-Square    Coeff Var    Root MSE    WEIGHT Mean
0.903107     7.943365    2.531286       31.86667

Source             DF       Anova SS    Mean Square   F Value   Pr > F
DRINK               2    1612.466667     806.233333    125.83   <.0001
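
As a quick check on how to read this table, the mean squares, F value, and R-Square can be reproduced by hand from the sums of squares and degrees of freedom that SAS printed. The symbols SS, MS, and df are my shorthand here, not labels from the SAS output.

```latex
\begin{align*}
MS_{\text{Model}} &= \frac{SS_{\text{Model}}}{df_{\text{Model}}} = \frac{1612.466667}{2} = 806.233333 \\
MS_{\text{Error}} &= \frac{SS_{\text{Error}}}{df_{\text{Error}}} = \frac{173.000000}{27} \approx 6.407407 \\
F &= \frac{MS_{\text{Model}}}{MS_{\text{Error}}} = \frac{806.233333}{6.407407} \approx 125.83 \\
R^2 &= \frac{SS_{\text{Model}}}{SS_{\text{Total}}} = \frac{1612.466667}{1785.466667} \approx 0.903107
\end{align*}
```

With only one factor in the model, the DRINK line of the table is identical to the Model line.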
OUTPUT FROM THE A POSTERIORI TEST
The ANOVA Procedure
Tukey's Studentized Range (HSD) Test for WEIGHT
NOTE: This test controls the Type I experimentwise error rate, but it generally has a higher Type II
error rate than REGWQ.
Alpha 0.05
Error Degrees of Freedom 27
Error Mean Square 6.407407
Critical Value of Studentized Range 3.50643
Minimum Significant Difference 2.8068
Means with the same letter are not significantly different.
Tukey Grouping       Mean     N    DRINK
             A     40.400    10    COKE
             B     32.700    10    PEPSI
             C     22.500    10    ORANGE
So the conclusion from the ANOVA is that at least one mean differs from the others (F = 125.83, P < .0001).
Tukey's test indicates that all three means differ from each other, since each drink gets its own grouping letter.
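
To see where the grouping letters come from, the Minimum Significant Difference (MSD) can be recomputed from the values in the Tukey output and compared with each pairwise difference in means. This is a sketch for equal sample sizes (n = 10 per drink); q is my symbol for the critical value of the studentized range that SAS reports.

```latex
\begin{align*}
\text{MSD} &= q \sqrt{\frac{MS_{\text{Error}}}{n}} = 3.50643 \sqrt{\frac{6.407407}{10}} \approx 2.8068 \\
\bar{y}_{\text{COKE}} - \bar{y}_{\text{PEPSI}} &= 40.4 - 32.7 = 7.7 > 2.8068 \\
\bar{y}_{\text{PEPSI}} - \bar{y}_{\text{ORANGE}} &= 32.7 - 22.5 = 10.2 > 2.8068 \\
\bar{y}_{\text{COKE}} - \bar{y}_{\text{ORANGE}} &= 40.4 - 22.5 = 17.9 > 2.8068
\end{align*}
```

Every difference exceeds the MSD, so no two drinks share a letter.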
Now let's do a 2-way (2-factor) ANOVA with a fully crossed design, i.e., all combinations of treatments are included. BOTH FACTORS ARE FIXED. NOTE THAT IF THE DESIGN IS BALANCED (EQUAL n IN EACH CELL) YOU USE PROC ANOVA. FOR UNBALANCED DESIGNS USE PROC GLM AND THE APPROPRIATE TYPES OF SUMS OF SQUARES.
So let's expand the rat drinking experiment.
This time the rats are given two treatments.
The first factor is the drink (coke, pepsi, or orange juice) and the second factor is whether they are given daily exercise or no exercise.
In this experiment, 4 different rats are randomly assigned to each of the 6 treatment combinations and you weigh them at the end of the experiment.
The design and the data (weights) are shown in the table below:
Drink     No exercise       Exercise
coke      56, 55, 58, 54    50, 53, 54, 52
pepsi     56, 55, 58, 59    55, 54, 51, 53
orange    44, 43, 45, 43    10, 11, 13, 14
So here is the SAS program to run this:
DATA RATS2;
INPUT DRINK $ EXER $ WEIGHT;
CARDS;
COKE NO 56
COKE NO 55
COKE NO 58
COKE NO 54
COKE YES 50
COKE YES 53
COKE YES 54
COKE YES 52
PEPSI NO 56
PEPSI NO 55
PEPSI NO 58
PEPSI NO 59
PEPSI YES 55
PEPSI YES 54
PEPSI YES 51
PEPSI YES 53
ORANGE NO 44
ORANGE NO 43
ORANGE NO 45
ORANGE NO 43
ORANGE YES 10
ORANGE YES 11
ORANGE YES 13
ORANGE YES 14
;
PROC SORT;
BY DRINK EXER;
/* HERE THE DATA ARE SORTED BY BOTH VARIABLES (ONLY REQUIRED IF A
LATER STEP USES A BY STATEMENT) */
PROC ANOVA;
CLASS DRINK EXER;
MODEL WEIGHT = DRINK EXER DRINK*EXER;
/* SPECIFIES A 2-WAY LINEAR MODEL WITH BOTH MAIN EFFECTS AND THE
INTERACTION TERM. THE RIGHT-HAND SIDE COULD ALSO BE WRITTEN AS
DRINK | EXER */
MEANS DRINK / TUKEY;
RUN;
OUTPUT FROM THE 2-WAY ANOVA
The ANOVA Procedure
Class Level Information
Class Levels Values
DRINK 3 COKE ORANGE PEPSI
EXER 2 NO YES
Number of Observations Read 24
Number of Observations Used 24
The ANOVA Procedure
Dependent Variable: WEIGHT
                              Sum of
Source             DF        Squares    Mean Square   F Value   Pr > F
Model               5    5872.333333    1174.466667    431.44   <.0001
Error              18      49.000000       2.722222
Corrected Total    23    5921.333333

R-Square    Coeff Var    Root MSE    WEIGHT Mean
0.991725     3.612954    1.649916       45.66667

Source             DF       Anova SS    Mean Square   F Value   Pr > F
DRINK               2    3803.583333    1901.791667    698.62   <.0001
EXER                1    1014.000000    1014.000000    372.49   <.0001
DRINK*EXER          2    1054.750000     527.375000    193.73   <.0001
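
As a check on how the two tables fit together: in a balanced design the Model sum of squares splits exactly into the two main effects plus the interaction, and each effect's F value is its mean square divided by the Error mean square. The subscripted symbols are my shorthand, not SAS labels.

```latex
\begin{align*}
SS_{\text{Model}} &= SS_{\text{DRINK}} + SS_{\text{EXER}} + SS_{\text{DRINK*EXER}} \\
                  &= 3803.583333 + 1014.000000 + 1054.750000 = 5872.333333 \\
F_{\text{DRINK}} &= \frac{1901.791667}{2.722222} \approx 698.62 \\
F_{\text{EXER}} &= \frac{1014.000000}{2.722222} \approx 372.49 \\
F_{\text{DRINK*EXER}} &= \frac{527.375000}{2.722222} \approx 193.73
\end{align*}
```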
The ANOVA Procedure
Tukey's Studentized Range (HSD) Test for WEIGHT
NOTE: This test controls the Type I experimentwise error rate, but it generally has a higher Type II
error rate than REGWQ.
Alpha 0.05
Error Degrees of Freedom 18
Error Mean Square 2.722222
Critical Value of Studentized Range 3.60930
Minimum Significant Difference 2.1054
Means with the same letter are not significantly different.
Tukey Grouping       Mean     N    DRINK
             A    55.1250     8    PEPSI
             A
             A    54.0000     8    COKE
             B    27.8750     8    ORANGE
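
The shared letter A for PEPSI and COKE can be verified the same way as in the 1-way example. Here n = 8 because each drink mean pools the 4 exercised and 4 unexercised rats; q is again my symbol for the critical value of the studentized range.

```latex
\begin{align*}
\text{MSD} &= q \sqrt{\frac{MS_{\text{Error}}}{n}} = 3.60930 \sqrt{\frac{2.722222}{8}} \approx 2.1054 \\
\bar{y}_{\text{PEPSI}} - \bar{y}_{\text{COKE}} &= 55.125 - 54.000 = 1.125 < 2.1054 \\
\bar{y}_{\text{COKE}} - \bar{y}_{\text{ORANGE}} &= 54.000 - 27.875 = 26.125 > 2.1054
\end{align*}
```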
So the two-way ANOVA indicates that both main effects and the interaction are statistically significant. Because the interaction is significant, it doesn't make sense to carry out the a posteriori test on the DRINK main effect that I've computed immediately above, since it is more relevant to compare particular combinations of treatments to one another.
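
One way to compare the treatment combinations is sketched below. It is not part of the original program and is only one of several possible approaches. It assumes the RATS2 data set created above and refits the same model in PROC GLM, because PROC GLM offers the LSMEANS statement and PROC ANOVA does not.

```sas
/* SKETCH ONLY: SAME 2-WAY MODEL AS ABOVE, REFIT IN PROC GLM */
PROC GLM DATA=RATS2;
CLASS DRINK EXER;
MODEL WEIGHT = DRINK | EXER;
/* COMPARE ALL SIX DRINK-BY-EXERCISE CELL MEANS TO ONE ANOTHER,
WITH P-VALUES ADJUSTED BY TUKEY'S METHOD */
LSMEANS DRINK*EXER / PDIFF ADJUST=TUKEY;
RUN;
QUIT;
```

For these data the cell means (no exercise vs. exercise) are 55.75 vs. 52.25 for coke, 57.00 vs. 53.25 for pepsi, and 43.75 vs. 12.00 for orange, so exercise lowers weight far more in the orange juice group than in the other two. That pattern is what the significant interaction reflects.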