Notes on Using SAS for ANOVA

(See the previous notes on how to run SAS, etc.)

Imagine we wish to know whether drinking coke, pepsi, or orange juice increases the weight of rats. We randomly assign 10 rats to each of three treatment groups and feed them coke, pepsi, or orange juice for some period of weeks, then weigh them at the end of the experiment. We want to compare mean weight across the treatments and to know which means (if any) differ. The factor (drink type) is a fixed effect, since we only want to know the effects of these three beverages. Naturally, I made these data up.

So here's the SAS program code to carry out this single-factor (or 1-way) ANOVA.

As before I've put SAS keywords in bold.

In the first line you simply assign a name to the data set.

Don't use spaces or keywords in the name and keep it short!

The second line (the INPUT statement) assigns names to the variables and tells SAS their order and the kind of variables they are.

In this case, I decided to make DRINK a character (nominal) variable by putting a $ after it.

I could have just numbered the treatments, but you don't have to. You can use meaningful names if you like.

Then the WEIGHT of each rat is a numeric variable.

Then you just have to put in the word CARDS.

Note that a semicolon must follow each of these statements or SAS will become unhappy (i.e., you'll likely get some kind of error message).

What follows is a separate line of data for each rat.
Each line first specifies what the rat was drinking and then the rat's weight.

After the data set I normally put a line in with just a semicolon.

DATA RATS;

INPUT DRINK $ WEIGHT;

CARDS;

PEPSI 24

PEPSI 34

PEPSI 33

PEPSI 32

PEPSI 31

PEPSI 35

PEPSI 36

PEPSI 37

PEPSI 33

PEPSI 32

COKE 43

COKE 39

COKE 41

COKE 40

COKE 42

COKE 43

COKE 40

COKE 39

COKE 38

COKE 39

ORANGE 22

ORANGE 23

ORANGE 21

ORANGE 23

ORANGE 24

ORANGE 21

ORANGE 20

ORANGE 25

ORANGE 21

ORANGE 25

;

PROC SORT;

BY DRINK;

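/* THIS IS A COMMENT I STUCK IN HERE TO INDICATE THAT I SORT THE DATA

BY THE CLASS VARIABLE USING THE STATEMENT ABOVE PRIOR TO DOING ANY

ANALYSES. SORTING IS REQUIRED ONLY IF A LATER PROC USES A BY STATEMENT;

PROC ANOVA WITH A CLASS STATEMENT DOES NOT NEED SORTED DATA */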

PROC ANOVA;

CLASS DRINK;

MODEL WEIGHT = DRINK;

/* SO IN THE ABOVE I'VE INVOKED THE ANOVA PROCEDURE AND TOLD SAS

THAT THE EXPLANATORY OR TREATMENT VARIABLE IS DRINK AND

THAT THE RESPONSE VARIABLE IS WEIGHT, ESSENTIALLY SPECIFYING

THE LINEAR MODEL FOR A SINGLE FACTOR ANOVA */

MEANS DRINK / HOVTEST TUKEY;

/* THE ABOVE STATEMENT TELLS SAS TO COMPUTE THE MEAN OF EACH TREATMENT, TO TEST FOR HOMOGENEITY OF VARIANCES (HOVTEST, WHICH DEFAULTS TO LEVENE'S TEST), AND TO USE AN A POSTERIORI TEST CALLED "TUKEY'S" TEST TO COMPARE EACH PAIR OF MEANS */

RUN;

OUTPUT FROM THE ANOVA

The ANOVA Procedure

Class Level Information

Class Levels Values

DRINK 3 COKE ORANGE PEPSI

Number of Observations Read 30

Number of Observations Used 30

The ANOVA Procedure

Dependent Variable: WEIGHT

                                 Sum of
Source             DF        Squares    Mean Square    F Value    Pr > F
Model               2    1612.466667     806.233333     125.83    <.0001
Error              27     173.000000       6.407407
Corrected Total    29    1785.466667

R-Square    Coeff Var    Root MSE    WEIGHT Mean
0.903107     7.943365    2.531286       31.86667

Source    DF       Anova SS    Mean Square    F Value    Pr > F
DRINK      2    1612.466667     806.233333     125.83    <.0001
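The derived columns in the output above can be checked by hand from the sums of squares and degrees of freedom. The working below is only a sketch of that arithmetic; SS, MS, and df are shorthand for the Sum of Squares, Mean Square, and DF columns.

```latex
\begin{align*}
MS_{\text{Model}} &= \frac{SS_{\text{Model}}}{df_{\text{Model}}} = \frac{1612.466667}{2} = 806.233333 \\
MS_{\text{Error}} &= \frac{SS_{\text{Error}}}{df_{\text{Error}}} = \frac{173.000000}{27} \approx 6.407407 \\
F &= \frac{MS_{\text{Model}}}{MS_{\text{Error}}} = \frac{806.233333}{6.407407} \approx 125.83 \\
R^2 &= \frac{SS_{\text{Model}}}{SS_{\text{Total}}} = \frac{1612.466667}{1785.466667} \approx 0.903107 \\
\text{Root MSE} &= \sqrt{6.407407} \approx 2.531286 \\
\text{Coeff Var} &= 100 \times \frac{2.531286}{31.86667} \approx 7.9434
\end{align*}
```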

NOW THE OUTPUT FROM THE A POSTERIORI TEST

The ANOVA Procedure

Tukey's Studentized Range (HSD) Test for WEIGHT

NOTE: This test controls the Type I experimentwise error rate, but it generally has a higher Type II error rate than REGWQ.

Alpha 0.05

Error Degrees of Freedom 27

Error Mean Square 6.407407

Critical Value of Studentized Range 3.50643

Minimum Significant Difference 2.8068

Means with the same letter are not significantly different.

Tukey Grouping Mean N DRINK

A 40.400 10 COKE

B 32.700 10 PEPSI

C 22.500 10 ORANGE

So the conclusion from the ANOVA is that at least one mean differs from the others.

Tukey's test indicates that all means are different from each other.
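To see where the Minimum Significant Difference comes from, the arithmetic can be sketched as follows. It assumes the equal-sample-size form of Tukey's test (n = 10 rats per drink) and uses q for the Critical Value of Studentized Range reported in the output.

```latex
\begin{align*}
\text{MSD} &= q \sqrt{\frac{MS_{\text{Error}}}{n}} = 3.50643 \sqrt{\frac{6.407407}{10}} \approx 2.8068 \\
\bar{y}_{\text{COKE}} - \bar{y}_{\text{PEPSI}} &= 40.4 - 32.7 = 7.7 > 2.8068 \\
\bar{y}_{\text{PEPSI}} - \bar{y}_{\text{ORANGE}} &= 32.7 - 22.5 = 10.2 > 2.8068 \\
\bar{y}_{\text{COKE}} - \bar{y}_{\text{ORANGE}} &= 40.4 - 22.5 = 17.9 > 2.8068
\end{align*}
```

Every pairwise difference exceeds the MSD, which is why each drink gets its own letter in the Tukey grouping.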

Now let's do a 2-way (2-factor) ANOVA with a fully crossed design (i.e., all combinations of treatments are included) and both factors fixed. Note that if the design is balanced (equal n in each cell) you can use PROC ANOVA. For unbalanced designs use PROC GLM and the appropriate type of sums of squares.
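For the unbalanced case, a minimal PROC GLM sketch might look like the following. It assumes a data set laid out like the RATS2 data set built in the program further down these notes (variables DRINK, EXER, and WEIGHT). The SS3 option requests Type III sums of squares, which is a common choice for unbalanced factorial designs but is an assumption here, not something these notes prescribe.

```sas
/* SKETCH ONLY: 2-WAY ANOVA WITH PROC GLM FOR AN UNBALANCED DESIGN.
   ASSUMES A DATA SET NAMED RATS2 WITH VARIABLES DRINK, EXER, WEIGHT */
PROC GLM DATA=RATS2;
  CLASS DRINK EXER;
  /* SS3 ASKS FOR TYPE III SUMS OF SQUARES (AN ASSUMED CHOICE) */
  MODEL WEIGHT = DRINK EXER DRINK*EXER / SS3;
RUN;
QUIT;
```

With balanced data, such as the example in these notes, PROC GLM and PROC ANOVA give the same F tests, so the choice of procedure only matters once cell sizes differ.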

So let's expand the rat drinking experiment.

This time the rats are given two treatments.

The first factor is the drink (coke, pepsi, or orange juice) and the second factor is daily exercise or no exercise.

In this experiment, 4 different rats are randomly assigned to each of the 6 treatment combinations and you weigh them at the end of the experiment.

The data and experimental design are shown in the table below:

            no exercise        exercise
coke        56 55 58 54        50 53 54 52
pepsi       56 55 58 59        55 54 51 53
orange      44 43 45 43        10 11 13 14

So here is the SAS program to run this:

DATA RATS2;

INPUT DRINK $ EXER $ WEIGHT;

CARDS;

COKE NO 56

COKE NO 55

COKE NO 58

COKE NO 54

COKE YES 50

COKE YES 53

COKE YES 54

COKE YES 52

PEPSI NO 56

PEPSI NO 55

PEPSI NO 58

PEPSI NO 59

PEPSI YES 55

PEPSI YES 54

PEPSI YES 51

PEPSI YES 53

ORANGE NO 44

ORANGE NO 43

ORANGE NO 45

ORANGE NO 43

ORANGE YES 10

ORANGE YES 11

ORANGE YES 13

ORANGE YES 14

;

PROC SORT;

BY DRINK EXER;

/* SO HERE WE SORT BY BOTH VARIABLES */

PROC ANOVA;

CLASS DRINK EXER;

MODEL WEIGHT = DRINK EXER DRINK*EXER;

/* SPECIFIES A 2-WAY LINEAR MODEL WITH BOTH MAIN EFFECTS AND

THE INTERACTION TERM. THE RIGHT-HAND SIDE COULD ALSO BE WRITTEN AS DRINK | EXER */

MEANS DRINK / TUKEY;

RUN;

OUTPUT FROM THE 2-WAY ANOVA

The ANOVA Procedure

Class Level Information

Class Levels Values

DRINK 3 COKE ORANGE PEPSI

EXER 2 NO YES

Number of Observations Read 24

Number of Observations Used 24

The ANOVA Procedure

Dependent Variable: WEIGHT

                                 Sum of
Source             DF        Squares    Mean Square    F Value    Pr > F
Model               5    5872.333333    1174.466667     431.44    <.0001
Error              18      49.000000       2.722222
Corrected Total    23    5921.333333

R-Square    Coeff Var    Root MSE    WEIGHT Mean
0.991725     3.612954    1.649916       45.66667

Source        DF       Anova SS    Mean Square    F Value    Pr > F
DRINK          2    3803.583333    1901.791667     698.62    <.0001
EXER           1    1014.000000    1014.000000     372.49    <.0001
DRINK*EXER     2    1054.750000     527.375000     193.73    <.0001
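Because the design is balanced, the model sum of squares splits exactly into the two main effects and the interaction, and each F value is that effect's mean square divided by the error mean square. The working below is a sketch of that check using the numbers in the output above.

```latex
\begin{align*}
SS_{\text{Model}} &= SS_{\text{DRINK}} + SS_{\text{EXER}} + SS_{\text{DRINK*EXER}} \\
                  &= 3803.583333 + 1014.000000 + 1054.750000 = 5872.333333 \\
F_{\text{DRINK}} &= \frac{1901.791667}{2.722222} \approx 698.62 \\
F_{\text{EXER}} &= \frac{1014.000000}{2.722222} \approx 372.49 \\
F_{\text{DRINK*EXER}} &= \frac{527.375000}{2.722222} \approx 193.73
\end{align*}
```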

The ANOVA Procedure

Tukey's Studentized Range (HSD) Test for WEIGHT

NOTE: This test controls the Type I experimentwise error rate, but it generally has a higher Type II error rate than REGWQ.

Alpha 0.05

Error Degrees of Freedom 18

Error Mean Square 2.722222

Critical Value of Studentized Range 3.60930

Minimum Significant Difference 2.1054

Means with the same letter are not significantly different.

Tukey Grouping Mean N DRINK

A 55.1250 8 PEPSI

A 54.0000 8 COKE

B 27.8750 8 ORANGE

So the two-way ANOVA indicates that both main effects and the interaction are statistically significant. Because the interaction is significant, it doesn't make much sense to interpret the a posteriori test on DRINK computed immediately above, since it is more relevant to compare particular combinations of treatments to one another.
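One way to compare the six treatment combinations directly is sketched below. It switches to PROC GLM because PROC ANOVA has no LSMEANS statement, and it assumes the RATS2 data set from the program above is still available. The Tukey adjustment is an assumed choice to match the test used earlier in these notes.

```sas
/* SKETCH ONLY: COMPARE THE 6 DRINK-BY-EXERCISE CELL MEANS.
   ASSUMES THE RATS2 DATA SET CREATED ABOVE */
PROC GLM DATA=RATS2;
  CLASS DRINK EXER;
  MODEL WEIGHT = DRINK EXER DRINK*EXER;
  /* PDIFF PRINTS P-VALUES FOR ALL PAIRWISE CELL COMPARISONS,
     ADJUSTED HERE WITH TUKEY'S METHOD */
  LSMEANS DRINK*EXER / PDIFF ADJUST=TUKEY;
RUN;
QUIT;
```

The output includes a table of least-squares means for each DRINK by EXER cell and a matrix of adjusted p-values, so you can see, for example, whether orange juice with exercise differs from orange juice without exercise.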