Box-Cox Procedure for Single-Factor ANOVA

When the normality and homoscedasticity assumptions are both violated, the Box-Cox procedure provides a general method for finding a suitable transformation of the response variable. Below, the Box-Cox procedure is applied to the Servo-Data example. The following SAS program must be run once for each value of lambda in a range from -1 to 1; the value is set by the lambda statement in data step six. The best transformation is the one that yields the smallest SSE in the ANOVA of the transformed response w.
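For reference, the standardized transformation that data step six of the program computes can be written as below. The symbols W_i, K_1, K_2 and n are notation introduced here for illustration; in the code they correspond to w, 1/k (for lambda not equal to 0), u (the geometric mean of the responses) and size. The cases environment assumes the amsmath package.

```latex
\[
W_i =
\begin{cases}
K_1\left(Y_i^{\lambda} - 1\right), & \lambda \neq 0,\\[4pt]
K_2 \ln Y_i, & \lambda = 0,
\end{cases}
\qquad
K_2 = \Bigl(\prod_{i=1}^{n} Y_i\Bigr)^{1/n},
\qquad
K_1 = \frac{1}{\lambda K_2^{\lambda - 1}} .
\]
```

Because of the scaling by K_1 and K_2, the SSE values obtained for different values of lambda are on a comparable scale, which is what makes choosing the smallest SSE meaningful.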

data one;

input trt y;

label trt = "Location"

y = "Time Between Failures";

ypr = y;

obs = _n_;

dum = 1;

;

cards;

1 4.41

1 100.65

1 14.45

1 47.13

1 85.21

2 8.24

2 81.16

2 7.35

2 12.29

2 1.61

3 106.19

3 33.83

3 78.88

3 342.81

3 44.33

;

proc means;

var y;

output out=ydesc min=miny n=size;

;

data two;set ydesc;

dum = 1;

;

data three;merge one two;by dum;

if miny <= 0 then y = y - miny + 1;

keep y trt size dum obs ypr;

;

data four;set three;

if obs = 1 then do;

u = ypr**(1/size);

v = u;

end;

if obs > 1 then do;

v = u*(ypr**(1/size));

u = v;

end;

retain u;

;

proc print;

;

data five;set four;if obs = size;

keep u size dum obs;

dum = 1;

;

proc print;

;

data six;merge three five;by dum;

/* The following statement needs to be changed for different values of lambda. */

lambda = 0.1;

if lambda = 0 then do;

k = u;

w = k*log(y);

end;

if lambda ne 0 then do;

k = lambda*(u**(lambda-1));

w = ((y**lambda) - 1)/k;

end;

;

proc print;

;

proc plot;

plot w*trt;

;

proc glm;

class trt;

model w = trt;

output out=resi r=resids;

means trt / lsd duncan tukey scheffe;

title "Analysis of Variance for Ch.18 p. 792";

title2 "With Follow-Up Tests";

;

data two;set resi;

/* The following statement creates a dummy variable with value 1 for every */

/* observation in the data set. This variable will be used to merge the sample */

/* statistics with every observation in the original data set. */

dum = 1;

;

proc sort; by resids;

;

proc means noprint;

var resids;

output out=meanr mean=mu std=s n=size;

title;

title2;

;

data three;set meanr;

dum = 1;

;

data three;merge two three;by dum;

p = (_n_ - 0.5)/size;

/* The following equation would need to be changed for q-q plots for other */

/* probability distributions. For example, for an exponential(mu) distribution, */

/* the statement would be Q = -mu*log(1-p). */

Q = probit(p);

;

proc plot;

plot resids*trt;

title 'Plot of Residuals vs. Factor Levels';

;

proc reg noprint;

model resids = Q;

plot (resids predicted.)*Q / overlay;

title 'Normal Probability Plot';

title2 'For Residuals from ANOVA';

;

run;
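
Rather than editing the lambda statement and re-running the program by hand, the search over lambda could be automated with a macro loop. The following is only a sketch under stated assumptions, not part of the original program: the macro name bcgrid, its parameters lo and hi, and the data set names bc, stat and sse_all are all hypothetical. It assumes that data set one from the program above exists and that every y is positive, so the shift applied in data step three is not needed. It also computes the geometric mean as exp(mean(log(y))) instead of using the running product in data step four.

```sas
/* Sketch only: refit the one-way ANOVA over a grid of lambda values and */
/* collect the SSE from each fit. Assumes data set ONE (trt, y) exists   */
/* and that every y is positive. All names below are hypothetical.       */
%macro bcgrid(lo=-10, hi=10);   /* grid in tenths: -1.0 to 1.0 by 0.1 */
  %local i lam gm;

  /* Geometric mean of y, computed as exp(mean(log(y))). */
  proc sql noprint;
    select exp(mean(log(y))) format=best32. into :gm from one;
  quit;

  /* Remove results left over from an earlier call, if any. */
  proc datasets lib=work nolist;
    delete sse_all;
  quit;

  %do i = &lo %to &hi;
    %let lam = %sysevalf(&i / 10);

    /* Standardized Box-Cox transformation for this lambda. */
    data bc;
      set one;
      lambda = &lam;
      u = &gm;
      if lambda = 0 then w = u*log(y);
      else w = ((y**lambda) - 1) / (lambda*(u**(lambda - 1)));
    run;

    /* One-way ANOVA; OUTSTAT= saves the sums of squares. */
    proc glm data=bc outstat=stat noprint;
      class trt;
      model w = trt;
    run;
    quit;

    /* Keep only the error row, whose SS is the SSE. */
    data stat;
      set stat;
      if _source_ = 'ERROR';
      lambda = &lam;
      sse = ss;
      keep lambda sse;
    run;

    proc append base=sse_all data=stat;
    run;
  %end;

  /* List the lambda values with the smallest SSE first. */
  proc sort data=sse_all;
    by sse;
  run;

  proc print data=sse_all;
    title 'SSE by lambda (smallest first)';
  run;
  title;
%mend bcgrid;

%bcgrid()
```

The first row of the printed table gives the lambda with the smallest SSE on this grid. The full program above can then be run once with that value to obtain the follow-up tests and residual plots.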