Box-Cox Procedure for Single-Factor ANOVA
When both the normality and the homoscedasticity assumptions are violated, the Box-Cox procedure provides a general method for finding a suitable power transformation of the response variable. Below, the Box-Cox procedure is applied to the Servo-Data example. The SAS program below must be run once for each trial value of λ (the variable lambda, set in data step six) within the range from -1 to 1. The best transformation is the one that yields the smallest error sum of squares (SSE).
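For reference, the transformed response w that the program computes in data step six can be written as below. This is only a restatement of the code in the program's own names, where u is the geometric mean of the n responses (accumulated in data steps four and five) and λ is the trial power. Scaling by the geometric mean is what makes the SSE values comparable across different values of λ.

```latex
w =
\begin{cases}
\dfrac{y^{\lambda} - 1}{\lambda\, u^{\lambda - 1}}, & \lambda \neq 0, \\[1ex]
u \, \ln y, & \lambda = 0,
\end{cases}
\qquad
u = \Bigl( \prod_{i=1}^{n} y_i \Bigr)^{1/n}
```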
data one;
input trt y;
label trt = "Location"
y = "Time Between Failures";
ypr = y;
obs = _n_;
dum = 1;
;
cards;
1 4.41
1 100.65
1 14.45
1 47.13
1 85.21
2 8.24
2 81.16
2 7.35
2 12.29
2 1.61
3 106.19
3 33.83
3 78.88
3 342.81
3 44.33
;
proc means;
var y;
output out=ydesc min=miny n=size;
;
data two;set ydesc;
dum = 1;
;
data three;merge one two;by dum;
if miny <= 0 then y = y - miny + 1;
keep y trt size dum obs ypr;
;
data four;set three;
if obs = 1 then do;
u = ypr**(1/size);
v = u;
end;
if obs > 1 then do;
v = u*(ypr**(1/size));
u = v;
end;
retain u;
;
proc print;
;
data five;set four;if obs = size;
keep u size dum obs;
dum = 1;
;
proc print;
;
data six;merge three five;by dum;
/* The following statement needs to be changed for different values of lambda. */
lambda = 0.1;
if lambda = 0 then do;
k = u;
w = k*log(y);
end;
if lambda ne 0 then do;
k = lambda*(u**(lambda-1));
w = ((y**lambda) - 1)/k;
end;
;
proc print;
;
proc plot;
plot w*trt;
;
proc glm;
class trt;
model w = trt;
output out=resi r=resids;
means trt / lsd duncan tukey scheffe;
title "Analysis of Variance for Ch.18 p. 792";
title2 "With Follow-Up Tests";
;
data two;set resi;
/* The following statement creates a dummy variable with value 1 for every */
/* observation in the data set. This variable will be used to merge the sample */
/* statistics with every observation in the original data set. */
dum = 1;
;
proc sort; by resids;
;
proc means noprint;
var resids;
output out=meanr mean=mu std=s n=size;
title;
title2;
;
data three;set meanr;
dum = 1;
;
data three;merge two three;by dum;
p = (_n_ - 0.5)/size;
/* The following equation would need to be changed for q-q plots for other */
/* probability distributions. For example, for an exponential(mu) distribution, */
/* the statement would be Q = -mu*log(1-p). */
Q = probit(p);
;
proc plot;
plot resids*trt;
title 'Plot of Residuals vs. Factor Levels';
;
proc reg noprint;
model resids = Q;
plot (resids predicted.)*Q / overlay;
title 'Normal Probability Plot';
title2 'For Residuals from ANOVA';
;
run;
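Rerunning the program by hand for each value of lambda can be avoided. The following is a minimal sketch, not part of the original program, of one way to scan a grid of lambda values in a single pass and list the SSE for each. It assumes that data set one (with variables trt and y) already exists and that every y is positive, so no shift is applied. The data set names gm, bc, stats, and sse_all and the grid step of 0.1 are illustrative assumptions.

```sas
/* Geometric mean of y, computed as exp(mean(log(y))); assumes y > 0. */
proc sql;
  create table gm as
  select exp(mean(log(y))) as u
  from one;
quit;

/* One row per observation per trial lambda (-1 to 1 in steps of 0.1). */
/* An integer index avoids accumulating floating-point error.          */
data bc;
  if _n_ = 1 then set gm;      /* u is retained on every row */
  set one;
  do i = -10 to 10;
    lambda = i/10;
    if i = 0 then w = u*log(y);
    else w = ((y**lambda) - 1)/(lambda*(u**(lambda - 1)));
    output;
  end;
  keep trt y lambda w;
run;

proc sort data=bc;
  by lambda;
run;

/* One single-factor ANOVA per lambda; OUTSTAT= saves the sums of squares. */
proc glm data=bc outstat=stats noprint;
  by lambda;
  class trt;
  model w = trt;
run;
quit;

/* Keep only the error row for each lambda; its SS is the SSE. */
data sse_all;
  set stats;
  where _type_ = 'ERROR';
  sse = ss;
  keep lambda sse;
run;

/* Smallest SSE first. */
proc sort data=sse_all;
  by sse;
run;

proc print data=sse_all;
run;
```

The first row of the printed listing gives the lambda with the smallest SSE on this grid. That value can then be entered in data step six of the main program to obtain the follow-up tests and the residual plots.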