DOE Tutorial Regression, Analysis of Covariance, and RCB Designs

DOE Tutorial – Regression and Analysis of Covariance

Regression Example: Power and Etch Rate (revisited)

Data File: Power-Etch (Ex 3-1).JMP

We have analyzed these data previously by treating the power setting as a categorical factor with 4 levels using ANOVA. That analysis is fine, but it does not allow us to make inferences about what the etch rate might be if we used a power setting of 210 W. Because the levels of the factor are numeric, we could treat them as such and use regression analysis to develop a model that relates etch rate to the power setting in watts.

In regression we develop models for:

the mean of a response Y given X, sometimes denoted E(Y|X)

and

the variance of a response Y given X, sometimes denoted Var(Y|X)

Here X represents a set of potential explanatory/predictor variables or factors that we think might be related to the response Y. Virtually every analysis we will perform in this class is actually a regression analysis. For example, putting the one-way ANOVA (single factor) model into this regression context we have

$E(Y \mid X = i) = \mu + \tau_i$ for $i = 1, \ldots, a$ (here $a = 4$ power settings)

and

$Var(Y \mid X = i) = \sigma^2$,

i.e., the variance is constant and does not depend on X.

To relate a response Y to a single numeric predictor X we often use a simple linear regression model. Simple means a single predictor is used and linear means the model is linear in X.

$E(Y \mid X) = \beta_0 + \beta_1 X$

Note: $\beta_0$ is the y-intercept and $\beta_1$ is the slope of the line.

$Var(Y \mid X) = \sigma^2$ (constant variance is not required, but we focus only on the constant variance case here)

To fit the simple linear regression model for the power/etch rate experiment, make sure that power level is interpreted as continuous, then select Analyze > Fit Y by X, placing Etch Rate in the Y box and Power in the X box.

From the Bivariate Fit... pull-down menu select Fit Line to fit the simple linear regression model for these data. This process will fit the following model to our data.

$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$, $i = 1, \ldots, n$

Note: here n = 20, the total number of observations.

The estimated regression line is

$\widehat{E}(\text{Etch Rate} \mid \text{Power}) = 137.62 + 2.527 \cdot \text{Power}$
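The motivation for using regression here was the question of what the etch rate might be at a power setting of 210 W. A minimal sketch of that calculation follows. Only the two coefficients come from the fitted line reported above; the helper name `predict_etch_rate` is hypothetical and used purely for illustration.

```python
# Coefficients of the estimated regression line reported in the tutorial.
INTERCEPT = 137.62
SLOPE = 2.527


def predict_etch_rate(power_watts: float) -> float:
    """Estimated mean etch rate at a given power setting (hypothetical helper)."""
    return INTERCEPT + SLOPE * power_watts


# Point estimate at the untested setting of 210 W.
pred_210 = predict_etch_rate(210.0)
print(f"{pred_210:.2f}")  # 668.29
```

This is a point estimate only, and it is only as trustworthy as the straight-line model itself, so the fit should be checked before the value is used.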

Summary Statistics and Tests Results for the Fitted Simple Linear Regression Model

The residual plot clearly shows that our model is deficient, as there is an obvious nonlinear pattern in the residuals. Basically, this plot suggests that we have fit a line to a curvilinear relationship. To obtain this residual plot select Plot Residuals from the Linear Fit pull-down menu located below the scatterplot.
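To see why fitting a line to a curved relationship leaves a pattern in the residuals, the sketch below fits a least-squares line to a small set of hypothetical, noise-free responses that follow a curve. These numbers are made up for illustration and are not the etch rate measurements; `fit_line` is an illustrative helper, not a JMP function.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Hypothetical responses that follow a curve (NOT the tutorial's data).
power = [160.0, 180.0, 200.0, 220.0]
response = [445.0, 455.0, 505.0, 595.0]

b0, b1 = fit_line(power, response)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(power, response)]
print([round(r, 6) for r in residuals])  # [20.0, -20.0, -20.0, 20.0]
```

The residuals are positive at both ends and negative in the middle. That U shape is the kind of systematic pattern that signals a missing curvature term.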

A better model for the etch rate as a function of the power setting may be quadratic in power level, i.e.,

$E(Y \mid X) = \beta_0 + \beta_1 X + \beta_2 X^2$

To fit this quadratic model in JMP, select Fit Polynomial > 2, quadratic from the Bivariate Fit pull-down menu. The results of this fit are shown below.
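For readers who want to see what a quadratic least-squares fit does outside JMP, here is a minimal pure-Python sketch that solves the normal equations directly. The data are hypothetical, noise-free values chosen for illustration (not the etch rate measurements). Centering the predictor at its mean is an assumption made here for numerical stability, and the helper names are illustrative.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_quadratic(x, y):
    """Least-squares fit of y = c0 + c1*(x - x_bar) + c2*(x - x_bar)**2."""
    x_bar = sum(x) / len(x)
    rows = [[1.0, xi - x_bar, (xi - x_bar) ** 2] for xi in x]
    # Normal equations: (X'X) c = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(3)]
    return x_bar, solve_linear_system(xtx, xty)


# Hypothetical responses that follow a curve (NOT the tutorial's data).
power = [160.0, 180.0, 200.0, 220.0]
response = [445.0, 455.0, 505.0, 595.0]

x_bar, coefs = fit_quadratic(power, response)
print([round(c, 6) for c in coefs])  # [475.0, 2.5, 0.05]
```

With these made-up values the quadratic passes through every point, so its residuals are essentially zero. Real data would leave random scatter, but no systematic curve.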

Summary Statistics and Test Results for Quadratic Model

Example 2 - Analysis of Covariance – Monofilament Fiber Strength
(Example 15-5 pg. 578)
Data File: Fiber-Stength.JMP

The variables in this data file are:

· Machine – machine used to make the fiber (the effect of interest)

· Diameter – diameter of fiber (covariate, varies from sample to sample)

· Breaking Strength – response (lbs.)

To visualize the analysis of covariance process, select Fit Y by X from the Analyze menu and place Diameter in the X box and Breaking Strength in the Y box. This will create a scatter plot of breaking strength vs. fiber diameter. To color code the points according to the machine that produced the fiber, select Color/Marker by Col... from the Rows menu, highlight Machine, and then click OK. Then select Grouping Variable... from the Bivariate Fit... pull-down menu. Highlight Machine as the variable to group by in the window that opens. Finally, select Fit Line from the Fitting menu. This will add a separate linear fit for breaking strength vs. fiber diameter for each machine. The resulting plot is shown below.

There do not appear to be any major differences between the machines; however, machine 2 looks like it might produce stronger fibers than the rest. Keep in mind that there are very few replicates for each machine, so differences will have to be “large” to be statistically significant! Clearly fiber diameter is related to breaking strength.

To perform a formal analysis of covariance of the machine and diameter effects we need to use the Fit Model approach. To do this in JMP select Fit Model from the Analyze menu and place Breaking Strength in the Y box and both Machine and Diameter in the Effects in Model box. The relevant computer output is shown below.

The Effect Test output from running the “unrelated lines” model is shown below. From the p-value for the Machine*Diameter interaction (p = .6293) we see that a parallel line model would be sufficient for these data.

Fitting the parallel line ANCOVA model we have the following output.

The p-value for diameter (p < .0001) indicates obvious significance. This should not be surprising given the obvious relationship between breaking strength and fiber diameter exhibited in the scatterplot above. There is little separation between the machine-dependent regression lines, suggesting that the machine effect is not large. The p-value for machine is .1181, providing very weak evidence of machine differences. Keep in mind the number of replicates is small, so the observed differences would have to be “large” before they would be detected as significant. With enough replicates we would eventually conclude that there are machine differences, but given what we see here it is unlikely they are large enough to be of practical interest.

As with any analysis where a model is fit, you should examine the residuals to check for any assumption violations. A plot of the residuals vs. the fitted values from this model is shown below. With the exception of the one case with a small predicted breaking strength, nothing stands out as unusual here.

To assess normality, first select Save Studentized Residual (i.e., standardized residuals) from the Response Breaking Strength > Save Columns pull-out menu and then use Distribution to examine their distribution via a histogram and normal quantile plot.

The residuals appear to be approximately normally distributed.

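Note: The ANCOVA models

The parallel lines model is

$y_{ij} = \mu + \tau_i + \beta (x_{ij} - \bar{x}_{..}) + \varepsilon_{ij}$

This model stated simply says that

Observed response = overall mean + treatment effect + covariate effect + random error

For the separate slopes model we have

$y_{ij} = \mu + \tau_i + \beta_i (x_{ij} - \bar{x}_{..}) + \varepsilon_{ij}$ (notice the slope now has a treatment subscript)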
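One way to see the difference between the two ANCOVA models is to look at the predictor columns each one uses. The sketch below builds a single design-matrix row for each model. It assumes three machines labelled 1, 2, 3 and uses indicator (treatment) coding with machine 1 as the reference level. Both are assumptions made for illustration; JMP may parameterize the same models differently. The helper `ancova_row` and the diameter value are hypothetical.

```python
def ancova_row(machine, diameter, machines, separate_slopes=False):
    """One design-matrix row using indicator (treatment) coding.

    Columns: intercept, one indicator per non-reference machine, diameter,
    and, for the separate-slopes model, indicator * diameter products.
    """
    indicators = [1.0 if machine == m else 0.0 for m in machines[1:]]
    row = [1.0] + indicators + [diameter]
    if separate_slopes:
        # Interaction columns let each machine have its own slope.
        row += [ind * diameter for ind in indicators]
    return row


machines = [1, 2, 3]  # assumed labels; the first is the reference level

# Hypothetical diameter value chosen only for illustration.
parallel = ancova_row(2, 25.0, machines)
separate = ancova_row(2, 25.0, machines, separate_slopes=True)
print(parallel)  # [1.0, 1.0, 0.0, 25.0]
print(separate)  # [1.0, 1.0, 0.0, 25.0, 25.0, 0.0]
```

The parallel lines model has one shared diameter column, so every machine gets the same slope. The separate slopes model adds the machine-by-diameter product columns, which is what the Machine*Diameter interaction test evaluates.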