Solution - HW3 Supplementary Problem on Assumption Violation
(Problem is here: )
Run the Oneway procedure, requesting descriptives, the HOV test, and a plot of means, and examine the output.
1. What assumption is violated and why do you say so?
Levene's test of HOV is significant at p = .014, indicating a violation of that assumption.
Also, there is a pattern of the standard deviations increasing with the means. This is undesirable.
Run the Means procedure to get the variances.
2. What is the ratio of the largest variance to the smallest variance? What is the rule of thumb in this situation? What is your conclusion?
The ratio is 41.67/1.67 = 24.95, or about 25 to 1. We would be concerned if the ratio were 4 to 1 or greater, so this is a clear signal of an HOV problem.
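The ratio can also be computed in syntax rather than by hand. The following is only a sketch: it assumes the variables grp and score used elsewhere in this solution, and the names sd_grp, var_grp, const, var_max, var_min, and var_ratio are made up for illustration.

```spss
* Sketch only: sd_grp, var_grp, const, var_max, var_min and var_ratio are illustrative names.
* Attach each group's standard deviation to its cases, then square it to get the variance.
AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES
  /BREAK=grp
  /sd_grp=SD(score).
COMPUTE var_grp = sd_grp**2.
* A constant break variable lets AGGREGATE take the largest and smallest variance over all cases.
COMPUTE const = 1.
AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES
  /BREAK=const
  /var_max=MAX(var_grp)
  /var_min=MIN(var_grp).
COMPUTE var_ratio = var_max / var_min.
EXECUTE.
* Show each group's variance next to the ratio, which is the same in every group.
MEANS
  TABLES=var_grp var_ratio BY grp
  /CELLS MEAN.
```

The value shown for var_ratio is then compared with the 4 to 1 rule of thumb.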
Run the Examine procedure (Analyze > Descriptive Statistics > Explore) to get boxplots for each group and the spread vs. level plot of the untransformed data. As we have not really covered this procedure, the syntax to paste and run is the first EXAMINE command under "All syntax" at the end of this solution.
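The EXAMINE command from the "All syntax" listing at the end is repeated here with comment lines added as a reading aid. The comments are additions and reflect a reading of the keywords, not part of the original handout.

```spss
* Boxplots and a spread vs. level plot of score for each group.
* SPREADLEVEL(1) uses a power of 1, which leaves the data untransformed.
* COMPARE GROUP shows the groups together in one display, and NOTOTAL drops the all-groups display.
EXAMINE
  VARIABLES=score BY grp
  /PLOT BOXPLOT SPREADLEVEL(1)
  /COMPARE GROUP
  /STATISTICS NONE
  /CINTERVAL 95
  /MISSING LISTWISE
  /NOTOTAL.
```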
- What pattern can be seen in the boxplots related to means/medians and group variability?
As shown by the boxplots, variability increases drastically in group 3. That group also has the highest mean.
- Note that in the Spread vs. Level plot, the Spread (vertical axis) is related to variability and the Level (horizontal axis) is related to central tendency for each group. Use the SPSS Help Index, entering the keyword “SPREADLEVEL”, to find out what the actual measures of spread and level are. Give these measures.
SPREADLEVEL(n) / Spread-versus-level plot with the Test of Homogeneity of Variance table. If the keyword appears alone, the natural logs of the interquartile ranges are plotted against the natural logs of the medians for all cells. So the measure of spread is the interquartile range and the measure of level is the median.
- What pattern do you see in the Spread vs. Level plot? What general pattern is desired and why?
The S-L plot confirms a substantial increase in spread with increase in level.
The desired pattern is essentially a flat line, indicating no increase in variability with increasing mean values.
6. What transformation would be a good candidate to try to improve the problems in the data (and why)?
The square root transformation should be good to start with because it is good for stabilizing variances, especially when larger variances are seen with larger means.
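One way to check this choice is sketched below. It assumes the behavior described in the Help entry quoted earlier: when SPREADLEVEL is given without a power, EXAMINE plots the natural logs of the interquartile ranges against the natural logs of the medians, and the output also reports the slope of the fitted line and a suggested power for the transformation. A suggested power near 0.5 would point toward the square root. No output values are claimed here.

```spss
* Sketch: spread vs. level plot with power estimation (keyword alone, no power given).
EXAMINE
  VARIABLES=score BY grp
  /PLOT SPREADLEVEL
  /STATISTICS NONE
  /NOTOTAL.
```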
A transformed score was computed as follows.
*Note: 1 is added because a score of 1 would otherwise be left unchanged by the transformation (SQRT(1) = 1).
COMPUTE sqrtscor = SQRT(score + 1) .
EXECUTE .
Apply the transformation and rerun, getting the ANOVA, the ratio of variances, boxplots, HOV test, and spread vs. level plot.
7. Based on examination of the transformed scores, what progress, if any, has been made in addressing the problems in the data? Explain and illustrate.
For sqrtscor, the ratio of the largest to smallest variance is now .588/.056 = 10.5, or about 10 to 1. This is a large improvement, but the ratio is still outside our rule of thumb of 4 to 1.
The S-L plot slope is much reduced. Noting the scale on the spread (vertical) axis, there really is not that much increase in spread for group 3.
Levene’s test is no longer significant, so it no longer signals a violation of the assumption. Note, however, that with the small sample size Levene’s test is not very sensitive, so a nonsignificant result is only weak evidence that the variances are equal.
8. Also, run the analysis of the transformed data using the UNIANOVA procedure to get the spread vs. level plot it produces. What is different?
I like the S-L plot provided by the Unianova procedure because the spread measure is s (stdev) and s^2 (variance) in the two plots provided.
Attention still needs to be paid to the automatic scaling of the vertical axis. This makes small differences in the variances look larger than they actually are.
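To read the plotted values against actual numbers, the UNIANOVA run can also print descriptive statistics and its homogeneity test. This is a sketch that adds a /PRINT subcommand to the UNIANOVA command in the listing below; the added subcommand is a suggestion, not part of the original assignment.

```spss
* Sketch: same UNIANOVA run, with descriptives and the homogeneity test added to the output.
UNIANOVA
  sqrtscor BY grp
  /PRINT = DESCRIPTIVE HOMOGENEITY
  /PLOT = SPREADLEVEL
  /DESIGN = grp.
```

The group standard deviations in the descriptives table can then be compared directly with the points on the plot, whatever range the vertical axis happens to use.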
All syntax:
*Assumes data have already been entered with varnames grp, score.
ONEWAY
score BY grp
/STATISTICS DESCRIPTIVES HOMOGENEITY
/PLOT MEANS
/MISSING ANALYSIS .
MEANS
TABLES=score BY grp
/CELLS MEAN COUNT VAR .
EXAMINE
VARIABLES=score BY grp
/PLOT BOXPLOT SPREADLEVEL(1)
/COMPARE GROUP
/STATISTICS NONE
/CINTERVAL 95
/MISSING LISTWISE
/NOTOTAL.
COMPUTE sqrtscor = SQRT(score + 1) .
EXECUTE .
MEANS
TABLES=sqrtscor BY grp
/CELLS MEAN COUNT VAR .
EXAMINE
VARIABLES=sqrtscor BY grp
/PLOT BOXPLOT SPREADLEVEL(1)
/COMPARE GROUP
/STATISTICS NONE
/CINTERVAL 95
/MISSING LISTWISE
/NOTOTAL.
ONEWAY
sqrtscor BY grp
/STATISTICS DESCRIPTIVES HOMOGENEITY
/PLOT MEANS
/MISSING ANALYSIS .
GRAPH
/ERRORBAR( CI 95 )=score BY grp .
EXAMINE
VARIABLES=sqrtscor BY grp
/PLOT BOXPLOT SPREADLEVEL(1)
/COMPARE GROUP /STATISTICS NONE
/NOTOTAL.
UNIANOVA
sqrtscor BY grp
/PLOT = SPREADLEVEL
/DESIGN = grp .