Evaluating Quasi-Experimental Designs

Applications Exercises. Research in Psych, 7e: Study Guide, Chapter 10

For each of the following descriptions of quasi-experiments, construct a graph from the data and then evaluate the conclusion drawn by the researcher. That is, determine whether the conclusion is justified by the results or if an alternative explanation for the outcome is possible. Also, identify any methodological problems that ought to be corrected.

1. To test the effectiveness of an employee incentive plan on employee morale, a program evaluator used a nonequivalent control group design in a company that manufactured electric trains. The incentive plan created five-person work groups that set productivity goals, and bonuses were awarded when the groups met certain goals. The unit that was given the incentive program was made up of workers who had been with the company for at least 10 years (and were therefore familiar with company operations). The control group was a unit that was randomly selected from other units in the organization. Before and after the plan was implemented, productivity (number of trains produced per day) was measured. The evaluator pronounced the program a success after these results occurred:

program group pretest = 25

program group posttest = 41

control group pretest = 21

control group posttest = 38

2. To evaluate the effect of a special program to improve the spelling of second graders weak in spelling, a researcher used a nonequivalent control group design. On the basis of teacher recommendations, 20 students known to be poor spellers were identified and placed in an experimental group. Another 20 students were randomly selected from second grade classes and placed in a control group. The pretest and posttest measures of spelling ability were lists of 40 spoken words that had to be written down by the students. After getting these results, the researcher recommended implementing the spelling program for all poor-spelling second graders statewide:

pretest spelling program group: 18

posttest spelling program group: 24

pretest control group: 35

posttest control group: 36

3. Alarmed at the large number of children being injured in car crashes during the first six months of 1997, state X debates a tough child restraint law during the fall of 1997. The law passes and goes into effect on January 1 of 1998. Injury rates are recorded every six months (in January and June, each covering the prior six months). Several years later, a time series analysis evaluates the effect of the new law. The dependent variable is the number of children aged 1-10 per thousand injured in car accidents. On the basis of the following data, the researcher concludes that the program did not have a significant effect:

January 1996: 269        January 1998: 293

June 1996: 239        June 1998: 217

January 1997: 280        January 1999: 225

June 1997: 358        June 1999: 212

4. In an effort to improve the driving performance of its home delivery pizza drivers, a pizza chain with stores in Illinois, Indiana, and Ohio decides to implement a safety training program, with incentives to encourage safe driving. The training program occurs for the Indiana stores in December and for the Ohio stores the following May (Ohio is a switching replication). The Illinois stores serve as a control group. Unknown to drivers, a “safety” score is calculated for each driver once a month; it is a combined score based on unobtrusive observations of their seat belt use and their speed 100 yards away from the store. A low score is good and a high score is bad. The company concludes that the training program is a success, based on the following overall scores for drivers in each state (scores recorded every month, beginning six months prior to the first training program in Indiana):

Indiana: 41, 46, 39, 38, 38, 41, 39, 22, 28, 25, 29, 35, 41, 40

Ohio: 36, 35, 38, 33, 33, 28, 18, 10, 10, 11, 13, 9, 13, 10

Illinois: 45, 43, 41, 43, 40, 38, 37, 26, 27, 29, 31, 33, 39, 33

Answers

1. There are two problems with the conclusion drawn. First, although productivity did increase for the program group, it also increased for the control group, and by about the same amount. Thus, whatever might have produced the increase for the control group could also have produced the increase for the program group. For example, perhaps all of the productivity numbers are relatively low and the changes in both cases are the result of regression. Perhaps some event (i.e., history) intervened between pre- and posttest and the result was an increase in productivity. Second, the professed purpose of the study was to see if the program would improve morale, yet the behavior measured was productivity, which may or may not be related to morale – that is, the researcher used the wrong dependent variable.

Concerning the design of the study, there doesn’t seem to be good justification for selecting veteran workers for the program group. It would have been better to try to reduce the nonequivalence of the groups by picking groups that were similar in terms of levels of experience.
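The comparison of gains in answer 1 can be checked with a few lines of arithmetic. The following is a minimal Python sketch, not part of the original exercise; the dictionary layout and the names gain and difference_in_gains are illustrative assumptions. It uses only the four productivity values given in the problem.

```python
# Productivity (trains per day) from exercise 1.
# The data structure and names below are illustrative assumptions.
scores = {
    "program": {"pretest": 25, "posttest": 41},
    "control": {"pretest": 21, "posttest": 38},
}


def gain(group):
    """Return posttest minus pretest for one group."""
    return group["posttest"] - group["pretest"]


program_gain = gain(scores["program"])
control_gain = gain(scores["control"])

# If the incentive plan had an effect beyond whatever changed both
# groups, the program group's gain should clearly exceed the control's.
difference_in_gains = program_gain - control_gain

print("program gain:", program_gain)
print("control gain:", control_gain)
print("difference in gains:", difference_in_gains)
```

A difference in gains near zero is what the answer means by "about the same amount": the raw gain in the program group, taken alone, says little about the plan.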

2. The program group certainly showed an increase in spelling performance, while the control group showed no improvement. There are problems of interpretation, however, making it impossible to conclude with any certainty that “the program improved spelling.” Because the pretest score is so low in the program group, their apparent improvement could be a regression effect, at least in part. Second, it is likely that the results for the control group reflect a ceiling effect – their pretest score was so high that significant improvement was virtually impossible. Thus, to say that the control group showed “no improvement” distorts the true outcome.

Concerning the design of the study, this is a case in which an experimental design might be feasible – randomly assign poor spellers to a program group and a waiting list control group. If a nonequivalent control group design must be used, it would have been better to screen out really good spellers in the control group and try to populate the control group with “average” spellers. An ideal nonequivalent control group design would compare two groups of poor spellers, perhaps from two similar school districts.
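One rough way to see the ceiling effect described in answer 2 is to compare each group's gain with the room it had left to improve. The sketch below is an illustration only: the "share of headroom" measure is an assumption introduced here, not a statistic used in the exercise, and the names are hypothetical. The maximum of 40 comes from the 40-word spelling lists.

```python
MAX_SCORE = 40  # each spelling test was a list of 40 words

# Group means from exercise 2; the layout is an illustrative assumption.
groups = {
    "program": {"pretest": 18, "posttest": 24},
    "control": {"pretest": 35, "posttest": 36},
}


def summarize(scores):
    """Return the raw gain, the room left to improve, and their ratio."""
    gain = scores["posttest"] - scores["pretest"]
    headroom = MAX_SCORE - scores["pretest"]
    return {
        "gain": gain,
        "headroom": headroom,
        # Assumed illustrative measure: gain as a fraction of possible gain.
        "share_of_headroom": gain / headroom,
    }


summary = {name: summarize(scores) for name, scores in groups.items()}

for name, row in summary.items():
    print(name, row)
```

The control group could improve by at most 5 words, so its small raw gain is not directly comparable to the program group's gain. This does not settle the regression question, which needs a comparison group of equally poor spellers.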

3. The researcher probably suspects a regression effect, and that certainly seems to be operating to some extent here. Note the decline from June of 1997 to January of 1998: it happened before the law actually went into effect. The continuing decline over the following six months could also be partly due to regression, but it is possible that a true program effect occurred as well. Statistical analysis would have to confirm it, but it looks like the last three data points are lower than the first three points.

Concerning the design of the study, comparisons with a similar state might help to separate program effects from regression effects.
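The informal comparison in answer 3 (first three points versus last three points, plus the drop that preceded the law) can be reproduced as follows. This is a descriptive Python sketch only, assuming the readings are ordered chronologically as listed in the problem; the variable names are illustrative, and it is not a substitute for the time series analysis the answer calls for.

```python
from statistics import mean

# Injury rates per thousand from exercise 3, in chronological order.
readings = [
    ("January 1996", 269),
    ("June 1996", 239),
    ("January 1997", 280),
    ("June 1997", 358),
    ("January 1998", 293),
    ("June 1998", 217),
    ("January 1999", 225),
    ("June 1999", 212),
]

rates = [rate for _, rate in readings]

first_three_mean = mean(rates[:3])
last_three_mean = mean(rates[-3:])

# The January 1998 reading covers the six months before the law took
# effect, so this drop from the June 1997 peak cannot be due to the law.
drop_before_law = rates[3] - rates[4]

print("mean of first three readings:", round(first_three_mean, 1))
print("mean of last three readings:", round(last_three_mean, 1))
print("drop from June 1997 to January 1998:", drop_before_law)
```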

4. It isn’t clear if this program is a success or not. Performance improves after the program is put into effect in Indiana in December, but there is a similar dip for Illinois, the control state. It is likely that some kind of seasonal trend exists, with scores being lower in the winter months. Perhaps the winter weather forces more careful driving. Scores increase (and driving performance deteriorates) as the weather improves in both states, further evidence of a seasonal trend. On the other hand, performance improves even in the summer for the replication state of Ohio. Ohio shows the same seasonal trend as Indiana and Illinois, but the improvement after program implementation cannot be a seasonal trend. The fact that driving is better in general for Ohio pizza drivers suggests a subject selection effect is operating. Maybe the Ohio drivers, better to begin with, were more open to further training.

Concerning the design of the study, it might have been better to implement the training programs for the two experimental groups closer together in time (e.g., May for one, July for another). That way the immediate post-program tests would occur during similar weather.
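The seasonal pattern discussed in answer 4 is easier to see when the monthly scores are averaged over a few windows. The sketch below assumes that the first score in each list is for June (six months before the December training in Indiana), so that positions 8 to 11 are January through April and the last two positions are the following June and July. That month mapping and the window names are assumptions made for illustration; the problem gives only the order of the scores.

```python
from statistics import mean

# Monthly safety scores from exercise 4 (low is good).
scores = {
    "Indiana": [41, 46, 39, 38, 38, 41, 39, 22, 28, 25, 29, 35, 41, 40],
    "Ohio": [36, 35, 38, 33, 33, 28, 18, 10, 10, 11, 13, 9, 13, 10],
    "Illinois": [45, 43, 41, 43, 40, 38, 37, 26, 27, 29, 31, 33, 39, 33],
}

# Assumed month windows, expressed as 0-based slices into each list.
windows = {
    "baseline (Jun-Nov)": slice(0, 6),
    "winter (Jan-Apr)": slice(7, 11),
    "next summer (Jun-Jul)": slice(12, 14),
}

summary = {
    state: {name: mean(values[window]) for name, window in windows.items()}
    for state, values in scores.items()
}

for state, row in summary.items():
    print(state, {name: round(value, 1) for name, value in row.items()})
```

Under these assumptions, the control state (Illinois) dips in winter much as Indiana does, while only Ohio remains far below its baseline the following summer. That is the pattern the answer points to when it separates a seasonal trend from a possible program effect.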
