The Study Design and Implementation Assessment Device (Study DIAD)
The Study DIAD is a hierarchical and multidimensional instrument for rating the quality of research that employs the logic of experimentation to assess the nature of a causal relationship. The instrument is hierarchical in the sense that it comprises questions at three levels of abstraction: four very abstract questions about study design and implementation that we term the Global Questions, eight somewhat less abstract questions that we term the Composite Questions, and finally 32 or 33 (depending on the study design used) more specific Design and Implementation Questions. The Design and Implementation Questions are nested within the Composite Questions, and the Composite Questions are nested within the Global Questions. See Valentine & Cooper (2007), Figures 1 and 2, for a graphic representation of these relationships.
The instrument is multidimensional in the sense that each of the Global Questions addresses a separate aspect of validity (e.g., Shadish, Cook, & Campbell, 2002). For example, one Global Question addresses internal validity (“Did the research design permit an unambiguous conclusion about the intervention’s effectiveness?”), another addresses construct validity, and so on. As such, the Study DIAD results in a profile of either four or eight ratings that describe the quality of the rated study’s research design and implementation.
Before rating a study, the individuals intending to apply the Study DIAD must answer several questions that contextualize the relevant aspects of study design and implementation for the current research problem. For example, when assessing the extent to which a study allowed for a “fair” comparison between the intervention and comparison groups, Question 2.1.1a (for quasi-experimental designs) asks “Were adequate equating procedures used to recreate the selection model?” The important variables that should be matched or controlled will vary depending on the context of the research question; those employing the Study DIAD first need to decide what those variables should be. The specific contextual questions are given below, and an example of how these might be answered (for research on the topic of homework) is given in Valentine & Cooper (2007), Table 1.
To apply the Study DIAD, individuals judging research quality first need to answer all of the specific Design and Implementation Questions. An algorithm is used to determine how the answers to these questions “roll up” to answer the eight Composite Questions, and how the answers to the Composite Questions roll up to answer the four Global Questions. As an example of how the algorithms work, consider Global Question 2 (“Did the research design permit an unambiguous conclusion about the intervention’s effectiveness?”). To answer this question, individuals using the Study DIAD would first need to complete the Design and Implementation Questions associated with Composite Questions 2.1 and 2.2. Assume that a study reported that random assignment was used to place participants into groups, and that the comparison group lost 25% of the participants originally assigned to that condition while the treatment condition lost no participants (for an overall loss of 12.5% of participants). Also assume that the researchers decided a priori (as part of the contextual questions that need to be answered before applying the Study DIAD) that a difference of more than 20 percentage points in attrition rates across groups (and more than 15% attrition overall) would be problematic. Finally, assume that the study participants were drawn from the same local pool, that the intervention conditions were unknown by all except for the researchers themselves, and that the researchers suspected no local history or other potential contaminants.
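The attrition arithmetic in this example can be sketched in code. This is an illustrative sketch only: the equal group sizes of 100 and the function name are assumptions made for the example, and the 20 percentage-point differential and 15% overall thresholds are the values the judges in this example chose a priori, not values prescribed by the Study DIAD itself.

```python
# Hypothetical sketch of the attrition checks described above.
# Group sizes and thresholds are assumptions for this example.

def attrition_flags(treat_lost, treat_n, comp_lost, comp_n,
                    diff_threshold=0.20, overall_threshold=0.15):
    """Return (differential_attrition, severe_overall_attrition) as booleans."""
    treat_rate = treat_lost / treat_n
    comp_rate = comp_lost / comp_n
    overall_rate = (treat_lost + comp_lost) / (treat_n + comp_n)
    differential = abs(treat_rate - comp_rate) > diff_threshold
    severe_overall = overall_rate > overall_threshold
    return differential, severe_overall

# Example study: equal groups of 100; the comparison group lost 25% of its
# participants, the treatment group lost none (12.5% attrition overall).
print(attrition_flags(treat_lost=0, treat_n=100, comp_lost=25, comp_n=100))
# → (True, False): differential attrition is problematic; overall attrition is not.
```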
Composite Questions 2.1 and 2.2 would be answered as below (answers are in bold font):
Composite Question 2.1: Were the participants in the group receiving the intervention comparable to the participants in the comparison group?
Response Pattern (Read down to determine the answer to the question.)
2.1.1 Was random assignment used to place participants into conditions? (If no, answer one of the next two questions) / Yes / Yes / Yes / No / No / No
2.1.1a For Quasi-experiments: Were adequate equating procedures used to recreate the selection model?
For Regression Discontinuity Designs: Could all participants have received the intervention had the cutoff point been set differently? / n/a / n/a / n/a / Yes / No / Yes
2.1.2 Was there differential attrition between intervention and comparison groups? / No / Yes / No / No / Yes or No / Yes
2.1.3 Was there severe attrition overall? / No / No / Yes / No / Yes or No / Yes
Answer to Composite Question 2.1 associated with this response pattern: / Yes / Maybe Yes / Maybe Yes / Maybe Yes / No / No
Note: A pattern of answers to these questions that is not specifically identified results in a “Maybe no”.
Because this study was a randomized experiment, it received a “Yes” for Question 2.1.1. Because it experienced a 25 percentage point difference in attrition rates across groups, exceeding the judges’ threshold of a 20 percentage point difference, the study received a “Yes” to Question 2.1.2. Overall attrition was 12.5%, which is less than the threshold value of 15%, so the study received a “No” to Question 2.1.3. In the above table, all “Yes” answers for Question 2.1.1 are in bold, as are all “Yes” answers for Question 2.1.2 and all “No” answers for Question 2.1.3. For Question 2.1.1a, the answer “n/a” is in bold font because that question is not applicable given that the study was a randomized experiment. The “Maybe Yes” entry in the final row of the table is also in bold, indicating that it is the answer to Composite Question 2.1. This answer was selected from the only column in which all entries are in bold font.
Because there are so many potential combinations of responses across the three (or four, depending on the design) questions, they cannot all be displayed graphically in an easily understandable way. As such, the Note associated with the table states how response patterns that are not specifically identified should be handled.
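The roll-up for Composite Question 2.1 can be sketched as a simple pattern-matching routine. This is an author-style illustration, not code from the Study DIAD: the patterns below are transcribed from the response-pattern table for Composite Question 2.1, “*” stands for “Yes or No” (either answer matches), and any unlisted pattern defaults to “Maybe no,” per the table’s Note.

```python
# Illustrative sketch of the roll-up for Composite Question 2.1, transcribed
# from the response-pattern table. "*" means "Yes or No" (either matches).

PATTERNS_2_1 = [
    # (2.1.1, 2.1.1a, 2.1.2, 2.1.3) -> composite answer
    (("Yes", "n/a", "No",  "No"),  "Yes"),
    (("Yes", "n/a", "Yes", "No"),  "Maybe yes"),
    (("Yes", "n/a", "No",  "Yes"), "Maybe yes"),
    (("No",  "Yes", "No",  "No"),  "Maybe yes"),
    (("No",  "No",  "*",   "*"),   "No"),
    (("No",  "Yes", "Yes", "Yes"), "No"),
]

def composite_2_1(answers):
    """Match a tuple of four answers against the response-pattern columns."""
    for pattern, result in PATTERNS_2_1:
        if all(p == "*" or p == a for p, a in zip(pattern, answers)):
            return result
    return "Maybe no"  # any pattern not specifically identified, per the Note

# The example study: randomized, differential attrition, no severe overall attrition.
print(composite_2_1(("Yes", "n/a", "Yes", "No")))  # → Maybe yes
```

The same column-matching logic applies to every Composite Question table; only the patterns and answers differ.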
Composite Question 2.2. Clarity of Causal Inference: Lack of Contamination
Was the study free of events that happened at the same time as the intervention that confused the intervention’s effect?
Response Pattern (Read down to determine the answer to the question.)
2.2.1 Was there evidence of a local history event? / No / No / No / Yes / No / Yes
2.2.2a Were the intervention and comparison groups drawn from the same local pool? / No / Yes / Yes / Yes or No / Yes or No / Yes or No
2.2.2b If yes, were intervention conditions known to study participants, providers, data collectors, and/or other authorities (e.g., parents, teachers, case managers)? / n/a / No / Yes / Yes or No / Yes or No / Yes or No
2.2.3 Did the description of the study give any other indication of the strong plausibility of other intervention contaminants? / No / No / No / No / Yes / Yes
Answer to Composite Question 2.2 associated with this response pattern: / Yes / Yes / Maybe Yes / No / No / No
For the example study, the study authors did not identify a local history event that may have contaminated study results (“No” for Question 2.2.1). Participants were drawn from the same local pool (“Yes” for Question 2.2.2a), but only the researchers knew the intervention conditions of the participants (“No” for Question 2.2.2b). Finally, the researchers did not identify other potential intervention contaminants (“No” for Question 2.2.3). This pattern of results yields an answer of “Yes” to Composite Question 2.2.
Next, with the answers to Composite Questions 2.1 and 2.2 in hand, judges applying the Study DIAD can use these to answer Global Question 2. Because Composite Question 2.1 was answered “Maybe yes” and Composite Question 2.2 was answered “Yes”, the answer to Global Question 2 is “Maybe yes.”
Response Pattern (Read down to determine the answer to the question.)
2.1 Were the participants in the group receiving the intervention comparable to the participants in the comparison group? / Yes / Maybe Yes / Yes / Maybe Yes / No
2.2 Was the study free of events that happened concurrently with the intervention that confused its effect? / Yes / Yes / Maybe Yes / Maybe Yes / No
Answer to Global Question 2 associated with this response pattern: / Yes / Maybe Yes / Maybe Yes / Maybe Yes / No
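The final roll-up step can be sketched the same way. The table of combinations below is taken from the Global Question 2 response patterns above; the “Maybe no” default for unlisted combinations is an assumption on my part, carried over from the convention stated in the composite-question table notes.

```python
# Sketch of how the Composite 2.1 and 2.2 answers roll up into Global Question 2.
# Combinations are transcribed from the table above; the default for unlisted
# combinations ("Maybe no") is assumed from the composite-table convention.

GLOBAL_2 = {
    ("Yes",       "Yes"):       "Yes",
    ("Maybe yes", "Yes"):       "Maybe yes",
    ("Yes",       "Maybe yes"): "Maybe yes",
    ("Maybe yes", "Maybe yes"): "Maybe yes",
    ("No",        "No"):        "No",
}

def global_2(answer_2_1, answer_2_2):
    """Look up the Global Question 2 answer for a pair of composite answers."""
    return GLOBAL_2.get((answer_2_1, answer_2_2), "Maybe no")

# The worked example: Composite 2.1 = "Maybe yes", Composite 2.2 = "Yes".
print(global_2("Maybe yes", "Yes"))  # → Maybe yes
```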
The Study DIAD can be applied by any number of study coders, but we believe that it is best for at least two coders to apply the ratings. These coders should work independently and then attempt to resolve any disagreements in conference. Disagreements that cannot be resolved in this manner could be referred to a third party.
For more information on applying the Study DIAD, please refer to Valentine & Cooper (2007).
Contextual Questions to Answer Before Applying the Study DIAD
- What commonly shared and/or theoretically derived characteristics of the intervention should be present in its definition and implementation?
- Which of these characteristics are necessary to define interventions that “fully,” “largely,” and “somewhat” reflect commonly shared and/or theoretically derived characteristics?
- What variations in the intervention are important to examine as potential moderators of effect size?
- What important characteristics of the intervention would we need to know in order to reliably replicate it with different participants, in other settings, at other times?
- What are the important classes of outcomes?
- What classes of outcomes are needed to conclude that a reasonable range of operations and/or methods have been included and tested?
- Does the principal investigator have a minimum level of score reliability for outcomes to be considered in the review? If so, what are the specific minimum reliability coefficients for internal consistency, temporal stability, and/or inter-rater reliability (as appropriate)?
- Considering the context of this study, during what interval of time should it have been conducted to be relevant to current conditions?
- Considering the context of this study, what are the characteristics of the intended beneficiaries of the intervention?
- What are the important characteristics of participants that might be related to the intervention’s effect and must be equated if a study does not employ random assignment?
- What characteristics of subgroups of participants are important (a) to have variation on and (b) to test within a study to determine whether an intervention is effective within these groups? What levels or labels capture this variation?
- Which of these characteristics of subgroups of participants are needed to conclude that a “limited” or “reasonable range” of characteristics have been included and tested?
- What characteristics of settings are important to test within a study to determine whether an intervention is effective within these settings?
- Which of these characteristics of settings are needed to conclude that a “full”, “reasonable”, or “limited” range of variations have been tested?
- What is the appropriate interval for measuring the intervention’s effect relative to the end of the intervention?
- Considering the context of this study, for purposes of sampling, what constitutes the local pool of participants?
- If participants are drawn from the same local pool, which groups of individuals (e.g., students, teachers, parents, administrators, case workers) might have been able to interfere with the fidelity of the comparison group if they knew who was in the intervention and comparison groups?
- For research on this topic, how would you define differential attrition from the intervention and control groups?
- For research on this topic, how would you define severe overall attrition?
- For research on this topic, what constitutes a minimal sample size that would permit a sufficiently precise estimate of the effect size?
- What percentage of important statistical information (i.e., sample size, direction of effect, effect size) is needed for the results of this study to be “fully”, “largely”, and “rarely” reported?
- Considering the outcome measure and the context of this research question, what constitutes “Over-alignment” and “Under-alignment” of the intervention and outcome?
Composite Question 1.1. Fit Between Concepts and Operations: Intervention
Were the participants treated in a way that is consistent with the definition of the intervention?
Response Pattern (Read down to determine the answer to the question.)
1.1.1 To what extent does the intervention reflect commonly held or theoretically derived characteristics about what it should contain? / Fully / Largely / Fully, Largely, or Somewhat / Not at All
1.1.2 Was the intervention described at a level of detail that would allow its replication by other implementers? / Yes / Yes / Yes or No / Yes or No
1.1.3 Was there evidence that the group receiving the intervention might also have experienced a changed expectancy, novelty, and/or disruption effect not also experienced by the comparison group (or vice versa)? / No / No / Yes / Yes or No
1.1.4 Was there evidence that the intervention was implemented in a manner similar to the way it was defined? / Yes / Yes / No / Yes or No
Answer to Composite Question 1.1 associated with this response pattern: / Yes / Maybe Yes / No / No
Note: A pattern of answers to these questions that is not specifically identified results in a “Maybe no”.
Composite Question 1.2. Fit Between Concepts and Operations: Outcome measure
Were the outcomes measured in a way that is consistent with the proposed effects of the intervention?
Response Pattern (Read down columns to determine the answer to the question.)
1.2.1 Do items on the outcome measure appear to represent the content of interest (i.e., have face validity)? / Yes / Yes / Yes or No
1.2.2 Were the scores on the outcome measure acceptably reliable? / Yes / No / Yes or No
1.2.3 Was the outcome measure properly aligned to the intervention condition? / Yes / Yes / No
Answer to Composite Question 1.2 associated with this response pattern: / Yes / Maybe No / No
Note: If unable to determine from the report, the answer to Questions 1.2.1 and 1.2.2 is “No,” and the answer to Question 1.2.3 is “properly aligned.”
Composite Question 2.1. Clarity of Causal Inference: Fair Comparison
Were the participants in the group receiving the intervention comparable to the participants in the comparison group?
Response Pattern (Read down to determine the answer to the question.)
2.1.1 Was random assignment used to place participants into conditions? (If no, answer one of the next two questions) / Yes / Yes / Yes / No / No / No
2.1.1a For Quasi-experiments: Were adequate equating procedures used to recreate the selection model?
For Regression Discontinuity Designs: Could all participants have received the intervention had the cutoff point been set differently? / n/a / n/a / n/a / Yes / No / Yes
2.1.2 Was there differential attrition between intervention and comparison groups? / No / Yes / No / No / Yes or No / Yes
2.1.3 Was there severe attrition overall? / No / No / Yes / No / Yes or No / Yes
Answer to Composite Question 2.1 associated with this response pattern: / Yes / Maybe Yes / Maybe Yes / Maybe Yes / No / No
Note: A pattern of answers to these questions that is not specifically identified results in a “Maybe no”.
Composite Question 2.2. Clarity of Causal Inference: Lack of Contamination
Was the study free of events that happened at the same time as the intervention that confused the intervention’s effect?
Response Pattern (Read down to determine the answer to the question.)
2.2.1 Was there evidence of a local history event? / No / No / No / Yes / No / Yes
2.2.2a Were the intervention and comparison groups drawn from the same local pool? / No / Yes / Yes / Yes or No / Yes or No / Yes or No
2.2.2b If yes, were intervention conditions known to study participants, providers, data collectors, and/or other authorities (e.g., parents, teachers, case managers)? / n/a / No / Yes / Yes or No / Yes or No / Yes or No
2.2.3 Did the description of the study give any other indication of the strong plausibility of other intervention contaminants? / No / No / No / No / Yes / Yes
Answer to Composite Question 2.2 associated with this response pattern: / Yes / Yes / Maybe Yes / No / No / No
Composite Question 3.1. Generality of Findings: Inclusive Sampling
Did the study include variation on participants, settings, outcomes, and occasions representative of the intended beneficiaries?
Response Pattern (Read down to determine the answer to the question.)
3.1.1 Did the sample contain participants with the necessary characteristics to be considered part of the target population? / Yes / Yes / No
3.1.2 To what extent did the sample capture variation among participants on important characteristics of the target population? / Fully / Fully or Reasonable Range / Fully, Reasonable Range, or Limited
3.1.3 To what extent did the study include variation on important characteristics of the target setting? / Fully / Fully or Reasonable Range / Fully, Reasonable Range, or Limited
3.1.4 To what extent were important classes of outcome measures included in the study? / Fully / Fully or Reasonable Range / Fully, Reasonable Range, or Limited
3.1.5 Did the study measure the outcome at a time appropriate for capturing the intervention's effect? / Yes / Yes / Yes or No
3.1.6 Was the study conducted during the time frame appropriate for extrapolating to current conditions? / Yes / Yes / Yes or No
Answer to Composite Question 3.1 associated with this response pattern: / Yes / Maybe Yes / No
Note: A pattern of answers to these questions that is not specifically identified results in a “Maybe no.”
Composite Question 3.2. Generality of Findings: Effects Tested within Subgroups
Was the intervention tested for its effect within important subgroups of participants, settings, and outcomes?
Response Pattern (Read down to determine the answer to the question.)
3.2.1 To what extent was the intervention tested for effectiveness within important subgroups of participants? / Fully / Reasonable Range / Reasonable Range / Reasonable Range / Somewhat or Not at All
3.2.2 To what extent was the intervention tested for effectiveness within important subgroups of settings? / Fully / Fully / Reasonable Range / Reasonable Range / Somewhat or Not at All
3.2.3 Was the intervention tested for its effectiveness across important classes of outcomes? / Yes / Yes or No / Yes or No / Yes or No / No
3.2.4 Was the time of measurement (relative to the end of the intervention) tested as an influence on the intervention’s effect? / Yes / Yes or No / Yes or No / Yes or No / No
Answer to Composite Question 3.2 associated with this response pattern: / Yes / Maybe Yes / Maybe Yes / Maybe Yes / No
Note: A pattern of answers to these questions that is not specifically identified results in a “maybe no.”
Composite Question 4.1. Precision of Outcome Estimation: Effect Sizes and Standard Errors
Were effect sizes and their standard errors accurately estimated?
Response Pattern (Read down to determine the answer to the question.)
4.1.1 Was the assumption of independence met, or could dependence (including dependence arising from clustering) be accounted for in estimates of effect sizes and their standard errors? / Yes / Yes / Yes / Yes / No
4.1.2 Did the statistical properties of the data (e.g., distributional and variance assumptions, if any, presence of outliers) allow for valid estimates of the effect sizes? / Yes / No / Yes / No / Yes or No
4.1.3 Were the sample sizes adequate to provide sufficiently precise estimates of effect sizes? / Yes / Yes / No / No / Yes or No
4.1.4 Were the outcome measures sufficiently reliable to allow adequately precise estimates of the effect sizes? / Yes / Yes or No / Yes or No / Yes or No / Yes or No
Answer to Composite Question 4.1 associated with this response pattern: / Yes / Maybe Yes / Maybe Yes / Maybe No / No
Composite Question 4.2. Precision of Outcome Estimation: Statistical Reporting