
Translating the Evidence

A brief guide to the Early Intervention Foundation’s procedures for identifying, assessing and disseminating information about early intervention programmes and their evidence

Overview

The Early Intervention Foundation (EIF) was established in July 2013 to champion and support the effective use of early intervention. By ‘early’ we mean activities that support children’s development at all ages, to keep problems from occurring in the first place. By ‘intervention’ we mean programmes and practices that target the needs of children and families who have an identified risk of negative life outcomes that may also carry a long-term social cost.

EIF is one of six UK ‘What Works’ centres aiming to make the existing evidence more accessible and ultimately to improve its strength. A key role of EIF’s assessment function, and an aspect of its mandate as a What Works centre, is to find, collect and disseminate information about early interventions in terms of their evidence. This is because evaluation evidence is considered by many to be the best method for understanding the extent to which a programme is effective. We recognise, however, that from a commissioning perspective, programme evidence is just one part of a much bigger picture. In order for an intervention to ‘work’ at any given time or place, commissioners also need information about a programme’s local costs, likely benefits and implementation requirements (see Figure 1).

Figure 1: The three dimensions of strategic commissioning

  • Strength of Evidence provides insight into whether a programme ‘works’ or has the potential to work. This covers ratings of cost and effectiveness.
  • Implementability considers the extent to which the programme’s implementation requirements are specified so that its benefits can be replicated with various populations and host agencies, and the degree to which a commissioning agency has the resources, capabilities and means to implement the programme effectively.
  • Cost-benefit analysis. The fact that a programme has been effective does not mean that it is necessarily the best programme to commission. Commissioners also need to have a strong sense of population need, and to have appraised the likely local benefits of the programme for the commissioning agency and the wider population for which it is responsible.


Information about the extent to which a programme supports social outcomes (beyond child development) is also crucial, especially for understanding the intervention’s potential for reducing social costs and providing value for money. Outcomes that are likely to be of particular interest to commissioners include those linked to reductions in youth crime and antisocial behaviour, alcohol and substance misuse, risky sexual behaviour, mental health problems, school failure, child maltreatment and adult obesity (see Figure 2). Interventions addressing these outcomes may involve working with the child directly, or through their environments, including their families and schools. For the purposes of this review, the focus is on interventions that improve the quality of relationships between couples, thereby improving outcomes for children, i.e. by impacting on the family and the home, achieving benefits for child development that are likely to translate into improved life chances.

Figure 2: Outcomes for improving children’s life chances and reducing social costs

It is important to note that the table presented below pertains to a programme’s strength of evidence in terms of improving child outcomes. However, for the purposes of this review, programmes will be assessed in terms of evidence of impact on parent and/or child outcomes, recognising that achievement of an impact on the couple relationship may indicate a likely longer-term impact on child outcomes even where that is not evidenced.

This paper provides an overview of the processes underpinning these three steps (identification, assessment and dissemination), so that commissioners and providers have a transparent understanding of our approach and methods.

Step 1: Identification

‘What Works’ Reviews

As a What Works centre, EIF is asked to systematically collect evidence on interventions and practices and synthesise it in a clear and accurate way to inform the decision making of policymakers and local commissioners. Systematic literature reviews make use of systematic methods to search for, select and assess evidence within a predetermined topic area. When done correctly, systematic reviews are widely perceived as an objective and transparent method for assessing the quality of the available evidence. They are limited, however, to information that is published and accessible through scientific databases, meaning that the most recent, unpublished, or non-peer-reviewed information about a programme or practice might be missed. Systematic reviews also tend to be biased towards positive findings, since negative or non-conclusive results frequently go unreported.

To limit these biases, EIF has adopted a three-pronged approach to identifying interventions that may be of interest to UK commissioners and policy makers. This approach combines traditional literature search methods with web-based searches (including websites, government reports and other clearinghouses) and a call for evidence, involving our places and other organisations with an interest in developing interventions for children and families.

Figure 3: A three-pronged approach for identifying early interventions

  1. Each search begins with a predefined set of criteria which is agreed by the steering group responsible for the specific study.
  2. The criteria are then used to search relevant databases and websites, and to identify organisations for further contact.
  3. Once programmes have been identified, their eligibility for further scrutiny is assessed against the original search criteria.
  4. The details of specific programmes are then described in an initial report providing an overview of the kind of interventions available within a specific area of interest.

It is worth noting that the primary aim of most systematic reviews is to determine the overall strength of evidence within a given topic area. This differs from EIF’s approach, which assesses the evidence underpinning individual programmes and practices. EIF accomplishes this through a panel review process that carefully scrutinises all of a programme’s evidence and assigns a rating to its overall strength. This rating is not intended as a kite mark or validation of a programme’s quality, nor a guarantee that the programme will work at any given time or place. Rather, it is meant as an objective assessment of the extent to which an intervention could work under the appropriate conditions. The rationale underpinning our rating system is described in further depth in the following section.

Step 2: Assessment

The panel review process

Once interventions have been identified through traditional systematic search methods, they are assessed against EIF’s standards of evidence through a panel review process involving the following steps:

  1. A second web-based search is conducted to identify all of the published evaluation evidence on each programme. A list of evidence is produced for each programme and shared with the providers to determine whether any evidence has been missed.
  2. The evaluations for each programme are rank-ordered in terms of the strength of their design and undergo an initial assessment against the EIF Standards of Evidence (see below). This work is completed by highly trained researchers working within the EIF evidence team.
  3. The initial assessments and evaluation reports for each programme are then forwarded to an external expert, who also reviews the evidence underpinning each programme. External experts are invited to the panel on the basis of their expertise within specific areas of interest. A minimum of five reviewers participate in each panel.
  4. A panel meeting takes place where the evidence team and external experts together discuss the strength of evidence underpinning a set of interventions and agree an initial evidence rating for each programme reviewed. This rating is primarily informed by the intervention’s most robust evidence.
  5. Once initial ratings have been agreed for all of the programmes identified within a review, a moderation meeting involving a wider group of experts takes place to further debate and agree a final assessment rating.

EIF Standards of Evidence

As a What Works Centre, EIF is expected to assess interventions in terms of their effectiveness (i.e. do they make a difference?), impact (i.e. how much of a difference do they make?) and cost. These assessments are determined through the careful scrutiny of the intervention’s evaluation evidence, which includes an assessment of the quality of the evaluation design(s) and the extent to which the findings suggest consistent and meaningful benefits for children[1]. EIF accomplishes this by assessing an intervention’s evidence against a well-established set of standards that are broadly agreed across the What Works Network. These standards emphasise the value of randomised controlled trials (RCTs) and similarly rigorous quasi-experimental designs (QEDs) over qualitative studies and expert opinion. This is because qualitative designs and expert opinions cannot determine causality or scale of impact, although it is recognised that these methods can add valuable insight into how and why an intervention might work.

The EIF standards make use of six discrete ratings, starting from a rating of ‘negative’ and then numerically running from 0 to 4 (see Table 1). This strength of evidence scale is broadly similar to the NESTA evidence standards, with the addition of 0 being assigned to interventions that are not based on any specified theory or evaluation evidence, and the negative rating assigned to programmes for which there is strong and consistent evidence that the approach is harmful, or provides no observable benefits to children or families. These standards were developed and approved in consultation with EIF’s evidence panel, made up of distinguished academics with specific expertise in programme evaluation and children’s development.

Table 1: The Early Intervention Foundation’s Evidence Standards
Features of the evidence/rationale / Description of evidence / Description of programme / EIF rating
Multiple high-quality evaluations (RCT/QED) with consistently positive impact across populations and environments / Established / Consistently Effective / 4
Single high-quality evaluation (RCT/QED) / Initial / Effective / 3
Lower-quality RCT/QED or pre/post evaluation suggesting improved child outcomes / Formative / Potentially Effective / 2
Logic model with testable features, but no current evidence of improved child outcomes / Non-existent / Theory-based / 1
Programmes not yet rated, including those rated by evidence bodies whose standards are not yet mapped to the EIF standards, and submissions from providers or local areas of innovative or promising interventions / Unspecified / Not yet rated / 0
Evidence from at least one high-quality evaluation (RCT/QED) indicating null or negative impact / Negative / Ineffective/harmful / -

Level 3: Attributing causality

Level 3 (as opposed to Level 0 or 4) is a good starting point for understanding EIF’s evidence standards, as Level 3 evaluations are specifically designed to determine whether an intervention can be linked to specific child outcomes – i.e. the extent to which a causal relationship exists between children’s exposure to the intervention and statistically significant changes in their behaviours and feelings. Impact evaluations accomplish this by comparing the outcomes of individuals who participated in the intervention (the treatment group) with the outcomes of those who did not (a comparison group). A key feature of reliable impact evaluations is that participants in both groups are equivalent in all respects except that one group receives the treatment and the other does not.
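
By way of illustration only, the sketch below (in Python, using invented data) shows the basic arithmetic of an impact estimate: the difference in mean outcomes between a treatment and a comparison group, together with a test of whether that difference is statistically significant. The outcome measure, sample sizes and effect size are all hypothetical.

```python
# Illustrative only: synthetic data standing in for an impact evaluation.
# The outcome measure, sample sizes and effect size are invented for the example.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Simulated scores on a child-outcome measure (higher = better).
treatment = rng.normal(loc=52.0, scale=10.0, size=150)   # children who received the intervention
comparison = rng.normal(loc=50.0, scale=10.0, size=150)  # children who did not

# The estimated impact is the difference in mean outcomes between the two groups.
impact = treatment.mean() - comparison.mean()

# A two-sample t-test indicates whether that difference is statistically significant.
t_stat, p_value = stats.ttest_ind(treatment, comparison)

print(f"Estimated impact: {impact:.2f} points (p = {p_value:.3f})")
```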

Randomised controlled trials (RCTs) have earned the reputation as the ‘gold standard’ for measuring impact because they provide a robust method of creating equivalent treatment and comparison groups. This is accomplished through the use of random assignment, which theoretically ensures that systematic biases that might otherwise exist between groups are randomly distributed across them. In this respect, RCTs help maintain a study’s internal validity – i.e. the extent to which systematic biases are reduced or eliminated (see Box A).
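
The following minimal simulation, again with hypothetical data, illustrates why random assignment supports internal validity: a pre-existing characteristic (here an invented ‘risk score’) ends up roughly balanced across the two groups, so it cannot plausibly account for a later difference in outcomes.

```python
# A minimal sketch of why random assignment supports internal validity.
# The 'risk score' covariate and sample size are hypothetical.
import numpy as np

rng = np.random.default_rng(0)

n = 1000
risk_score = rng.normal(loc=0.0, scale=1.0, size=n)  # a pre-existing difference between children

# Random assignment: each child has an equal chance of entering either group.
assigned_to_treatment = rng.random(n) < 0.5

# With a reasonably large sample, the pre-existing characteristic is balanced
# (on average) across the groups, so it cannot explain a later outcome difference.
print(f"Mean risk score, treatment group:  {risk_score[assigned_to_treatment].mean():.3f}")
print(f"Mean risk score, comparison group: {risk_score[~assigned_to_treatment].mean():.3f}")
```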

While random assignment provides a straightforward method for reducing the risk of between-group biases, it is often difficult to implement well in practice. This is because it can be difficult to recruit a sufficiently large sample within a relatively short time period. It is also not uncommon for treatment and comparison group participants to drop out of the study at different rates. Small sample sizes and high levels of differential attrition can create biases that artificially inflate or deflate the estimated effect, as well as make the effects less representative.
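
The hypothetical simulation below illustrates the attrition problem: even when an intervention truly has no effect, differential drop-out (here, low-scoring treatment-group children leaving before follow-up) can inflate the apparent effect. The numbers are invented purely for illustration.

```python
# Hypothetical illustration of differential attrition: children who are doing poorly
# drop out of the treatment group more often, which inflates the apparent effect.
import numpy as np

rng = np.random.default_rng(1)
n = 200

# Suppose the intervention truly has no effect: both groups share the same outcome distribution.
treatment = rng.normal(50, 10, n)
comparison = rng.normal(50, 10, n)

# Treatment-group children with low scores are more likely to leave before follow-up.
dropout_prob = np.where(treatment < 45, 0.5, 0.05)
stays = rng.random(n) > dropout_prob
treatment_observed = treatment[stays]

print(f"True effect:     {treatment.mean() - comparison.mean():.2f}")
print(f"Observed effect: {treatment_observed.mean() - comparison.mean():.2f}  (inflated by attrition)")
```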

Random assignment can also never fully guarantee that all potential biases between the treatment and control group have been entirely eliminated. First, it generally does not cause the two groups to be exactly identical in terms of characteristics that can be observed. Second, unobserved or unknown biases may occur for seemingly ‘random’ reasons, and features of the study can also introduce bias. For example, it is frequently not possible to conceal from participants their assignment to the intervention or comparison group. Knowledge of group assignment increases the likelihood of unintended placebo effects, which can also substantially confound the interpretation of the findings.

For these reasons, rigorous evaluations include methods for determining participant equivalence between the treatment and comparison groups at various points throughout the study. If the sample is large enough and the data are sufficiently rich, statistical corrections may help to ‘control’ for differences that may occur during the course of the trial. If study attrition is too high, however, or other confounds are introduced (e.g. missing data, biased measurement), the trial will have failed in its ability to verify the extent to which the intervention is associated with an observed effect. It is therefore possible for a poorly conducted RCT to offer less value than a well-conducted quasi-experimental design (QED) that includes methods for addressing issues of bias. Examples of such QEDs include propensity score matching, regression discontinuity designs, and repeated single-case studies. There is a growing consensus that these kinds of QEDs, when conducted to a high standard, provide a reasonable alternative to RCTs for attributing causality in instances when random assignment is not practical or feasible.
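
As a purely illustrative sketch of one such design, the example below applies propensity score matching to synthetic data: the probability of receiving the intervention is modelled from observed covariates, and each treated child is matched to an untreated child with a similar score. The covariates, sample and true effect are all invented, and a real analysis would also require balance diagnostics and sensitivity checks.

```python
# A minimal sketch of propensity score matching on synthetic data.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(7)
n = 2000

# Two observed covariates that influence both programme take-up and the outcome.
family_need = rng.normal(0, 1, n)
parent_education = rng.normal(0, 1, n)

# Without random assignment, higher-need families are more likely to join the programme.
p_join = 1 / (1 + np.exp(-(0.8 * family_need - 0.3 * parent_education)))
treated = rng.random(n) < p_join

# Outcome: driven by the covariates plus a true programme effect of +2 points.
outcome = 50 + 3 * family_need + 2 * parent_education + 2 * treated + rng.normal(0, 5, n)

# Step 1: model the probability of treatment from observed covariates (the propensity score).
X = np.column_stack([family_need, parent_education])
propensity = LogisticRegression().fit(X, treated).predict_proba(X)[:, 1]

# Step 2: match each treated child to the untreated child with the closest propensity score.
treated_idx = np.where(treated)[0]
control_idx = np.where(~treated)[0]
matches = control_idx[
    np.abs(propensity[control_idx][None, :] - propensity[treated_idx][:, None]).argmin(axis=1)
]

naive = outcome[treated].mean() - outcome[~treated].mean()
matched = (outcome[treated_idx] - outcome[matches]).mean()
print(f"Naive difference:   {naive:.2f}")   # biased by who chose to take part
print(f"Matched difference: {matched:.2f}") # typically closer to the true effect of +2
```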

Level 4: Establishing external validity

While positive findings from a single, rigorously conducted RCT provide initial evidence of an intervention’s effectiveness, they give no indication as to whether these findings can be replicated. This is because the first RCT is usually conducted under ideal circumstances that cannot be perfectly replicated in real world settings. These ideal circumstances include delivery by the original developers, high levels of practitioner skill, and systems that make sure that the intervention’s target population has been correctly selected and is receiving a sufficient dose. From this perspective, a single RCT does not fully differentiate between outcomes related to the intervention model and outcomes related to the intervention’s delivery. Thus, further rigorous testing is required to establish an intervention’s external validity, i.e. the extent to which positive outcomes are likely to be replicated in diverse settings with diverse populations.

Even when external validity has been established through positive findings from multiple rigorous studies, an attribution of causality is considered tentative, at best. This is because retrospective evidence is never proof that an intervention will ‘work’ in a future location or time. Intervention effectiveness is also dependent upon the quality of implementation, meaning that robust systems which capture information about an intervention as it is being implemented are also required. At the very least, this information should include data about whether the intervention is being implemented as it should be (e.g. fidelity monitoring), who it is working with, and whether it is achieving its intended short- and long-term outcomes.

Level 2: Understanding an intervention’s potential through formative evidence

While rigorous RCTs and QEDs are necessary for understanding an intervention’s impact, they are expensive in terms of time and money. They are also potentially unethical if there is no indication that the intervention could potentially add value. Therefore, it is wise to first investigate an intervention’s potential for effectiveness through less expensive and less rigorous evaluation designs. These designs include those comparing outcomes between the intervention and a carefully matched comparison group, statistically controlled cross-sectional designs and statistically controlled large-scale longitudinal studies. Good initial evidence can also be obtained from studies that make use of norm-referenced measures to compare intervention outcomes with population-based averages, in the absence of a comparison group.
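
The short example below illustrates the norm-referenced approach mentioned above: invented post-intervention scores are compared against a hypothetical published population norm using a one-sample test, in place of a comparison group.

```python
# Illustrative sketch of a norm-referenced comparison (no comparison group).
# The measure, its population norm of 100 and the sample are all hypothetical.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

population_norm = 100.0                    # published norm for the measure
post_scores = rng.normal(104.0, 15.0, 60)  # scores for 60 children after the programme

t_stat, p_value = stats.ttest_1samp(post_scores, population_norm)
print(f"Sample mean {post_scores.mean():.1f} vs norm {population_norm:.0f} (p = {p_value:.3f})")
```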

Formative evaluations (also referred to as pilot studies) provide a good starting point for determining whether a positive effect of an intervention is even possible or feasible. It should be kept in mind, however, that positive findings obtained through formative evaluations are never sufficient for attributing causality to the intervention. This is why the term formative is applied to Level 2 evaluation designs, since they are best used to inform the development of an intervention, rather than assess its impact. While positive findings from a formative evaluation may suggest that an intervention is promising, they are no guarantee that a programme will continue to demonstrate effectiveness once more rigorous testing occurs. More often than not, promising findings observed in initial pilot studies are not fully replicated in more rigorous RCTs or QEDs.

Level 1: Articulating the intervention’s logic model

Negative findings from any study are not necessarily an indication that an intervention does not work, although they typically suggest that more refinement to the intervention model is required. This refinement may involve greater specification of the intervention’s target population (in terms of the children’s ages and level of need), its dose (i.e. amount or length of the intervention), and the skills and qualifications of the practitioners delivering it. Further refinement is only possible if the programme is informed by a logic model that has identified the ways in which its resources and core activities specifically contribute to the intervention’s intended outcomes in a way that can be systematically manipulated and tested.
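
Purely by way of illustration, such a logic model can be written down explicitly. The sketch below uses invented programme details to show how the target population, dose, activities and intended outcomes become discrete elements that later evaluations can vary and test.

```python
# A purely illustrative way to record a logic model with testable features.
# The programme details below are invented for the example.
logic_model = {
    "target_population": {"child_age_range": (0, 5), "level_of_need": "identified risk"},
    "inputs": ["trained practitioners", "weekly home visits"],
    "dose": {"sessions": 12, "session_length_minutes": 60},
    "activities": ["parenting skills coaching", "couple communication exercises"],
    "short_term_outcomes": ["improved couple relationship quality"],
    "long_term_outcomes": ["improved child social and emotional development"],
}

# Each element can be varied and tested in later evaluations, for example comparing
# 8 versus 12 sessions, or delivery by practitioners with different qualifications.
for feature, value in logic_model.items():
    print(f"{feature}: {value}")
```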