Evaluating public health interventions with non-experimental data analysis
HEDG
CHE Research Conference
May 2006
Abstract
Policymakers and public health researchers have demanded better evidence of the effects of interventions on health and health inequalities. This point was made in the recent Wanless (2004) report, which spoke of the “…almost complete lack of an evidence base on the cost-effectiveness of public health interventions.” This extends to a lack of evidence on effectiveness itself, despite the relative wealth of documented observational evidence describing inequalities in health. Clearly, evidence on the prevalence and causes of health inequalities may not be informative with regard to the best methods of reducing those inequalities.
While evaluation of the outcomes of public health interventions (in their widest sense) is uncommon, and leaves policymakers with little robust, up-to-date information about effectiveness, there are genuine problems with conducting evaluative research in public health. In particular, little evidence can be gained from experimental data, since many of the social determinants of health and health inequalities are not amenable to randomisation, for political, broader ethical or even practical reasons.
Natural experiments and observational studies have been put forth as a heretofore un- or under-exploited area of evaluative research in public health. Natural experiments, for example, offer the means to evaluate both the gains in health and the distributional impact on health brought about by a given intervention. Analysis of data derived from natural experiments, or observational evidence from non-experimental data, is itself problematic, though: as well as problems in identifying the “treatment effect” and constructing the counterfactual, the potential multiplicity of outcomes of interest and the extent of externalities may need to be considered when evaluating the impact of public health interventions.
This paper will outline problems associated with the analysis of non-experimental data in the area of public health intervention research, and offers potential solutions to these problems by drawing upon the ‘toolkit’ of the applied micro-econometrician. Special regard will be given to the necessity of considering reductions in health inequalities, as well as the existence of externalities and the complexity with which they imbue evaluative analysis.
Introduction
There is a wealth of research evidence on the extent of socio-economic determinants of health and inequalities in health, varying in expression across income and wealth, social class and educational attainment, as well as across societies: developed, underdeveloped, market-led, welfarist and even communist (Macintyre 2003). Such information, however useful for monitoring health and health inequalities, or establishing benchmarked targets, does not always inform the best response to either maximise health gains or reduce inequalities in health. The association between smoking and cancer, or heavy drinking and liver cirrhosis, for example, does not in itself point to an effective policy instrument: the plausibility of an intervention is no guarantee of its success. A recent review in England noted the relative dearth of research into interventions related to health inequalities, compared with research that describes those inequalities (HM Treasury, Department of Health, 2002). This is particularly the case in public health research, in which expertise is dedicated more to describing health inequalities and their causes than to evaluating actual or potential interventions aimed at their amelioration. This was echoed to some extent in the Wanless report, which cites an “…almost complete lack of an evidence base on the cost-effectiveness of public health interventions” (Wanless, 2004) and points more widely to the lack of evidence (not solely on cost-effectiveness but also effectiveness) in health care policy and practice. The Public Health Sciences Group, convened by the Wellcome Trust, supported this view by referring to the public health evidence base as ‘weak’ (see Wilkinson 1999, Petticrew et al 2005).
This thin evidence base has rendered policymakers largely uninformed about what constitutes a successful intervention, or how best to implement one, in large areas of the public health agenda. This has generated demand for better evidence of the effects of interventions on health and health inequalities (Macintyre 2003, Mackenbach 2003, Petticrew et al 2005). Undertaking evaluative, rather than descriptive, research however is by no means straightforward.
Obstacles to intervention studies in public health
By design, many public health interventions do not lend themselves to rigorous and robust evaluation. While randomised controlled trials have become the gold standard for evaluative research in clinical studies, in public health research the random allocation of individuals (or groups) to intervention and control groups is often forgone. This may be for a variety of reasons: political, ethical or practical. For example, randomisation of communities in receipt of the ‘Sure Start’ programme was deemed politically unacceptable (Macintyre, 2003). Such concerns can, however, be accommodated. For example, one can imagine a study design in which, rather than establishing control and treatment groups, an entire area receives the intervention in a phased manner, allowing analysis of the variation in outcomes according to the intensity of the intervention over time. Further, to ensure sufficient coverage of communities, allocation to treatment and control groups may be more acceptable across individuals within communities. The challenge for the applied public health researcher is, therefore, to design intervention studies that are amenable to analysis, or to be creative in the approach to analysis where the lack of randomisation, and possibly poor study design, means that standard comparisons across intervention and control groups are simply not applicable.
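A minimal sketch of how such a phased roll-out might be analysed, using simulated data: the effect is identified from variation in the duration of exposure across areas, rather than from a never-treated control group. The linear dose-response and the effect size are illustrative assumptions, and in practice secular time trends would also need to be controlled for.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 3000

# Phased roll-out: every area eventually receives the intervention, but the
# start date varies, so exposure at the survey date ranges from 0 to 24 months.
months_exposed = rng.integers(0, 25, n).astype(float)

effect_per_month = 0.1  # assumed true dose-response effect (illustrative)
health = 50 + effect_per_month * months_exposed + rng.normal(0, 2, n)

# With no never-treated control group, the effect is identified from the
# variation in exposure intensity alone (OLS of health on months exposed).
X = np.column_stack([np.ones(n), months_exposed])
slope = np.linalg.lstsq(X, health, rcond=None)[0][1]
print(f"estimated effect per month of exposure: {slope:.3f}")  # close to 0.1
```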
Reducing inequalities in health is often the desired outcome of public health initiatives and, accordingly, study design should allow differential effects on, or trends among, different socio-economic groups to be analysed. Information on aggregate health gains alone may not be of greatest concern, yet many studies currently do not report results stratified by relevant socio-economic groups. This may be due to lack of statistical power or a lack of interest in such analyses. Public health interventions should be designed to facilitate systematic evaluation of the effectiveness (and cost-effectiveness) of the intervention overall and with specific regard to groups of the population at greater risk of poor health outcomes.
Randomised controlled trials typically represent the best available evidence; in public health research, however, they often represent the best unavailable evidence, being practically and/or politically infeasible to implement. Instead, natural experiments – in which the intervention of interest is itself the experiment – may represent the best available evidence for public health research.
Natural experiments and non-experimental data
The gaps in the evaluative evidence base for public health may be filled, at least partially, by exploiting the opportunities offered by so-called ‘natural’ experiments. Petticrew et al (2005) describe a natural experiment as one that “…usually takes the form of an observational study in which the researcher cannot control or withhold the allocation of an intervention to particular areas and communities, but where natural or predetermined variation in allocation occurs.” Hence a natural experiment takes the policy reform itself as the experiment and attempts to construct the counterfactual from a naturally occurring comparison group.
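To illustrate the idea of constructing the counterfactual from a naturally occurring comparison group, the following sketch applies a simple difference-in-differences calculation to simulated data. All numbers are invented for illustration, and the estimator rests on the (strong) assumption that both groups would have followed parallel trends in the absence of the intervention.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Simulated panel: half the individuals live in the 'intervention' area,
# which starts from a worse baseline (no randomisation).
treated_area = rng.integers(0, 2, n)  # 1 = intervention area
baseline_health = rng.normal(50, 5, n) - 2 * treated_area

true_effect = 3.0   # assumed treatment effect (illustrative)
common_trend = 1.5  # secular trend shared by both areas

# Follow-up outcome: a common trend for everyone, plus the treatment
# effect only in the intervention area.
followup_health = (baseline_health + common_trend
                   + true_effect * treated_area
                   + rng.normal(0, 1, n))

# A naive post-intervention comparison is biased by the baseline gap.
naive = (followup_health[treated_area == 1].mean()
         - followup_health[treated_area == 0].mean())

# Difference-in-differences nets out both the baseline gap and the
# common trend, recovering the treatment effect.
change = followup_health - baseline_health
did = change[treated_area == 1].mean() - change[treated_area == 0].mean()

print(f"naive estimate: {naive:.2f}")  # contaminated by the baseline gap
print(f"DiD estimate:   {did:.2f}")    # close to the true effect of 3.0
```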
The term can apply to interventions intended to reduce health inequalities, as well as to other interventions in which changes in health are not the intended outcome, but nevertheless constitute spillover effects. Natural experiments encompass community, area and regional based interventions together with individual-level based interventions.
While natural experiments form an important component of evaluative evidence in other areas of social science, such as labour market programmes (for a discussion, see, for example, Blundell and Costa-Dias 2000), they are less common in public health research. However, given the paucity of experimental evidence, natural experiments offer a fruitful alternative route to reliable evidence on both the health and inequality impacts of interventions.
However, while non-experimental data gleaned from natural experiments promise valuable empirical evidence, they pose difficult questions for the applied researcher wishing to analyse them. Some of the important obstacles are outlined below.
Problems with natural experiments
Natural experiments in public health may not lend themselves to straightforward evaluation. A single intervention on a well-defined population, in which allocation to intervention and control groups has been well constructed and the outcomes of interest are easily defined and measured, is unlikely to be the norm. More complex interventions are likely, containing elements of the problems set out below. These complicate identification of the treatment effect and call for more sophisticated analytical skills.
Non-comparability of intervention and control group
Due to a lack of randomisation, control and intervention groups in natural experiments tend to differ at baseline in ways that are often related to the outcome of interest. Moreover, these differences may be observable or unobservable and may relate to individuals, communities or areas over which the intervention is applied. Health status, educational attainment or levels of wealth or deprivation, for example, are related to health and are measurable. Social capital however is less well defined and hence may remain unobserved, but may also be related to health outcomes. Unless baseline differences such as these can be accounted for adequately, evaluative studies will produce biased estimates of the effects of an intervention.
This is of particular concern for public health interventions, where concerns over inequalities in health suggest targeting initiatives at specific social classes or geographic areas. This may cause problems where individuals or communities self-select into (or are otherwise endogenously selected into) an intervention group.
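The following sketch, using simulated data, illustrates how targeting deprived groups biases a naive comparison, and how regression adjustment for an observed baseline difference can correct it. The effect sizes are invented, and the correction works only because the confounder (deprivation) is observed; an unobserved confounder such as social capital would leave even the adjusted estimate biased.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000

# Observable confounder: deprivation. More deprived individuals are more
# likely to be targeted by the intervention AND have worse outcomes.
deprivation = rng.normal(0, 1, n)
p_treat = 1 / (1 + np.exp(-1.5 * deprivation))  # targeting of deprived groups
treated = rng.random(n) < p_treat

true_effect = 2.0  # assumed treatment effect (illustrative)
outcome = 60 - 4 * deprivation + true_effect * treated + rng.normal(0, 2, n)

# The unadjusted comparison confounds the intervention with deprivation:
# the intervention can even appear harmful.
unadjusted = outcome[treated].mean() - outcome[~treated].mean()

# OLS adjusting for the observed baseline difference recovers the effect,
# assuming no unobserved confounders remain.
X = np.column_stack([np.ones(n), treated.astype(float), deprivation])
beta = np.linalg.lstsq(X, outcome, rcond=None)[0]
adjusted = beta[1]

print(f"unadjusted: {unadjusted:.2f}")  # negative: looks harmful
print(f"adjusted:   {adjusted:.2f}")    # close to the true effect of 2.0
```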
Multiple interventions
An initiative may involve more than a single intervention. Consider for example area regeneration, which may include improvements to housing, transport infrastructure, the provision of childcare, education and employment opportunities, increasing green spaces, etc. Any or all of these are likely to have an impact on health, directly or indirectly, and disentangling the effect of a single one is not straightforward, particularly when several run concurrently. In the UK, interventions such as the New Deal for Communities, Healthy Living Centres, Social Inclusion Partnerships, Health Action Zones, Education Action Zones and many more housing and area regeneration projects have overlapped geographically and temporally (Petticrew et al 2005). Forming effective control groups and identifying intervention-specific effects in such circumstances is fraught with difficulties.
Identifying affected individuals
Unlike clinical trials, in which individuals are specifically allocated to one or other arm of a trial, natural experiments and public health interventions often involve more diffuse groups of individuals, making definition of the relevant affected population difficult. Bans on smoking in public places, for example, intervene directly on smokers, but have effects on workers and visitors within these spaces; they may even affect smokers’ own health if their consumption declines. The health benefits of such bans cannot easily be evaluated unless affected individuals can be identified and their health status recorded.
Spillovers and externalities
Spillover effects or the externalities of an intervention may constitute an important outcome for a public health evaluation. Spillovers are benefits (but could include negative consequences) to individuals not directly targeted by an intervention. For example, an immunization programme may reduce the incidence of disease in an intervention group, but may also lower the incidence of disease among individuals in contact with the intervention group but not themselves directly inoculated.
Spillovers or externalities may extend to benefits, or costs, that accrue outwith the health benefits that a particular intervention aims to achieve. For example, improving public health (through whatever means) may benefit the wider economy or socially desirable outcomes such as education. Avoiding days of work lost through sickness, or early retirement through ill-health, may have positive productivity effects, while avoiding absence from schooling increases educational achievement.
Conversely, initiatives outside the health sector may have important and perhaps unintended spillovers in terms of health benefits. The provision of roads or improvements in housing, for example, will have knock-on effects (good and bad) for individuals’ health.
The evaluation of public health interventions must be conducted in a manner that captures such spillovers or externalities.
Multiple outcomes
Linked to external effects, an intervention may have been designed with more than a single outcome in mind. In such circumstances, how should multiple outcomes be considered? Should they be evaluated jointly? Should equal weight be afforded to each outcome of interest? If externalities consist of non-health benefits should these carry equal weight to the direct health benefits from an intervention?
An example of both spillovers and multiple outcomes, and their consideration in evaluative studies, is the so-called ‘worms’ paper by Miguel and Kremer (2005). They consider a de-worming programme undertaken in Kenya, taking advantage of the fact that only some students in selected areas were de-wormed, in an initiative with phased implementation, which allowed the authors to analyse the variation in outcomes over the programme’s existence. Their finding that school-based health interventions affect school attendance, and thence educational attainment, social outcomes and development overall, is considered an important development in the analysis of non-experimental data. The authors describe how previous experimental studies showed weak evidence of a positive causal link between children’s de-worming treatment and physical growth and educational outcomes, leading an influential review published in the British Medical Journal (Dickson et al, 2000) to recommend that countries should not invest in mass de-worming programmes. The conjunction of randomisation at school level (as opposed to individual level) with the application of rigorous non-experimental estimation approaches provided Miguel and Kremer with results that challenge those published in the BMJ.
Although randomisation across schools made it possible to identify both the overall programme effect and cross-school externalities experimentally, it was necessary to rely on non-experimental methods to decompose the effect on treated schools into a direct effect and a within-school externality effect. It became clear that previous studies had understated the differences in health and educational outcomes because of local treatment externalities: improved health and school participation, for example, were identified among both treated and untreated children, masking improvements when treatment and control groups were compared. According to the authors’ estimates, the benefits of the programme for the overall economy more than justify fully subsidised treatment, contradicting policy recommendations based on experimental assessments of the programme’s impact and cost-effectiveness.
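This decomposition can be sketched with simulated data: pupils’ outcomes depend on their own treatment (the direct effect) and on whether their school was treated at all (the within-school externality). The effect sizes and the 70% coverage rate below are illustrative assumptions, not figures from Miguel and Kremer.

```python
import numpy as np

rng = np.random.default_rng(2)
n_schools, pupils = 100, 50
n = n_schools * pupils

# Randomisation at school level; within treated schools only ~70% of
# pupils actually receive the treatment.
school_treated = np.repeat(rng.integers(0, 2, n_schools), pupils)
own_treated = school_treated * (rng.random(n) < 0.7)

direct, spillover = 2.0, 1.0  # assumed true effects (illustrative)
attendance = (70 + direct * own_treated + spillover * school_treated
              + rng.normal(0, 3, n))

# Comparing treated vs untreated pupils *within* treated schools misses
# the externality: the untreated comparison pupils also benefit.
in_t = school_treated == 1
within = (attendance[in_t & (own_treated == 1)].mean()
          - attendance[in_t & (own_treated == 0)].mean())

# Regressing on both indicators separates the direct effect from the
# within-school externality, using the partial-coverage variation.
X = np.column_stack([np.ones(n), own_treated.astype(float),
                     school_treated.astype(float)])
b = np.linalg.lstsq(X, attendance, rcond=None)[0]
print(f"within-school contrast:  {within:.2f}")  # direct effect only
print(f"direct effect:           {b[1]:.2f}")
print(f"within-school spillover: {b[2]:.2f}")
```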
Time horizon
The health benefits of a public health intervention may take a number of years to become apparent and observable. This may inhibit the ability of natural experiments to identify a positive treatment effect even where one exists. A common solution is to measure proxies for health improvements. An obvious example is the number of people who quit smoking, or the decrease in consumption, following a smoking intervention: a very reasonable proxy for the health benefits of reduced smoking. Spillover effects may suffer from the same issue. The ‘worms’ example cited above (Miguel and Kremer, 2005) proxies educational attainment, and subsequent life-cycle earnings and social capital effects, by measuring reductions in days absent from school. Other outcomes, however, may not be so easily proxied in the short term, and a longer time horizon for data collection may be required (or perhaps an extrapolation of short-term outcomes to a longer horizon). Alternatively, greater refinement of methods for measuring health benefits may be required to detect improvements within a reasonable time horizon following an intervention.
Low response rates
In order to affect health inequalities, many public health interventions necessarily target more deprived groups. However, obtaining reliable response rates in such areas is often difficult, and respondents are more susceptible to attrition during longitudinal studies (Parry et al 2001). It has been postulated that this is due to general social disengagement and mistrust of scientific research or government-led programmes. Public health interventions may further suffer from difficulty in recruiting successfully to control groups, where, unlike in randomised controlled trials, individuals know they are not receiving the intervention.
Low response rates and/or attrition can lead to bias in evaluative research due to the introduction of unobserved differences in the characteristics of the individuals or groups involved, especially if those differences are associated with the outcomes of interest.
Further bias may be introduced should new entrants to a study fail to be representative of the original sample. For example, gentrification following area regeneration may draw people into an area who are unlikely to be representative of pre-regeneration residents.
Generalisability of research/results
As well as potentially poor internal validity, large social programmes may suffer poor external validity: it will not always be plausible to assume that the observed effect of an intervention in one area will replicate itself over the same intervention in other areas. The benefits may well prove specific to the context and/or location. Multi-site evaluations may be required, in order to capture a generalisable measure of the effectiveness of the intervention of interest.
Advantages of natural experiments
Many of the major social determinants of health and health inequalities are not particularly amenable to randomisation or control. Thus, whatever the issues involved in the analysis of natural experiments and non-experimental data, the principal advantage of the natural experiment is that it exists. Natural experiments are, as mentioned, the best available evidence for public health research, and they represent the best opportunity to develop evidence on the effectiveness of interventions, at least in the short term.