Copyright 1995. Russell Sage Working papers have not been reviewed by the Foundation. Copies of working papers are available from the author, and may not be reproduced without permission from the author.
Preferred Citation: Robinson G. Hollister and Jennifer Hill, "Problems in the Evaluation of Community Wide Initiatives," (New York: Russell Sage Foundation) 1995 [
Problems in the Evaluation of Community-Wide Initiatives
Robinson G. Hollister and Jennifer Hill
RUSSELL SAGE FOUNDATION, Working Paper # 70
A Paper prepared for the Roundtable on Comprehensive Community Initiatives
April, 1995
Introduction
In this paper we outline the types of problems which can arise when an attempt is made to evaluate the effects of community-wide programs. We partially review experience with different methods where available. In general we find the problems are substantial, so in a concluding section we provide some suggestions for steps which might be taken to improve methods of evaluation which could be used in these situations[1].
We face several definitional problems at the outset. What do we mean by community-wide initiatives? What type of effects are we particularly interested in measuring? Finally, what are the major objectives of the evaluation?
To help with definitions, we turn to papers produced by members of the Roundtable committee.
Community-wide Initiatives
P/PV has defined community as: "...the intersection of place and associational network. Community encompasses both where youth spend their time and whom they spend it with"[2].
Brown and Richman describe "urban change initiatives" as sharing "to some extent the following guiding principles or development assumptions:
1. Community Building: neighborhood development should 'rebuild the social fabric' of the neighborhood.
2. Community Empowerment: neighborhood development should involve residents and other neighborhood stakeholders in the process of identifying and prioritizing problems and designing solutions to these problems.
3. Comprehensive Community Change: neighborhood development should adopt a holistic and integrative perspective on neighborhood dynamics and change." [3]
Under the title of community-wide initiatives, in this paper we intend to focus on programmatic interventions which treat groups of individuals as a whole. That is, all the individuals in a given geographic area or in a given class of people are eligible for the program's intervention or are potential targets of that intervention. It is the inclusiveness of eligibility that is central to the concept of community-wide initiatives. We emphasize this feature at the outset because we will be distinguishing sharply those situations in which it would be possible to use random assignment methods for individuals in the evaluation to create control groups from those in which the character of the eligibility of the intervention precludes use of such methods.
Some brief examples may help. In the late 1970's and early 1980's the federal government funded a program called Youth Incentive Entitlement Pilot Project (YIEPP). This program selected a number of school catchment areas in several states. All the low income persons between the ages of 16 and 19 within that area were eligible to participate in the program. The program provided them with work opportunities during the summer - guaranteed jobs essentially - and part time work during the school year. If they took the job in the summer, they were to continue in school during the school year. A major objective was to encourage school continuation by making employment possible for the low income population. The key feature is the inclusiveness that militates against random assignment; since all of the low income youth in a given school catchment area were eligible, random assignment was not possible.
A different type of example is that of community development corporations (CDCs), which confine their efforts for community change to geographically designated areas; at least in theory, all the residents of those areas are potentially eligible for services provided through the corporations' efforts.
Effects to be Measured
Turning to the definition of effects which the evaluations assess, we focus for the most part on longer term outcomes which are said to be the concern of the community-wide initiative. We want to separate the longer term outcomes from the more immediate short term changes that are often covered under what is called a process analysis. Thus in the YIEPP example the long term outcomes of interest were school continuation rates of the youth and their employment and earnings. The participation of the youth in the program, while it was of some interest, was not itself considered a long term outcome of central interest. Rather, it was a process effect.
In the case of community development corporations, long term outcomes might be an improvement in the quality of the housing stock in the designated area or an increase in the number of jobs in the designated area held by people residing in that area, while a process outcome might be participation in community boards which make decisions about how to allocate the program resources.
It should be recognized, of course, that what is considered a "process variable" for some purposes may be considered an outcome variable for others; e.g., participation of community members in decision-making could be regarded as part of a process leading to a program outcome of improved youth school performance in one situation but could be an "empowerment" outcome valued in its own right in another situation. A clear delineation of the theory of the intervention process would specify which are "process" and which are "outcome" effects.
The Counterfactual
The basic question an evaluation seeks to address is whether the activities consciously undertaken which constitute the community-wide initiative generated a change in the outcomes of interest. To address this central issue, the problem in this case, as in virtually all evaluation cases, is to establish what would have happened in the absence of the program initiative. This is often referred to as the counterfactual. Indeed most of our discussion will turn around a review of alternative methods that have been tried in order to establish a counterfactual for a given type of program intervention.
To those who have not steeped themselves in this type of evaluation, it often appears that this is a trivial problem. Simple solutions are proposed. For example, let's look at what the situation was before the initiative and what the situation is after the initiative in the given community. The counterfactual is the situation before the initiative. Or let's look at this community and find another community that initially was very much like it and then see how after the program initiative the two communities compare on the outcome measures. That will tell us the effects of the program. The comparison community will provide the counterfactual - what would have happened in the absence of the program.
As we shall see however, and as most of us know, these simple solutions are not adequate to the problem - primarily because individuals and communities are changing all the time with respect to the measured outcome even in the absence of any intentional intervention. Therefore, measures of the situation before the initiative or with comparison communities are not secure counterfactuals; they may not represent well what the community would have looked like in the absence of the program.
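The arithmetic behind this point can be sketched in a few lines. This is only an illustration: every number below is hypothetical and is not drawn from YIEPP or any other study. It shows how a before/after comparison absorbs change that would have occurred anyway, and how a comparison community whose own trend diverges after matching also yields a distorted estimate.

```python
# All values are hypothetical, chosen only to illustrate the argument.
baseline = 60.0          # outcome before the initiative (e.g. % of youth staying in school)
secular_trend = 5.0      # change that would have happened with no program at all
true_effect = 3.0        # change actually caused by the program

# What is observed in the program community after the initiative.
after_with_program = baseline + secular_trend + true_effect

# The true counterfactual is never observed directly.
true_counterfactual = baseline + secular_trend

# "Simple solution" 1: use the pre-program situation as the counterfactual.
before_after_estimate = after_with_program - baseline
before_after_bias = before_after_estimate - true_effect

# "Simple solution" 2: use a matched comparison community whose conditions
# later evolve differently (here its trend is 1.0 rather than 5.0).
comparison_trend = 1.0
comparison_after = baseline + comparison_trend
comparison_estimate = after_with_program - comparison_after
comparison_bias = comparison_estimate - true_effect

print("true effect:", true_effect)
print("before/after estimate:", before_after_estimate, "bias:", before_after_bias)
print("comparison-community estimate:", comparison_estimate, "bias:", comparison_bias)
```

In this sketch the before/after estimate is off by exactly the secular trend, and the comparison-community estimate is off by the gap between the two communities' trends.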
Let's return again to some concrete examples. YIEPP pursued a strategy of pairing communities in order to develop the counterfactual. For example, the Baltimore school district was paired with Cleveland. The Cincinnati school district was paired with a school district in Louisville, etc. In making the pairs the researchers sought to choose communities that had labor market conditions similar to those of the treatment community.
A similar procedure, with a great deal more detailed analysis, was adopted as part of an on-going study of school dropout programs currently being conducted by Mathematica Policy Research. The school districts with the dropout program were matched in statistical detail with school districts in the near neighborhood, that is within the same city or SMSA (Standard Metropolitan Statistical Area).
In both of these examples, even though the initial match seemed to be quite good, circumstances evolved in ways that made the comparison areas doubtful counterfactuals. In the case of YIEPP, for example, Cleveland had unexpectedly favorable improvement in its labor market compared to Baltimore. Louisville had disruption of its school system because of court-ordered school desegregation and busing. This led the investigators to discount some of the results from using these comparison cities. In the case of the school dropout study, though the districts matched well in terms of detailed school and population demographics at the initial point, a couple of years later, when surveys had been done of the students and teachers in the respective school districts, it was found that in terms of the actual processes of the schools the match was often very bad indeed. The schools simply were operating quite differently in the pre-program period and had different effects on students and teachers.
Random Assignment as the Standard for Judgement
For quantitative evaluators random assignment designs are a bit like the nectar of the Gods: once you've had a taste of the pure stuff it is hard to settle for the flawed alternatives. In what follows, we often use as the standard the random assignment design, in which individuals or units which are potential candidates for the intervention are randomly assigned to be in the treatment group, which is subject to the intervention, or in the control group, which is not subject to any special intervention. (Of course, random assignment does not have to be to a null treatment for the controls; there can be random assignment to different levels of treatment or to alternative modes of treatment.) The key benefit of a random assignment design is that, as soon as the number of subjects gets reasonably large, there is a very low probability that any given characteristic of the subjects will be more concentrated in the treatment group than in the control group. Most important, this holds for unmeasured characteristics as well as measured characteristics. Thus when we compare average outcomes for treatments and controls we can have a high degree of confidence that the difference is related to the treatment and not to some characteristic of the subjects. The control group provides a secure counterfactual because, aside from the treatment, the control group members are subject to the same forces which might affect the outcome as are those in the treatment group: they grow older just as treatment group members do, they face the same changes in the risks of unemployment or increase in returns to their skills, they are subject to the same broad social forces that influence marriage and family practices.
We realize that this standard is very difficult, often impossible, for evaluations of community-wide initiatives to meet. But we use it in order to obtain reliable indications of the type and magnitude of errors which can occur when this best design is not feasible[4]. Unfortunately, there appear to be no clear guidelines for selecting second-best approaches but a recognition of the character of the problems may help set us on a path to developing such guidelines.
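The claim that random assignment balances even unmeasured characteristics once the number of subjects is reasonably large can be checked with a small simulation. The sketch below is an illustration under stated assumptions, not a procedure from any study discussed here: the "unmeasured trait" is a hypothetical standard normal variable, and the function name, sample sizes, and seed are arbitrary choices.

```python
import random
from statistics import fmean

def mean_abs_imbalance(n_subjects, n_replications=200, seed=0):
    """Average absolute treatment/control gap in a trait the evaluator never sees."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    gaps = []
    for _ in range(n_replications):
        # Hypothetical unmeasured characteristic, one value per subject.
        trait = [rng.gauss(0.0, 1.0) for _ in range(n_subjects)]
        # Random assignment: shuffle the subjects and split them in half.
        order = list(range(n_subjects))
        rng.shuffle(order)
        half = n_subjects // 2
        treatment = [trait[i] for i in order[:half]]
        control = [trait[i] for i in order[half:]]
        gaps.append(abs(fmean(treatment) - fmean(control)))
    return fmean(gaps)

small_sample_gap = mean_abs_imbalance(20)
large_sample_gap = mean_abs_imbalance(2000)
print("average imbalance with 20 subjects:", round(small_sample_gap, 3))
print("average imbalance with 2000 subjects:", round(large_sample_gap, 3))
```

The average gap between groups shrinks as the number of randomly assigned subjects grows, even though the trait is never used in the assignment.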
The Nature of the Unit of Analysis
For most of the programs that have been rigorously analyzed by quantitative methods to date, the principal subject of program intervention has been the individual. When we turn to community-wide initiatives, however, the target of the program and the unit of analysis usually shift away from just individuals to one of several possible alternatives. The first, with which we already have some experience, is where the target of the program is still individuals but it is individuals within geographically bounded areas. While the individuals are still the targets of the intervention, the fact that they are defined as being within a geographically bounded unit is intentional, because it is expected that interactions among individuals or changes in the general context will generate different responses to the program intervention than would treatment of isolated individuals.
Another possible unit of analysis is families. We have had some experience with programs in which families are the targets for intervention and where the proper unit of analysis remains the families rather than sets of individuals analyzed independently of their family unit. This would, of course, be the case with, for example, family support programs. These become community-wide initiatives when the set of families to be considered is defined as within geographically bounded areas and eligibility for the program intervention somehow relates to those geographical boundaries. Many of the recent community-wide interventions seem to have this type of focus, a focus on families within geographically bounded areas.
Another possibility for community initiatives is where the target and unit of analysis are institutions rather than individuals. Thus within a geographically bounded area an attempt might be made to have a program which targets particular sets of institutions (the schools, the police, the voluntary agencies, the health providers) and seeks to generate changes in the behavior of those institutions per se. Then the institution becomes the relevant unit of analysis.
The reason for stressing the importance of being clear about the unit of analysis is that it can make considerable differences in the basic requirements for the statistical analysis used in the evaluation. Quantitative analyses focus on the frequency distribution of the outcome and we use our statistical theory in order to make probabilistic statements about the particular outcomes that we observe. The theory is based on the idea that a particular process has generated outcomes that have a random element in them. The process does not generate the same result every time but rather a frequency distribution of outcome values. When we are using these statistical methods to evaluate the impact of programs we are asking whether the frequency distribution of the outcome has shifted because of the effect of the program. Thus a statistically significant difference in an outcome associated with a program is a statement that the outcome we observe from the units subject to the program intervention has a very low probability of coming from a distribution which is the same as the distribution of that outcome for the counterfactual group. So if the community, in some sense, is the unit of analysis and we're looking at, for example, the incidence of low birth weight children in the community, then we need to have information about the frequency distribution across communities of the percentage of low birth weight babies.
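One way to make this kind of probability statement concrete, with the community as the unit of analysis, is a permutation test across communities. This is a technique substituted here purely for illustration, not a method proposed in this paper, and the low birth weight percentages below are hypothetical. The sketch asks how often a gap at least as large as the observed one would arise if the program and comparison communities really came from the same distribution.

```python
from itertools import combinations
from statistics import fmean

# Hypothetical percentage of low birth weight babies, one value per community.
program_communities = [6.1, 7.4, 5.8, 6.9]
comparison_communities = [7.9, 6.5, 8.2, 7.0]

def permutation_p_value(treated, untreated, tolerance=1e-9):
    """Share of all relabelings whose mean gap is at least the observed gap."""
    pooled = treated + untreated
    n_treated = len(treated)
    n_untreated = len(untreated)
    observed_gap = abs(fmean(treated) - fmean(untreated))
    pooled_total = sum(pooled)
    n_splits = 0
    n_as_extreme = 0
    # Enumerate every way of labeling n_treated communities as "program".
    for chosen in combinations(range(len(pooled)), n_treated):
        chosen_total = sum(pooled[i] for i in chosen)
        gap = abs(chosen_total / n_treated
                  - (pooled_total - chosen_total) / n_untreated)
        n_splits += 1
        if gap >= observed_gap - tolerance:
            n_as_extreme += 1
    return n_as_extreme / n_splits, n_splits

p_value, n_splits = permutation_p_value(program_communities, comparison_communities)
print("relabelings examined:", n_splits)
print("p-value:", round(p_value, 3))
```

With only four communities in each group there are just 70 possible relabelings, so the achievable p-values are coarse no matter how many births occur within each community.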
The unit of analysis becomes critical because the ability to make these probability statements about effects using statistical theory depends on the size of the samples. So if the community is the unit of analysis then the sample size will be the number of communities in our samples. If the court systems are the unit of analysis and we're asking about changes in incarceration rates generated by court systems and we're changing courts in one community in some way and not in the other, then we want to know about the frequency distribution across different court systems of incarceration rates and the size of the sample would be the number of such systems that are observed.
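A short sketch may help show why the number of communities, rather than the number of residents, governs precision when the community is the unit of analysis. The rates below are hypothetical, and the function name and the device of repeating the same rates to mimic a larger sample of communities are assumptions made only for illustration.

```python
import math
from statistics import fmean, variance

def difference_and_se(program_rates, comparison_rates):
    """Difference in mean community-level rates and its standard error.

    Each list holds ONE value per community, so the sample sizes used
    below are counts of communities, not counts of individuals.
    """
    diff = fmean(program_rates) - fmean(comparison_rates)
    se = math.sqrt(variance(program_rates) / len(program_rates)
                   + variance(comparison_rates) / len(comparison_rates))
    return diff, se

# Hypothetical low birth weight percentages, one per community.
program = [6.1, 7.4, 5.8, 6.9]
comparison = [7.9, 6.5, 8.2, 7.0]

diff, se = difference_and_se(program, comparison)

# Purely illustrative: the same spread of rates, observed in twice as many communities.
diff_more, se_more = difference_and_se(program * 2, comparison * 2)

print("communities per group:", len(program), "difference:", round(diff, 2), "SE:", round(se, 3))
print("communities per group:", len(program) * 2, "difference:", round(diff_more, 2), "SE:", round(se_more, 3))
```

The estimated difference is unchanged, but the standard error falls only when more communities are observed; adding more individuals within the same four communities would not change the sample size used here.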
The Problem of Boundaries
When we're talking about community-wide initiatives we're often talking about cases where geographical boundaries define the unit or units of analysis. Of course the term community need not imply specific geographic boundaries. Rather it might have to do with, for example, social networks. What constitutes the community may vary depending upon what type of program process or what type of outcome we are talking about. The community for commercial transactions may be quite different from the community for social transactions. The boundaries of impact for one set of institutions, let us say the police, may be quite different from the boundaries for impacts of another set of institutions, let us say schools or healthcare networks.
We will not attempt here a full discussion of how boundaries of communities or neighborhoods might be defined[5]. We quote some insights which illustrate the complexity of the issue of community or neighborhood boundaries: "...differentiated subareas of the city are recognized and recognizable...neighborhoods are perhaps best seen as open systems, connected with and subject to the influence of other systems... individuals are members of several of these systems at once...delineation of boundaries is a product of individual cognition, collective perceptions, and organized attempts to codify boundaries to serve political or instrumental aims... local community may be seen as a set of (imperfectly) nested neighborhoods...recognition of a neighborhood identity and the presence of a 'sense of community' seems to have clear value for (1) supporting residents' acknowledgement of collective circumstances and (2) providing a basis and motivation for collective action...neighborhoods are experienced differently by different populations [and]...are used differently by different populations"[6].