Towards a European perspective on the use and usability of evaluation outputs

Murray Saunders

“A review of evaluations of these programmes in different Member States reveals their potential importance. At a basic level, the information generated by evaluations of these programmes can help the Commission and Member States monitor their impact. This, in turn, can inform debates at EU and national levels on the merits and future of Cohesion policy. Used effectively, evaluation offers a range of specific benefits: stronger public sector planning; more efficient deployment of resources; improved management and implementation of programmes or other policy interventions; stronger ownership and partnership amongst actors with a stake in programmes; greater understanding of the factors determining the success of programmes; and broader scope to assess the value and costs of interventions.” (Ferry and Olejniczak, 2008, 6)

Introduction

This quote from a recent report by Ernst & Young on the way evaluations were used in the context of cohesion policy programmes in Europe (taking Poland as a country case study) embodies the high expectations placed on evaluations. At issue is their capacity to yield resources for managing developments in policy and practice. In short, evaluations ought to offer much by way of responsive policy development, but this resource is often underused or its potential remains opaque. The Ernst & Young report (Ferry and Olejniczak, 2008) is a serious attempt to grapple with the issue of evaluation use in the European context and heralds a move to engage with an issue which has a substantial American literature but has hitherto been undeveloped in Europe.

This paper contributes to the development of this thinking by offering a framework that might be useful in building a systematic consideration of use into the design of evaluations in the European policy context, an area which has to some extent been a ‘Cinderella’ of evaluation design. While it seems axiomatic that evaluations should be used, a planned and rehearsed approach to how this may happen is rarely undertaken as part of a design brief or proposal. Usually the discussion does not go beyond a description of the way in which the output from an evaluation might be disseminated.

This paper introduces the distinction between the use of evaluations on the one hand (the way the organisational and systemic context of an evaluation has the capacity to use evaluations effectively) and the usability of evaluations on the other (the extent to which the design of an evaluation maximises or facilitates its potential use).

Of course we have the seminal contribution to this debate offered by successive editions of ‘Utilization-Focused Evaluation’ by M. Q. Patton (1997), in which he asserts that the potential for use is largely determined long before ‘a study is completed’: ‘utilization-focused evaluation emphasises that what happens from the very beginning of a study will determine its eventual impact long before a final report is produced’ (Patton 1997, 20).

For Patton, an evaluation which is focused on use ‘concerns how real people in the real world apply evaluation findings and experience the evaluation process . . . it is about intended use by intended users’ (Patton 1997, 20). He acknowledges that in an evaluative environment there are likely to be an array of evaluation users as well as uses of programme outputs, but he argues for a particular emphasis on primary intended users and their commitment to specific, concrete uses. Further, he argues for the highly situated nature of these kinds of decisions, which, broadly, is the stance taken in this paper. While Patton offers us an emphatic approach to the problem of use, other commentators have stressed the complexities of grappling with what we mean by use. For example, Fleischer and Christie (2009), in a useful overview of the predominantly American literature on use, demonstrate its different emphases and preoccupations.

The literature on evaluation use identifies many potentially relevant factors, but is equivocal about which are most related to increasing evaluation use. Commentators have asserted, however, that the phenomenon of ‘use’ is multifaceted (Ledermann 2011). Also relevant is the assertion from Hofstetter and Alkin (2003), with reference to the US environment, that

“for as long as modern-day evaluations have existed, so too have questions about evaluation use. As with social science research and knowledge, there remain ongoing concerns about the level of importance of conducting evaluation and producing information that leads to action, be it in creating policy, decision making, or changing how someone may regard a particular issue. We assume that our efforts are rewarded with some importance or action, although measuring the effects evaluation has and will continue to prove problematic.” (Hofstetter and Alkin 2003, 219)

Theoretical considerations and approach

As noted above, Fleischer and Christie (2009) show that several important reviews of the uses of programme evaluations have been undertaken in the US evaluation literature, so I will not repeat this work but instead contextualise their broad conclusions within the European context. Building on their overview, it is possible to identify the following typology of use:

Figure 1 Typologies of use

1. Instrumental: when decision makers use the evaluation findings to modify the object of the evaluation in some way
2. Conceptual: when the evaluation findings help program or policy makers understand the program in a new way
3. Enlightenment: when the evaluation findings add knowledge to the field and thus may be used by anyone, not just those involved with the program or evaluation of the program
4. Process use: cognitive, behavioural, program and organizational changes resulting from engagement in the evaluation process and learning to think evaluatively.
5. Persuasive or symbolic (justificatory): when the evaluation is used to persuade important stakeholders that the program or organization values accountability, or when an evaluator is hired to evaluate a program in order to legitimize a decision that has already been made prior to the commissioning of the evaluation.

The ‘use domains’ identified by Fleischer and Christie are helpful. From my perspective, as I will suggest in more detail below, these domains can be understood as clusters of ‘use practices’. I find the following observation made by Ledermann (2011) in her recently completed research persuasive, and agree that a different approach is needed:

The study reported in this article follows the calls in recent years that it is time to abandon the ambition of finding ‘the important’ characteristic for use and to adopt a focus on context-bound mechanisms of use instead (Ledermann, 2011, 5)

The implication of adopting this approach is that it is important to discern the domain and the clusters of practices which are emphasised in, or inhabit, that domain. In the evaluation of cohesion policy investment in the European context, for example, we can see that evaluative activity inhabits at least four ‘domains’ across policy areas. I address these domains in detail below (see Figure 2).

More generally, what does ‘practice’ (or more accurately ‘social practice’) mean in the context of use and usability? The application of social practice theory to evaluation use leads us to take a perspective which emphasises the way evaluation use is embedded in specific national and regional contexts. Without this appreciation of the contextual factors which affect use, use strategies developed as part of the design of an evaluation are likely to be less effective (Pawson and Tilley, 114). This means that, in our perspective, evaluation use:

  • can be considered as evaluation practice in all social policy areas.
  • involves dimensions of practice consisting of symbolic structures, particular orders of meaning in particular places, and has unintended effects.
  • consists of practices which use implicit, tacit or unconscious knowledge as well as explicit knowledge.
  • can have progressive, enabling characteristics but can also be perceived and experienced as controlling, as part of a ‘surveillance’ culture.

I have adopted the perspective that depicting and understanding what goes on in social domains like evaluation use requires an operational definition of social practice. This perspective denotes a concern with activity, with behaviour, with what people do, what they value and what meanings they ascribe either singly, in groups, in institutions through their systems, or nationally through national managing structures. So, how to depict or understand ‘what people do’ in terms of the mediation of the messages (output) that emerge from an evaluation? What kinds of practices are embodied in the use of an evaluation output?

At its core, what people do is a social phenomenon, multi-hued of course, but we consider it to have discernible characteristics. What people do can then be termed ‘practice’, and all social life can be interpreted as consisting of a series or clusters of practices in different fields of activity: within families, friendship groups, at work and so on. So, the social practice perspective foregrounds ‘practices’ and focuses on the way practice itself, in this case the use and usability of evaluations, becomes an object of scrutiny. It is possible to depict social life as constellations of practices within particular domains, cross-cut at the same time by horizontal and vertical considerations associated with distributions of power and resources, gender, ethnicity, identity/biography and place. The focus of this paper is the constellation of practices associated with evaluation use.

The idea of practice is a key aspect of socio-cultural theory, and it takes as its unit of analysis social practice, instead of (for example) individual agency, individual cognition, or social structures. By social practice we mean the recurrent, usually unconsidered, sets of practices or ‘constellations’ that together constitute daily life. Moreover, a social practice viewpoint alerts us to the danger of a rational-purposive understanding of change, one which assumes that people on the ground will act in 'logical' ways to achieve well-understood goals, or that managers and policy-makers will have clear and stable goals in mind and be able to identify steps towards achieving them.

For the purposes of this paper then, practices can be usefully conceptualized as sets or clusters of behaviours forming ways of ‘thinking and doing’ associated with evaluation use. As phenomena, practices can be defined as:

‘A ‘practice’ (Praktik) is a routinized type of behaviour which consists of several elements, interconnected to one another: forms of bodily activities, forms of mental activities, ‘things’ and their use, a background knowledge in the form of understanding, know-how, states of emotion and motivational knowledge… [A] practice represents a pattern which can be filled out with a multitude of single and often unique actions reproducing the practice… The single individual – as a bodily and mental agent – then acts as the ‘carrier’ of a practice – and, in fact, of many different practices which need not be coordinated with one another. (Reckwitz, 249-250).

Working within this approach, we can see that evaluation use is essentially a social practice. It is undertaken by people, within structures of power and resource allocation. We can say that evaluation use is social practice bounded by the purpose, intention or function of using the processes and outputs of evaluation. Stern (2006, 293) has outlined the complexities of the widely shared view that evaluation is a ‘practical craft’ by identifying at least seven taken-for-granted assumptions about where the emphasis should lie. These range from evaluation as technical practice, as judgement and as management, to evaluation as polemic and as social practice. I am defining evaluation use as clusters, or ‘constellations’, of practices: a ‘first order’ definition of which the various manifestations in his formulation are expressions.

In order to depict evaluation use within the European environment of cohesion investment, we can discern four domains of practice (for a detailed analysis of this model see Saunders et al., 2011).

Figure 2 Domains of evaluative use practice in policy sectors

National/systemic (using evaluation output from evaluations at national or sectoral level)

Programmatic (using evaluation output from the evaluations of specific interventions or programmes)

Institutional (using evaluation output from evaluations of aspects of organisational practice)

Self (using evaluation output from self-evaluative practice at practitioner level)

These domains are not definitive but serve the purposes of this paper and contribute to the view by Ledermann (2011) cited above. As Stern points out, these expressions may well be very different in focus and not particularly consistent; it is these differences that Figure 3 below illustrates in the context of use. On the basis of an examination of two policy areas (Health and Education), it suggests how the practice clusters associated with evaluation use differ between the domains in which evaluation practice occurs.

Figure 3 Use practice clusters

Evaluative domain | Practice emphasis | Use practice cluster
National/systemic | Regulation | Use practice focusing on the distribution of resources and distinguishing between performance in time and place
Programmatic | Policy learning | Use practice focusing on the provision of resources for decisions on successful policy instruments and mechanisms
Institutional | Quality enhancement and assurance | Use practices involving internal assurance, section reviews and institutional process checks
Self | Development diagnosis | Use practices of problem identification, impression management and PR, identifying good practice and developing better practice

Use and usability in the European context

The approach adopted here depicts the growth of externally derived requirements for judgements about value in European contexts, which can be understood as an emergent form of governance structure. I depict the growth of a complex set of evaluative practices, including evaluation use, embedded within EU structures and occupying an increasing amount of time, energy and new expertise. The developing interest in evaluation use in Europe can be explained by the confluence of the following:

  • The urge to ‘sense make’ in complex environments: evaluation outputs can tell us what is going on.
  • Social and political imperatives (issues of transparency, resources, legitimacy and equity): evaluation outputs can contribute to public debate.
  • Methodological debate: difficulties and uncertainties in addressing ‘end points’ (attribution, causality, alignment and design): evaluation outputs strive to provide authoritative evidence of successful and unsuccessful strategies for change and improvement.
  • Evaluations cost time and money: a move away from evaluations as ritual or compliance toward evaluations as ‘use objects’.

The Ernst & Young report (2008) is based on a study which proceeded by addressing five propositions about the factors that might influence effective use within the paradigm identified above, and is a systematic and thorough overview of a range of methodological, paradigmatic and strategic considerations. What is helpful for the purposes of this paper are the propositions the report contains about the characteristics which determine ‘use practice’, summarised below:

  • the characteristics of the learner or receiver of the evaluation
  • the characteristics of the policy being evaluated
  • the timing of evaluations
  • the approach taken to evaluation
  • the quality of the report

I suggest that these factors conflate a distinction which is important for a use-based evaluation design. In the discourse offered in this paper, the Ernst & Young report demonstrates the existence of practice clusters which can be further understood by distinguishing between ‘use’ and ‘usability’.

In order to make ‘really useful’ evaluations happen, the social practice approach suggests that our designs for use, or, put another way, our engagement strategies, need to take into account the existing social milieu into which an evaluative output might be placed. In summary, there are some further distinctions we might make which help this discussion to centre on the difference between use on the one hand and usability on the other. These categories refer to the interaction between the organisational environment into which an evaluation output might be intervening and the design of the evaluation output itself. Both these features interact to determine the extent to which an evaluation output or an evaluation process can create effects.

Use: refers to the practices associated with the way in which the outputs of an evaluation are used as a resource for onward practice, policy or decision making.

The extent to which an evaluation is used depends on the capacity of the potential users to respond to the messages an evaluation might contain. The following characteristics of this ‘use environment’ are drawn from case studies of evaluation use (see Saunders et al., 2011) and demonstrate the factors that seem particularly pertinent in terms of maximising use, confirming those in the Ernst & Young study.

• The timing and nature of the ‘release’ of the evaluation output is embedded in decision making cycles (this requires clear knowledge on when decisions take place and who makes them).

• The evaluators and commissioners have a clear understanding of the organisational or sectoral memory and are able to locate the evaluation within an accumulation of evaluative knowledge.

• The evaluation has reflexive knowledge of the capacity of an organisation or commissioning body to respond. This requires the following dimensions:

The evaluation output connects effectively with systemic processes. This means the messages are able to feed into structures that are able to identify and act on implications.

Organisations that are lightly bureaucratised, sometimes called adaptive systems, are better placed to respond to ‘tricky’ or awkward evaluations because their practices are more fluid and less mechanistic, and they have a ‘history’ of responding to new knowledge.

Evaluations that are strongly connected to power structures are more likely to have effects because they have champions who have a stake in using evaluations to change practices.

Evaluation outputs that identify or imply possible changes that are congruent with, or build on, what is already in place have a greater chance of maximising use.

Usability: refers to the way an evaluation design and its practices shape the extent to which outputs can be used.

Prosaically, most research into evaluation usability refers to the form the evaluation outputs take and the extent to which they ‘speak’ to the intended user in an appropriate way. This concerns the design of the vehicle of the message to maximise engagement (a single, cold, unresponsive text; a presentation; working in a workshop; working alongside a user to draw out practice-based implications; etc.), but also the way in which the design of the evaluation itself lends itself to communicability. What are the situated practices associated with the cases we have included that illuminate issues associated with use? In a recent case study of evaluation use in the context of Arts within higher education, Alison Shreeve (2011) outlined a high-usability approach to evaluation based on a highly interactive and user-responsive design. Interestingly, the learning points she identifies have considerable resonance with the design issues prompted by adopting the practice-based approach. Although at the ‘high risk’ end of a strategy to maximise usability, her observations are telling.