Rethinking the teacher’s role in assessment

Paper presented by Wynne Harlen

at the BERA Annual Conference, 2004, as part of the symposium: Assessment for Learning: Where from? Where next?

Introduction

This paper draws on the findings from systematic reviews of research relating to teachers’ roles in summative assessment, which have been instigated by the Assessment Reform Group[1]. The relevance to this symposium stems from the findings on how summative assessment impinges negatively on formative assessment practices and the potential for reducing this effect by giving teachers a greater role in summative assessment. The first of three relevant reviews (Harlen and Deakin Crick, 2002; 2003) concerned the impact of summative assessment and tests on students’ motivation for learning. Its findings, outlined in this paper, can be summed up as painting a very negative picture of the impact of tests on students’ motivation for learning. The findings also pointed to positive action that could prevent these negative effects. Many of these actions, including teachers promoting learning goal orientation rather than performance goal orientation, developing students’ self-assessment skills and providing feedback to students that is non-judgmental, are key features of formative assessment (assessment for learning). However, there is a good deal of evidence that high stakes summative assessment (assessment of learning) makes demands on teachers and students that conflict with the goals and practice of assessment for learning.

One way to avoid this conflict is to give teachers more responsibility for summative assessment in a way that enables them, in assessing learning outcomes, to make use of some of the information they gather and use to help learning. Moreover, teachers are able to assess achievement related to a wider range of learning goals, including processes as well as products of learning, and so avoid the narrowing impact on the curriculum and teaching methods of summative assessment by tests. But issues of dependability and of potential bias in teachers’ judgements have to be considered if giving teachers a greater role in summative assessment is to be taken seriously. Therefore this paper summarises outcomes from a second review (Harlen, 2004) of the evidence of the reliability and validity of assessment by teachers used for summative purposes. The findings are brought to bear in identifying the range of actions that would enable teachers to have a greater role in assessment of learning and preserve opportunities for assessment for learning. Reference is also made to a third review, just completed and not yet published, which has explored the evidence for conditions that enable the potential benefits of using teachers’ assessment for summative purposes to be realised. In the final section of the paper, the policy implications of these research findings are considered.

Purposes and uses of assessment

Assessment for learning has the purpose of informing teaching and learning, whilst summative assessment has the purpose of reporting on learning achieved at a certain time. Summative assessment, however, has more than one use, for there is a variety of ways in which the information about student achievement at a certain time is used. These include: internal school tracking of students’ progress; informing parents, students and the students’ next teacher of what has been achieved; certification or accreditation of learning by an external body; selection; monitoring the school’s performance; accountability of teachers and school. These can be grouped into two main uses – ‘internal’ and ‘external’ to the school community. Internal uses include using regular grading for record keeping, informing decisions about courses to follow where there are options within the school, and reporting to parents and to the students themselves. External uses include certification by examination bodies or for vocational qualifications, selection for further or higher education, monitoring the school’s performance and school accountability. (We are not concerned here with national monitoring, where only a sample of schools and students is involved in a survey.)

The concern here is mainly with the impact of assessment for external uses, since these generally acquire ‘high stakes’, that is, are associated with the status of the school and in some cases directly with its financial support or with the salaries of individual teachers. It is in such circumstances that summative assessment can acquire a stranglehold on what is taught and how it is taught. However, internal uses are not entirely free from high stakes, for parents can exert pressure where they are not satisfied with results, and results for classes or departments can be set against each other during internal school evaluation. Moreover, research shows that where external assessment has high stakes the effect is for internal assessment to emulate external tests and examinations, increasing the impact on students (Pollard et al, 2000). This has the further effect of driving out the practice of using assessment for learning. With these effects in mind, the next sections of this paper look at the evidence of the impact of high stakes tests on students and what can be done to reduce their negative effects.

The impact of external tests: claims and counter-claims

The impact of summative assessment on students’ achievement, on teaching and on the curriculum has been well researched and reviewed by Brookhart (2004), Crooks (1988), Linn et al (1982), Shepard (1991) and Stiggins (1999). The potentially positive role of assessment in focusing teaching and learning on educational goals is not realised in practice when goals are defined by external tests associated with high stakes. Increase in test scores is not the same as increase in achievement; research into testing programmes shows that increase in scores is due to greater familiarity of teachers and students with the tests rather than increase in real learning. High stakes tests are inevitably designed to be as ‘objective’ as possible since there is a premium on reliable marking in the interests of fairness. This has the effect of reducing what is assessed to what can be readily and reliably marked. Generally this excludes many worthwhile outcomes of education such as problem solving and critical thinking. However, even when test-makers do try to include higher level thinking skills, the pressure on teachers of the high stakes attached to test scores can lead to training for the test, to the extent that students can pass any kind of test, even those intended to assess higher cognitive skills, when they do not possess these skills (Gordon and Reese, 1997). Teachers under pressure to reach goals expressed in terms of increase in test scores tend to focus their teaching on what is required in the tests, spend time on practice tests and, often unconsciously, value test performance rather than genuine learning. Worse, there can be cheating, such as stealing test papers and teaching the answers, or changing students’ answers (Black, 1993; Shepard, 2003).

The negative impact of test anxiety on student achievement is well known, but the suspicion that testing has a much wider detrimental effect on students has grown with the rapid increase in the incidence of testing. The growth of testing in the US has been documented by Clarke et al (2000) and there has been a similar inflation in testing in England (Professional Association of Teachers, 2002). A considerable volume of anecdotal evidence has accumulated, from parents as well as teachers, that testing is having an impact on students’ enjoyment of, and willingness to continue, learning. These are important outcomes of education, for we have reached a stage where school education can itself no longer provide students with skills and knowledge to last for their lifetime. Within a decade there will be the requirement for skills that we do not even recognise today (just as most of us would not have known what ‘surfing the net’ meant a decade ago). Thus school must prepare students for continued learning throughout life and this means willingness to learn, enjoyment of learning, the development of high-level skills and of understanding how to learn.

It was to provide some firm evidence in relation to the impact on students’ motivation for learning that a review of relevant research, reported in the next section, was carried out.

Research evidence of the impact of tests on motivation for learning

The review of research on the impact of summative assessment and tests was a systematic review conducted using the procedures developed by, and partly funded by, the EPPI-Centre (Evidence for Policy and Practice Information and Co-ordinating Centre)[2]. These procedures lead to the selection, extraction and synthesis of data from the most relevant and methodologically sound studies among a larger number found from as wide a search of the literature as possible.

Motivation is a complex concept, embracing several aspects that relate to learning, such as self-esteem, self-regulation, interest, effort, self-efficacy, a person’s sense of themselves as a learner. An important feature is goal orientation; that is, whether the learner is oriented towards learning to understand or towards performing well on a test. People who commit themselves to a goal will direct their attention towards actions that help them to attain that goal and away from other actions. Research indicates that students who embrace learning goals (also known as task involved or mastery goals) show more evidence of superior learning strategies, have a higher sense of competence as learners, show greater interest in school work and have more positive attitudes to school than do students who embrace performance (achievement or ego-involving) goals (Ames, 1990; Dweck, 1992). An orientation towards either learning goals or performance goals is related to the distinction between intrinsic motivation (finding satisfaction in the learning) and extrinsic motivation (engaging in learning to achieve a reward or avoid a penalty).

Main findings

None of the research studies found in the review dealt with all of the aspects of motivation, but they could be grouped according to the outcomes they investigated. Some of the findings are now briefly outlined. The full report by Harlen and Deakin Crick (2002) is available on the website given in the references.

There was evidence from several studies that an impact of testing on those who do not do well is to lower their self-esteem. For example, studies of children aged 7 before the introduction of National Tests in England and Wales, following the 1988 Education Reform Act, showed no correlation between self-esteem and achievement, indicating that the lower achieving children could have the same level of self-esteem as their higher achieving peers. After the introduction of National Testing, however, there was a positive correlation, indicating that the self-esteem of the lower achieving students was lower than that of the higher achievers (Davies and Brember, 1998, 1999). There is, of course, no basis for suggesting that the national tests were a direct cause of the change in correlations; indeed the impact of testing is rarely direct but mediated through a variety of circumstances and people influencing children’s affective responses to tests. However, this study does point to the introduction of the tests as the main factor which differed for the cohorts of students concerned, whatever the mechanism of its impact.

A similar finding was reported by Reay and Wiliam (1999) from observing and interviewing 11 year olds at the time of the National Tests in England. Repeated practice tests made some students all too well aware of what they could achieve and this led to very low views of their own capabilities. The researchers found a class climate in which the tests became the rationale for all that was done and the levels they expected to achieve were the criteria by which students were judged and judged themselves. Studies of the impact of the 11+ tests in Northern Ireland (Leonard and Davey, 2001) also reported the devastating impact of the tests on the self-esteem of those who did not match up to their own or others’ expectations.

Studies of the impact of state mandated tests in the USA have also reported the lowering of self-esteem of lower achieving students as an impact of the tests (Gordon and Reese, 1997; Paris et al, 1991).

A study by Johnston and McClune (2000), conducted in Northern Ireland, indicated how the impact of high stakes tests on teachers’ teaching style can affect students’ sense of themselves as learners. These researchers used several instruments to measure students’ learning dispositions, self-esteem, locus of control and attitude to science and related these to the transfer grades obtained by the students in the 11+ examination. From the measures of how students preferred to learn in science they grouped students according to their learning dispositions. These showed a strong preference for learning through first-hand exploration and problem-solving. The researchers also observed classrooms to identify the teaching style of the teachers. They found a high proportion of teaching through highly structured activities and transmission of information and very little opportunity for students to learn ‘hands-on’. In interviews the teachers indicated that they felt constrained to teach in this way on account of the nature of the tests. This meant that many students were not able to learn in the way they were disposed to learn and as a result felt inadequate and demoralised as learners.

The feeling of self-efficacy, the extent to which learners judge themselves as capable of succeeding in a particular task, is related to feedback from earlier work of a similar kind. Learners’ judgements of their work are based on criteria communicated implicitly or explicitly and used by their teachers. Feedback that focuses on how to improve or build on what has been done is associated with strengthening the feeling of being capable of what is required and understanding how to improve. Feedback that emphasises, through marks or grades, comparison with others, encourages a focus on how to get better grades rather than better understanding (Butler, 1988). Research shows that self-efficacy is related to the effort that students put into work that offers a challenge. There is also evidence that, beyond the practices of individual teachers, the general atmosphere of encouragement in the school and the informal culture of expectations built up over the years is important to students’ feelings of self-efficacy (Duckworth et al, 1986).