Comparison of Peer, Self and Tutor Assessment with University Teaching Staff
Steve Wilson, London Metropolitan University
Paper presented at the Learning Communities and Assessment Cultures Conference organised by the EARLI Special Interest Group on Assessment and Evaluation, University of Northumbria, 28-30 August 2002
In a University programme concerned with learning and teaching in higher education, one assessment component on one of the modules is in the form of a group presentation. The presentation is self, peer and tutor assessed using criteria devised by the presenting group and agreed by the principal module tutor (module convenor). In written feedback given to the presenting group, participants are required to comment on the match with the assessment criteria and the learning outcomes for the module (as identified by the presenting group) as well as providing other more holistic comments. Through an investigation of the last two cohorts of participants, this research considers the outcomes from using this assessment method and the impact it has had on the participants’ thinking and practice.
Initially, the study compares the assessments made by the various constituent groups (self, peer and tutor) in terms of both the formative feedback and the grade recommended. It was found that self-assessment grades are sometimes noticeably higher than the peer and tutor grades, which themselves demonstrate a greater degree of convergence. A comparison of the formative comments made by the different constituent groups also explores what was seen as relevant in determining the overall recommended mark and, especially, how participants connect the assessment criteria and the learning outcomes in the assessment and feedback process. The prevailing wisdom is that having explicit criteria is good both for students (knowing what is expected) and for staff (clarity and consistency in marking etc.). However, markers can interpret such criteria with wide latitude. A key issue is thus ensuring a shared understanding of the agreed criteria before marking commences.
Through follow-up questions and interviews with participating University staff, views are sought on the impact of the process on their educational thinking and own practice. A key question is how self and peer assessment contributes to an understanding of the assumed connectivity between the learning outcomes and the assessment criteria agreed, and the demands of the assessment itself. Also investigated is how the negotiation and agreement on such criteria and outcomes assist both the presenting group and the recipients (acting as peer assessors) in making their judgements.
The study postulates that staff new to teaching, and therefore assessment, in a University context require opportunities to “try on” assessment methods in order that their significance and potential may be realised. It also explores the value of direct experience in the use of assessment criteria and of matching assessment with learning outcomes. Indications are that engagement with the issues within an environment that “matters”, i.e. one that is linked to their success on the award, provides useful insights for new lecturers, which may be transferred into their own assessment of students.
Much recent literature on assessment in higher education (HE) gives strong support to the use of both peer and self-assessment (Boud 1995; Brown and Knight 1994; Gibbs 2001; Brown, Bull and Pendlebury 1997; Brown and Glasner 1999; Brown and Dove 1990). For Boud, self-assessment is a transferable skill and a principal part of the student learning experience.
“Students will be expected to practice self evaluation in every area of their lives on graduation and it is a good exercise in self-development to ensure that their abilities are extended.” (Brown and Knight 1994)
Peer assessment is the assessment of the work of others of equal status and usually has an element of mutuality. Underpinning a peer-assessment process is the giving and receiving of feedback, from which continued reflection and perhaps dialogue may follow. Brown, Bull & Pendlebury (1997) draw a distinction between ‘peer assessment’ and ‘peer marking’, the latter being the process by which someone makes an estimate of another’s work; they also identify ‘peer feedback marking’, which involves students deriving criteria, developing a peer assessment form, providing anonymous feedback and assigning a global mark. In the context of this study, the term “peer-assessment” is more closely allied to Brown et al.’s definition of “peer feedback marking”.
The use of peer and self-assessment carries a number of perceived advantages:
- Students have more ownership of the assessment process (i.e. it is not just being “done” to them);
- It can involve students in devising and understanding assessment criteria and in making judgements;
- It encourages formative assessment – learning through feedback;
- It encourages the reflective student (autonomous learner);
- It has validity - it measures what it is supposed to measure;
- It can emphasise the process not just the product;
- It is expected in working situations;
- It encourages intrinsic rather than extrinsic motivation;
- It challenges the role of the tutor as the sole arbiter of assessment.
Toohey (1996) argues that self (and peer) assessment by students is generally trustworthy provided that the criteria have been made explicit, students have had the opportunity to practice assessment skills, the rating instruments are simple and a second marker is used to moderate the assessment. In essence, if a student understands the learning requirement and the process is managed appropriately, with opportunities for giving and receiving feedback, then it is likely to be a positive and constructive process for all concerned.
The purpose of this study is to engage the students (called “participants” on this programme) in the assessment process through parallel processes of self and peer-assessment, to support them in understanding what is involved, and to involve them in determining the criteria by which their work will be assessed.
The University operates a postgraduate course in learning and teaching for its teaching staff. For staff with limited experience of teaching in a higher education environment, participation on the programme may be a condition of their probation. Experienced staff also participate voluntarily for professional development purposes, as may other professionals teaching in a higher education environment. The Postgraduate Certificate, which is part of a full Masters scheme and is accredited by the Institute for Learning and Teaching (ILT), contains three modules: one centred on teaching and facilitating learning, one on assessment, and a third focused on curriculum evaluation and development. One of the programme aims is for participants to experience and examine a variety of teaching, learning and assessment methods.
The module entitled ‘Managing the Assessment Process’ focuses on the role that assessment plays within the overall learning process by examining assessment purposes and practices within generic and subject disciplines and evaluating them in terms of supporting and enhancing the learning experience of students. Of its two assessment components, one is in the form of a group presentation. This assessment contains a number of features:
i) The group is required to identify the assessment criteria by which the presentation will be assessed.
ii) The module has six learning outcomes. The group must identify which of the learning outcomes are applicable to their presentation.
iii) The 15-minute presentation (with 5 minutes for questions) is tutor, peer and self-assessed using the assessment criteria set by the group and against the identified learning outcomes.
iv) Assessors (tutor, peer and self) recommend an overall mark (percentage) using the grade descriptors provided in the scheme.
v) Assessors provide written feedback to the presenting group in which comments refer to the match with the assessment criteria and learning outcomes, as well as providing general remarks relating to the presentation.
vi) The final percentage is the average of all the recommended marks, unless the overall average is sufficiently different from the average of the tutor marks, i.e. it falls into a different grade category. If this is the case, further moderation by tutors determines the final mark. Given that the majority of assessors will be peer participants, this variation occurs only when there is a significant difference between the tutor and peer averages.
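The averaging and moderation rule in (vi) can be sketched as a short piece of code. This is an illustrative reconstruction only: the grade-band boundaries used below (40/50/60/70) are an assumption for the sketch, not the scheme's actual grade descriptors, and the function names are hypothetical.

```python
# Sketch of the mark-averaging and moderation rule in step (vi).
# Band boundaries 40/50/60/70 are assumed, not taken from the scheme.

def grade_band(mark: float) -> int:
    """Return an index for the grade category a percentage falls into."""
    bands = [40, 50, 60, 70]  # assumed grade-band boundaries
    return sum(mark >= b for b in bands)

def provisional_mark(tutor_marks, peer_marks, self_marks):
    """Average all recommended marks; flag the result for further tutor
    moderation when it falls in a different grade category from the
    tutor average."""
    all_marks = list(tutor_marks) + list(peer_marks) + list(self_marks)
    overall = sum(all_marks) / len(all_marks)
    tutor_avg = sum(tutor_marks) / len(tutor_marks)
    needs_moderation = grade_band(overall) != grade_band(tutor_avg)
    return overall, needs_moderation

# Illustrative figures: a tutor average in the 50s against an overall
# average in the 60s falls in a different band, so the mark is flagged.
overall, flagged = provisional_mark([55, 61], [60, 65, 70], [62, 66])
```

Because peer assessors outnumber tutors, the overall average is dominated by peer marks, which is why the flag fires only on a significant tutor-peer divergence.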
The purposes underpinning what may seem a fairly complex and elaborate process are specific. Accepting that “assessment is at the heart of students’ experience” (Brown and Knight, 1994) and that “If you want to change student learning, then change the method of assessment” (Brown, Bull & Pendlebury, 1997:7), creating greater student involvement in and ownership of the assessment process is intended to sharpen their awareness of the key issues, particularly as participants are involved in assessment design themselves. Allied to this is the importance of learning outcomes and assessment criteria. In recent years, the use of ‘learning outcomes’ has replaced ‘objectives’ as the means of identifying the learning that needs to be demonstrated for credit to be awarded. At the same time, if appropriate judgements are to be made, then the criteria by which this occurs need to be “explicit, available, open to interrogation and shared” (Brown & Glasner 1999:11). Involving participants through the mechanisms described above seeks to provide insights into how the assessment process works (most important on a programme which is preparing them to manage assessment processes themselves) and also to provide a process that offers a measure of reliability, validity and transparency.
Organisation of the Presenting Groups
Each of the two cohorts had twenty-five participants, the majority of whom had little (<3 years) teaching experience in an HE context. Presenting groups were self-selecting (group size varied from 2 to 5, most commonly 4) and formed around a declaration of interest in specific topic areas associated with assessment. Participants were encouraged to form groups that crossed subject disciplines, although this was not a requirement. Guidance to assist presentation preparation was provided, in particular on devising appropriate assessment criteria and selecting learning outcomes. Presenting groups were required to submit presentation outlines early in the module, as well as final drafts identifying the topic, a brief description, the assessment criteria (with weightings, if deemed applicable) and the appropriate learning outcomes. The first cohort contained six groups, which presented over two sessions (one group was unable to present on the predetermined day). The second cohort had seven groups, which presented over one session.
The process for collecting and analysing data in the study was undertaken in two parts:
- A comparison of the marks and written feedback produced by each assessing group (tutors, peers, self). Each assessor had anonymously completed a feedback form, which outlined the project title, assessment criteria and learning outcomes for the presentation.
- A follow-up questionnaire sent to all participants four months later to determine:
- how the presenting group negotiated and agreed the assessment criteria;
- how appropriate learning outcomes were identified;
- the ease (or otherwise) of undertaking the self-assessment using the assessment criteria and learning outcomes agreed by the group;
- the ease (or otherwise) of assessing other groups and using their assessment criteria and match to the identified learning outcomes;
- the ease (or otherwise) of deciding on an appropriate mark for a presentation, whether their own or others;
- the usefulness of the written feedback received from all assessing groups;
- the impact that undertaking the process has had on their own thinking and subsequent practice;
- their current thinking on the value of:
- peer assessment
- the use of assessment criteria, learning outcomes and their relationship to the assessment process.
Analysis of the Marks awarded
Tables 1 & 2 show the award of marks by each assessing group in each of the two cohorts.

Group | Tutor marks | Peer marks | Self-assessments | Average (all marks) | Percentage awarded (post-moderation)
1 | Average: 58%, Range: 15 | Average: 61%, Range: 25 | Average: 65%, Range: 10 | 64% | 64%
2 | Average: 49%, Range: – | Average: 59%, Range: 2 | Average: 60%, Range: 10 | 56% | 53%
3 | Average: 68%, Range: – | Average: 55%, Range: 12 | Average: 54%, Range: 13 | 57% | 62%
4 | Average: 48%, Range: 1 | Average: 56%, Range: 20 | Average: 56%, Range: 13 | 55% | 52%
5 | Average: 61%, Range: 11 | Average: 68%, Range: 15 | Average: 65%, Range: 2 | 64% | 64%
6 | Average: 72%, Range: 0 | Average: 71%, Range: 28 | (with peer marks) | 71% | 71%
Mean range | 7.3 | 18.7 | 9.6 | |

Table 1: Cohort 2000-01

Group | Tutor marks | Peer marks | Self-assessments | Average (all marks) | Percentage awarded (post-moderation)
1 | Average: 55%, Range: 20 | Average: 57%, Range: 15 | Average: 75%, Range: 9 | 59% | 59%
2 | Average: 53%, Range: 16 | Average: 58%, Range: 21 | Average: 65%, Range: 0 | 57% | 57%
3 | Average: 62%, Range: 14 | Average: 59%, Range: 34 | Average: 85%, Range: 16 | 63% | 63%
4 | Average: 69%, Range: 21 | Average: 65%, Range: 33 | Average: 67%, Range: 5 | 66% | 66%
5 | Average: 72%, Range: 21 | Average: 71%, Range: 25 | Average: 70%, Range: 0 | 71% | 71%
6 | Average: 63%, Range: 6 | Average: 64%, Range: 37 | Average: 61%, Range: 8 | 64% | 64%
7 | Average: 71%, Range: 10 | Average: 71%, Range: 42 | Average: 65%, Range: 10 | 71% | 71%
Mean range | 15.4 | 29.9 | 6.9 | |

Table 2: Cohort 2001-02
Tutor averages differ from the overall average mark more noticeably in cohort 1. In this cohort, tutors more often scored lower than peers and self-assessors (with one noticeable exception, where the average tutor mark was much higher). These results led to a review amongst tutors of how they had arrived at their marks, and to a revision of practice for the next cohort. In the second cohort, no average tutor mark differed greatly from the overall average mark, so none were altered at moderation. However, tutor marks for cohort 2 display a significantly greater range. This is in some part due to the addition of two new programme tutors who had not previously undertaken assessment on the programme.
Differences between tutor and peer assessment are again more evident in the first cohort, with tutors marking lower than peers in all but one case (see table 3).
Comparisons in cohort 2 show an equal number of tutor averages higher than peer averages as the other way round (see table 4). It is interesting that where the average tutor mark has the greatest difference compared with the average peer mark, it contains a high tutor score. The reverse is also the case when the tutor score is low.
The peer averages may well reflect a regression to the mean that can occur with a larger group of assessors. However, closer scrutiny demonstrates a significantly increased range in the marks recommended by peers: it is greater than the mean range for either tutors or self-assessors, and the mean peer range increases from 18.7 in cohort 1 to 29.9 in cohort 2 (see tables 1 & 2). Participants were not asked to justify their marks at the time, as this would have been inappropriate and possibly detrimental to a legitimate peer-assessment process. However, the preparation of cohort 2 included greater attention to the grade descriptors for the scheme. Although few peer marks fall below the pass mark (40%), peer-assessors seemed more comfortable in using the full marking range.
It is perhaps understandable that the range of marks from self-assessors is smaller: the number of assessors undertaking self-assessment for any one presentation was only a small subset of all assessors. However, questionnaire and interview responses also indicated that the process of self-assessment was less problematic than peer-assessment. Self-assessors often commented on how they had “run out of time” or not presented points clearly or explicitly enough. But because considerable time and thought had been invested in preparing the presentations, the presenting group already had an “inner feeling” of the worth of the presentation; it was just a matter of delivering the material and “releasing the potential”. This may indicate why average self-assessment marks were consistently higher than average tutor marks, and often the average peer mark, particularly when the tutor or peer marks were in the lower bands (40s & 50s). The presenting group knew what they wanted to say, whereas others received only what was delivered. Apart from one group, the difference between the tutor and self-assessment marks in cohort 2 is significantly less. Again, the additional attention to grade criteria with this cohort may be a factor.
As table 3 demonstrates, comparison of the average peer and self-assessment marks reveals some unanimity in cohort 1. However, there are noticeable differences in cohort 2 (see table 4), where the self-assessment marks are higher. Responses in questionnaires and interviews highlighted the challenge that peer assessments had to be undertaken “at short notice” using assessment criteria derived by the presenting group; interpreting them within the time-pressured context of the presentation was problematic, a comment also made by tutors. Some presenting groups decided to provide a weighting for each assessment criterion, which proved helpful for some peer assessors. However, it also proved a double-edged sword when highly weighted criteria were less well fulfilled in the presentation, or when criteria weighted lower were strongly evident but could not be given any more credit.