Perspectives on Assessment
Tony Gardner-Medwin, UCL
Summary
These are personal perspectives on a number of assessment issues, as I see them at UCL in 2006. Assessment must not be something divorced from teaching and learning. Firstly, summative assessment obviously does much to determine the ways in which students study and revise, especially given the traditional UK HE scenario where a degree class is seen as the main 'deliverable' of a university education, to use current jargon. But more profoundly, the lasting objective of HE ought to be - in my view and in that of an increasing number of teachers - an ability to be critical and to self-assess one's own work in a chosen field, at least as much as the acquisition of specific knowledge and skills. This requires that assessment, reflection, peer interaction, self-criticism and judgement of the limits of one's knowledge be part of the student learning experience.
1. Learning oriented assessment
LOA (Learning Oriented Assessment) is new jargon, to some extent supplanting the word "formative" for assessment that helps students study and learn, as opposed to "summative" assessment that doesn't. This helps to divorce it from mere preparation for exams (summative assessment). LOA is indeed a concept that applies to exams too, since the way in which exams operate affects the way in which students learn, and exams should of course be planned with this in mind. I like the term. It stresses the principal role of assessment as something to stimulate learning. Assessment that simply ranks or classifies students is increasingly regarded as a marginal (possibly unnecessary and even stultifying) role of universities. Most assessment effort should aim to stimulate effective learning directly and to enhance students' ability to judge their own work.
2. Marking criteria, defined outcomes and alignment with course objectives
Quality audits tend to stress these. I regard 'marking schemes' for written work (in the form of lists of points that should be included) as very hazardous, and the notion that they allow non-experts to mark fairly as nonsense. A more sensible strategy may be to make different assessment items target distinct, circumscribed skills: e.g. "This Q will be marked on how well you apply principles to an unfamiliar problem"; "In this answer you must be concise and logical, and you must be careful not to introduce irrelevant information."; "This Q asks for more than core textbook knowledge, and you may include speculative ideas, but you must identify and distinguish the points that you understand and are sure of from those where you are unsure."
3. Degree of choice in examinations
This tends to be an uncontrolled facet of exams and in my experience is often so great as to encourage students to take the risk of ignoring large parts of a course. Of course exams often aim to test deep knowledge, which may be realistically achievable for only a fraction of the course material. It is fine for a course explicitly to encourage optional specialisation if appropriate, but this should be clear in the course objectives, not just in the structure of the exam. Where breadth of knowledge is also important, this can often be tested economically alongside the specialised Qs with a form of 'cheap' assessment (e.g. compulsory True/False or MCQ Qs) that will detect undue selectivity in student learning. Assessing breadth by marking a large number of essays is extravagant and not sensible.
4. Problems in marking large classes
I personally can't mark more than about 30 essays on the same topic in a day. Others may do better! But this shows the problem of handling classes of 300+. The only solution is to reduce the amount of regurgitative written work expected. (I know we don't ask them to regurgitate, but they do, and this will generally achieve a pass mark, if not a first.) One way to cope is to use computer-based or Optical Mark Reader assessment (below); another is simply less assessment by staff, perhaps combined with peer assessment, which, though less reliable, may provide a better learning experience for the students. Assessment based on more personal, taxing and constructive Qs may be another option, less likely to produce the mind-numbing regurgitation that makes exam marking so difficult. E.g. perhaps "Describe something about membrane transport that you had trouble understanding fully at first, why this was, what you now understand, and how you got there." I think I could mark 100 of those in a day!
5. Plagiarism and rote learning
In law, I guess, plagiarism is a tort and rote learning simply a strategy; but they are closely related. Both are communication without evidence of understanding. One strategy is to ask Qs where the student must use understanding, and thus apply or transform material learned or sourced from the literature or the web. Plagiarism is best prevented by clear criteria of what is expected. For example, I always insist with project students that each idea or conclusion expressed in a report must either have a clear source where I can read more about it, or be clearly identified as the student's own idea in whole or in part. Students are often strangely reluctant to tag their own ideas, though of course these are often what we are looking for in assessing quality work. Plagiarism software is available to compare submitted text with databases of likely sources, but detection is only half the problem for staff, and leads to time-consuming investigation. Prevention is much preferable and requires that students
(a) understand clearly what is and is not OK,
(b) see detection as possible or probable (simply asking for an e-copy of an essay that might be submitted to plagiarism detection software may help in this, even if it is seldom used),
(c) see the consequence of breaking the rules as serious.
Plagiarism and regurgitation are seldom a problem with Oxbridge-style tutorials, where the student will be embarrassed and humiliated if unable to discuss something s/he claims to have written. A less time-consuming strategy may be a short personal viva on essay and project work. In my view, and for this reason, a viva should be included in the assessment of all project work - not as a major quantitative part of the assessment, but to motivate an honest approach by the student and to help reveal qualitative problems that may exist.
6. Efficacy of feedback from assessments
Assessment that doesn't give effective (acted-upon) feedback that improves learning or performance should be seen as largely a waste of time (1 above). Feedback like "you fail" or "you got 68%, nearly a first" may stimulate learning slightly, but is not worth the staff time it takes to generate. Students often learn more from giving and receiving peer assessment than from teachers' comments. Software systems exist to help teachers give clearer comments about work from distinct perspectives. The aim of feedback should always be to help the student improve in future, rather than to mark or rank them.
7. Use of objective assessment (MCQ, T/F, EMQ, computer marked)
This can relieve teachers of some work. It should not be seen as a substitute for teacher involvement. It allows teachers to focus on tasks that require teachers, while still ensuring breadth of learning and providing self-assessment and stimulation of effective study. Computerised assessments cannot replace small-group tutorials, but they should allow a smaller number of tutorials to be more interesting and valuable once the task of teaching, practising and testing basic skills is removed. In exams, likewise, they can help focus and reduce hand-marked content while ensuring coverage.
8. Certainty-based marking (CBM) to enhance learning and assessment
Although CBM in various guises has been researched a great deal, and found both to be pedagogically constructive and to benefit assessments statistically with practised students, its use at UCL is almost certainly the largest-scale, simplest and most successful implementation. This has been a HEFCE-funded initiative that I have led, and it has received much interest nationally and internationally in pedagogy and learning technology circles. I rather despair, however, that its benefits (as a study tool and a fairer and more reliable means of objective exam marking) have been little adopted at UCL outside Phases 1,2 of the medical course. This is not because anyone explicitly challenges the benefits - rather, I suspect, sheer inertia. The CBM principle (refer to the website www.ucl.ac.uk/lapt) is popular with students, transparent in operation, easy to implement with links within WebCT or Moodle, and proven as a more reliable and more valid measure of knowledge (with medical T/F exams) than simple right/wrong marking. If you use it in summative assessment you must ensure that students are well practised, but since one of its primary purposes is to stimulate more critical learning and thinking habits this is obviously sensible.
CBM rewards a student who can distinguish reliable from unreliable knowledge: such a student gets more marks than one who gives the same answers but cannot identify which are sound and which uncertain. It motivates and rewards reflection on the justifications for an answer, to the point that either high certainty is merited or reasons emerge for reservation and low confidence: either way the student gains. It weights confident answers more than uncertain ones, and penalises confident errors. It is important to realise that none of this is about personality traits (self-confidence or diffidence). Evidence shows that there are no gender or ethnic biases in data from our (well practised) students. CBM addresses a serious problem that arises for many highly selected UCL students: they are smart enough to have sailed through exams with little need to access more than superficial ideas or associations to get good marks. We need to provide the incentive to think and study more deeply. A university that fails to adopt CBM is, frankly, in my view, failing its students. If you disagree with this view, I would welcome your challenges.
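To make the incentive concrete, here is a minimal sketch of CBM scoring in Python. It assumes the LAPT-style mark scheme as I understand it (1, 2 or 3 marks for a correct answer at certainty levels 1, 2 or 3; 0, -2 or -6 for a wrong answer); the definitive scheme and its rationale are on the website above.

```python
# Minimal sketch of Certainty-Based Marking (CBM) scoring.
# Assumed mark scheme (LAPT-style, as I understand it): certainty level 1, 2 or 3;
# a correct answer scores 1, 2 or 3; a wrong answer scores 0, -2 or -6.
# See www.ucl.ac.uk/lapt for the definitive scheme.

CBM_MARKS = {
    1: (1, 0),    # low certainty:  +1 if correct,  0 if wrong
    2: (2, -2),   # mid certainty:  +2 if correct, -2 if wrong
    3: (3, -6),   # high certainty: +3 if correct, -6 if wrong
}

def cbm_mark(correct: bool, certainty: int) -> int:
    """Return the CBM mark for one answer at the given certainty level."""
    if certainty not in CBM_MARKS:
        raise ValueError("certainty must be 1, 2 or 3")
    right, wrong = CBM_MARKS[certainty]
    return right if correct else wrong

def expected_mark(p_correct: float, certainty: int) -> float:
    """Expected mark if the student's probability of being right is p_correct."""
    right, wrong = CBM_MARKS[certainty]
    return p_correct * right + (1 - p_correct) * wrong

if __name__ == "__main__":
    # With these values the break-even points are at p = 2/3 (C1 vs C2) and
    # p = 0.8 (C2 vs C3), so the mark-maximising certainty level is the one
    # that honestly matches the student's estimated probability of being right.
    for p in (0.5, 0.6, 0.75, 0.85, 0.95):
        best = max(CBM_MARKS, key=lambda c: expected_mark(p, c))
        print(f"p(correct) = {p:.2f}: best certainty level = C{best}")
```

The design point is that there is never any advantage in bluffing or in false modesty: reporting your true degree of certainty maximises your expected mark.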
9. Optical Mark Reader Technology for computer-marked questions
Both the UCL Records Office and the Medical School office have Optical Mark Reader (OMR) machines and cards, for running exams or formative tests. OMR cards are available with or without Certainty-Based Marking (CBM: 8 above), and for True/False, Best of 5, or EMQ (Best of 12 or so) question formats. Customised OMR cards can also be ordered through this office or from Speedwell Computing Services (www.speedwell.co.uk). You need to number your question paper (with question text and graphics) in the same way as the cards you will use. OMR technology has recently become very reliable and the service is much better and cheaper than the old Senate House system. Elsa Taddesse in the Records Office currently runs a very good service. In addition to the Speedwell software, there is very versatile software at www.ucl.ac.uk/lapt/speedwell/analyse.zip that can enhance analysis with or without CBM. Relevant tips:
· Pencils must be used, with care taken to erase corrections thoroughly.
· Barcode labels are the best way to ensure correct student candidate numbers. Otherwise much time can be wasted through incorrect entry of numbers on a grid, which seems too hard for some students!
· Some cards include a "Don't know" option. Advise students never to use this, since even slightly informed guesses are usually better than chance - unless you warn them that you are using a very strong negative marking scheme that penalises guesses so that on average they score worse than blanks (see the sketch after this list). If you want to reduce the variance due to guessing, use CBM.
· View the format of OMR cards for Certainty-Based Marking (CBM) at:
http://www.ucl.ac.uk/lapt/author/Speedwell_S2065_A.pdf (side A, 180 T/F Qs)
http://www.ucl.ac.uk/lapt/author/Speedwell_S2065_B.pdf (side B, 135 best-of-5 MCQs)
http://www.ucl.ac.uk/lapt/author/Speedwell_S2487.pdf (combined Q types)
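To illustrate the point about guessing in the "Don't know" tip above, the sketch below (illustrative Python; the penalty values are examples, not any actual UCL scheme) computes the expected mark from a blind or slightly informed guess under conventional negative marking. With the usual penalty of 1/(n-1) marks per wrong answer, a blind guess on an n-option question breaks even with a blank, and any partial knowledge makes answering worthwhile; only a harsher penalty makes guesses score worse than blanks on average.

```python
# Expected mark per question from guessing, versus leaving it blank (which scores 0).
# Illustrative only: the penalty values are examples, not any actual UCL scheme.

def expected_guess_mark(p_correct: float, penalty: float) -> float:
    """Expected mark: +1 if right, -penalty if wrong, with probability p_correct of being right."""
    return p_correct * 1.0 - (1.0 - p_correct) * penalty

if __name__ == "__main__":
    for n in (2, 5):                      # True/False and best-of-5 questions
        p_blind = 1.0 / n
        usual_penalty = 1.0 / (n - 1)     # the conventional 'correction for guessing'
        print(f"{n}-option question, blind guess (p = {p_blind:.2f}):")
        for penalty in (0.0, usual_penalty, 2 * usual_penalty):
            ev = expected_guess_mark(p_blind, penalty)
            print(f"  penalty {penalty:.2f} per wrong answer -> expected mark {ev:+.2f}")
        # Even slight partial knowledge makes answering better than a blank
        # under the conventional penalty:
        ev_informed = expected_guess_mark(p_blind + 0.1, usual_penalty)
        print(f"  slightly informed guess (p = {p_blind + 0.1:.2f}), "
              f"penalty {usual_penalty:.2f} -> expected mark {ev_informed:+.2f}")
```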
10. Re-use of questions available for practice
This can be a problem both in written exams and in objective testing. In written exams, re-use of identical or similar Qs from past papers - especially when combined with excessive choice (3 above) - encourages selective learning and writing without understanding (5 above). In objective tests, as with recent True/False exams in Years 1,2 of the medical course, it has become a serious problem: the better medical students have complained vigorously that, studying soundly, they are unable to do as well as others in the class who have simply memorised banks of questions from past papers. Because of staff reluctance to generate new Qs, re-use gradually rose to over 50% in 2004. Analysis of one exam from 2004 revealed an average percentage correct mark that was 5% above a nominal pass mark of 75% on re-used Qs and 2% below it on new Qs. This is clearly an important issue for standard setting, and it also has a deleterious effect on students' learning strategy, encouraging rote learning and seriously undermining both care in question design and the use of certainty-based marking.
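A simple way to monitor the effect of re-use is to compare the facility (percentage correct) of re-used and new Qs within the same exam, along the lines of the 2004 analysis above. The sketch below is illustrative only: the data layout and field names are hypothetical, and the figures merely echo the 2004 pattern rather than reproducing the actual data.

```python
# Sketch: compare the mean percentage correct on re-used versus new Qs within one exam.
# The data layout (per-question records with a 'reused' flag and a 'percent_correct'
# facility value) and the figures below are hypothetical, chosen only to echo the
# 2004 pattern described above.

from statistics import mean

def facility_by_reuse(questions):
    """Return (mean percent correct, count) for re-used and for new questions."""
    reused = [q["percent_correct"] for q in questions if q["reused"]]
    new = [q["percent_correct"] for q in questions if not q["reused"]]
    return {
        "re-used": (mean(reused) if reused else None, len(reused)),
        "new": (mean(new) if new else None, len(new)),
    }

if __name__ == "__main__":
    exam = [
        {"id": 1, "reused": True,  "percent_correct": 82},
        {"id": 2, "reused": True,  "percent_correct": 78},
        {"id": 3, "reused": False, "percent_correct": 71},
        {"id": 4, "reused": False, "percent_correct": 75},
    ]
    for group, (avg, n) in facility_by_reuse(exam).items():
        print(f"{group} Qs: mean percent correct = {avg:.1f} over {n} questions")
```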
An issue here is the publication of past exam papers (including MCQ or TF Qs). UCL has had a tradition of doing this. Some institutions try to keep databases secret for re-use, but this opens them to abuse by student cliques, e.g. students reconstructing papers by distributed memorisation ('you write down Q37 after the exam') and maintaining a black market in past papers. Many past medical MCQ papers are available on LAPT with CBM (8 above), though currently feedback about individual answers is withheld online to reduce the student temptation to memorise answers and gamble on re-use (for discussion, see http://www.ucl.ac.uk/lapt/web/comment.php?feedback). Partial re-use may be good practice, helping to check on drift of standards over the years and to calibrate the quality of Qs, but where students may have access to previously used Qs it is important to monitor and research the impact on exam performance. The risks of re-use should be balanced against the benefits of such analysis when it is employed.