
Life and Environmental Sciences Division, University of Oxford

General observations

(a) endorsement of the reasons for reviewing the form of assessment;

(b) the combination of philosophical and practical considerations which inform the consultation document is welcomed;

(c) the RAE and its funding consequences have created an ethos where too much attention is focused on research, and the status of teaching is devalued; any future review should be more broadly based;

(d) continuing belief in the importance of peer review in the process of assessment, and concern that no algorithm based solely on quantitative metrics could substitute for expert judgement of the value of published research. It is accepted that this is time-consuming and could be better organised (e.g. to prevent the disruptive effects of removing cited publications from public use), but expert judgement is a key element of the assessment process;

(e) that a quinquennial exercise is as frequent as can be supported without the exercise consuming disproportionate amounts of time and money and without too much research effort being tailored towards its needs;

(f) that institutions should retain considerable discretion in composing their submissions, so as to reflect their distinctive qualities;

(g) that the number of grades is too small, leading to very large shifts in funding as a result of marginal changes (for example, between 5 and 5*);

(h) there is too much inconsistency between panels and between exercises, for example with respect to:

(i) the percentage of 5 and 5* departments (a three-fold difference); what appears to be a judgement of absolute strength, on an international or national standard, is in fact a judgement of relative strength;

(ii) the extent of grade inflation from exercise to exercise; it is hard to believe that relative improvement is really so variable from subject to subject;

(iii) the weight given to certain metrics such as research income and research assistant numbers;

(i) the standard of international benchmarking needs to be refined.

Group 1: Expert Review

(j) Assessments should continue to be retrospective and conducted by experts; the fundamental objective should be to assess the quality of published output;

(k) concern about the arbitrariness of dividing research activity by UoA; some method of indicating the presence and quality of complementary disciplines in the same institution would give some idea of “disciplinary ecology” as well as the achievements of individual entities (for example the Oxford panel-member for Archaeology – an Egyptologist – was actually assessed in a different unit, Oriental Studies; the fact that archaeologists exist outside UoA 58 is worth recognising, especially if quantitative measures of “critical mass” within an institution come to be used);

(l) by its focus on ‘agenda setters’ within RAE units on an institution-by-institution basis, the exercise undervalues subjects which are by their nature interdisciplinary, activities at the margins of individual units, and research which is collaborative between institutions;

(m) some panel members serve for too long and may in consequence wield undue influence over a discipline;

(n) the contribution made by younger staff is undervalued by the emphasis on established international reputation; this will contribute an ageing effect to an already ageing academic population.

Group 2: Algorithm

(o) If a metric algorithm is employed, there will be an inevitable drift towards precisely calculable indices: it is not clear how one might measure reputation based on “surveys”, whereas bibliometry, student numbers, and external research income are more easily quantifiable. This will result in an impoverishment of discrimination. The introduction of numerical targets (for easily quantified variables) has already reduced the scope for clinical judgement in the NHS; the mistake should not be repeated. It would encourage number-chasing at the expense of genuine value, and originality would be the casualty;

(p) the metrics used should be more transparent and should give due weight to long-term projects and major works of scholarship;

(q) these metrics should include measures of reputation based on surveys (but note the concern under (o)), external research income, bibliometric measures (refined to cover the total corpus of material produced by a department), research student numbers, and numbers of postdoctoral research assistants.

Group 3: Self Assessment

(r) While self-assessment may be morally uplifting for individuals, it is less so for groups; there is no substitute for an outside view.

Group 4: Historical Ratings

(s) The usefulness of a historical rating system depends on the objective in producing the ratings: infrastructural advantages are probably good predictors of present performance, but ratings that reward them are inherently unfair to institutions without such advantages (compare the “added value” debate in school ratings), and thus tend to reinforce current inequalities. It is a matter of policy whether the objective is to build on strength or to produce new candidates for future greatness.

Group 5: Cross-cutting Themes

(t) What should an assessment of the research base be used for? Not to construct ‘league tables’: such tables are constantly in danger of being misused if taken out of context;

(u) Frequency of assessment? As rarely as possible: digging up plants to examine their roots does not promote growth;

(v) What is excellence? Probably only something which is recognised long after a work is published; and almost certainly not something done specifically with a “research assessment exercise” in mind, since in many areas this simply produces a spate of over-inflated and under-prepared publications. But peer review comes as close as possible to identifying the current value of work done. (Creativity and applicability are two criteria for recognising good work; as independent dimensions of research they need to be assessed separately.)

(w) Should assessment determine the proportion of the available funding directed to each subject? Probably not. Comparisons between panels are invidious and will embed existing inconsistency. Quality against an international standard, and strategic judgement, seem better calculated to promote excellence than metrics based on external funding, which simply reinforce the already successful;

(x) Should all institutions be assessed in the same way? There is no point in comparing institutions which are self-evidently different. Many of the present problems stem from attempting to do so following the abolition of the binary line;

(y) Should each subject be assessed in the same way? Practitioners are best placed to judge the most sensitive means of assessment: subject communities should have as much autonomy as possible;

(z) How much discretion should institutions have in assembling submissions? They should retain considerable discretion. Individual institutional control allows provision for difference; a one-size-fits-all approach has obvious disadvantages in this respect;

(z1) How can the exercise best support equality of treatment for all groups of staff? It should recognise that these quinquennial exercises most visibly disadvantage the original thinker, or the project which matures over a longer term than the interval between exercises;

(z2) What are the most important features of an assessment process? That it should be simple, flexible and not burdensome; and not simply number-driven.

Additional issues

The following issues were identified in the division’s analysis of the last RAE; some have a bearing on the design of a possible successor scheme:

(aa) the apparent desire of the assessors to see even non-laboratory subjects (e.g. Anthropology, Geography) based on strategic research groups and not simply collaboration;

(bb) the need for clarity and consistency over which staff are to be submitted and which excluded; this would avoid much ‘jockeying’ for position through manipulation of returns;

(cc) the weight given to major written outputs – such as monographs – in comparison to research papers.