Joint Funding Councils’ Review of Research Assessment
Response from the University of Wales, Aberystwyth
Group 1: Expert Review
Expert review is the system that has operated since the RAE was first introduced in the mid 1980s. Over the five successive RAEs the system has been refined and developed, and while we would not claim that further improvements are unnecessary, we consider that the present system has proved a highly effective means of distributing research resources and promoting the development of a wide research culture in British HEIs, and we believe it to be the most effective means available of achieving these goals.
In so far as the assessment is of outputs, it will be retrospective. In 2001, our understanding is that the prospective element, i.e. the statements in RA5, was generally used to moderate the assessments based on RA2, but there is an issue of consistency between panels in respect of how rigorously this was applied.
A considerable source of confusion during the RAE was uncertainty over the basic level of assessment. Assessment variously took place at the level of the individual publication, the individual researcher, and the research group. We can see that there will be different approaches according to the nature of the subject area, i.e. assessment by research groups makes a lot of sense in the sciences, but is of limited relevance to the Arts and Humanities. However, our understanding is that subjects in closely related areas diverged considerably in how they approached the assessment process. Furthermore, there was little or no transparency in how the panels would assess the submissions. In addition, when we asked for clarification on how the percentages (e.g. not more than 10%) would be applied with regard to the different base levels, we were told only that the panels would use their professional judgements. This level of transparency cannot be considered satisfactory.
We consider that the current system of assessment by subjects is the most effective way of assessing research. To assess by institution would discourage the development of research in institutions with limited traditions of research where quality exists only in isolated pockets. In these instances, the pockets of quality would be subsumed within the overall weaker profile. Assessment by individual would be extremely time-consuming and would probably run the risk of litigation. Why should an individual bear the burden of having been rated sub-national in public? Even if assessment by institution were adopted, this approach would still require assessment by subjects within it if it were to have any substance.
There were significant disparities in the ratios of panel members to staff submitted in the 2001 RAE. For instance, the total number of researchers in Russian was 77.3 FTE, i.e. smaller than 14 of the 31 submissions to the Hospital-based Clinical Subjects main panel, but the former had a panel of 6, the latter a panel of 16. This has clear implications for how research is assessed, and it is not clear how there can really be comparability when the level of assessment is so different.
There is a clear need for rationalisation in the number of panels, and obvious candidates include Mathematics (22-24), Engineering (26-30), Social Policy, Social Work and Sociology (40-42), Economics, Business and Management, and Accounting (38, 43, 44), Modern Foreign Languages (51-55), Art and Art History (61 and 64). Such mergers would clearly help smaller HEIs, whose resources are at the moment thinly distributed across subjects, but which would then be able to make larger submissions without raising questions about difficulties of fit. However, the need for relatively small panels would still remain, in very specialist areas such as Music or Theology, and at the same time consideration must be given to the development of new subject panels where subjects can no longer be subsumed under the remit of other panels, e.g. Creative Writing or Women’s Studies.
We do not favour a combined assessment of teaching and research.
The main strengths of the current system are: it has stood the test of time; it is understood by the sector; its findings are generally accepted; it is relatively cost-effective. The main weaknesses are: the continuing lack of transparency regarding the rules of assessment; the wide divergences in the means of assessment; the uncertainty ahead of the Exercise as to the funding implications of given grades.
Of the four main assessment models nominated by the review, there are subject areas where only expert review is appropriate. This is particularly the case with the performing arts. The peer review process is expensive, but it has the virtues of thoroughness, flexibility and fairness.
Our experience suggests that in a number of cases the subject panels in the 2001 RAE lacked expertise in particular areas and that accordingly research in these areas may have suffered. Whatever the size and number of the panels, it is essential that they should ensure that they are able to judge the quality of all research equally. In this respect, we are also concerned that some of the subject panels failed to make adequate provision for the assessment of outputs through the medium of Welsh.
Group 2: Algorithm
There are a number of algorithms that can be used, particularly research grant income, bibliometric data, and research students.
The central issue is what use should be made of such data, i.e. should the data be used to moderate the central data (published outputs) – to reach decisions on borderline submissions – or should the data contribute more directly to the final decision on the grade. It is our understanding that a limited number of panels, mainly in engineering, used a formula that followed the latter method, but for the great majority it appears that such data had only a moderating role, and some panels appear to have made little or no use of it at all.
The diversity of the research base means that a ‘one-size-fits-all’ approach is very problematic. An algorithmic model using bibliometrics works reasonably well for laboratory- and hospital-based clinical subjects but far less well for arts and humanities subjects, and worst of all for any subject which contains an element of practice as research. For instance, some subjects, such as Economics, have a fairly clear hierarchy of periodicals; others, particularly in the Arts and Humanities, have consciously disregarded the source of the output in favour of the content alone.
The danger of the algorithmic approach is that the measures themselves can become the focus of activity rather than the research that they are being used to evaluate. Furthermore, researchers quickly start to play the game, e.g. citing their colleagues for the sake of a citation hit.
Group 3: Self-assessment
Self-assessment already takes place in that departments and institutions have to take critical decisions on the quality of the research of their staff, and to provide an overview of their performance in RA5. More self-assessment will only add to the complexity of the exercise, and whatever the measure of self-assessment, it has ultimately to be subject to external scrutiny.
Group 4: Historical ratings
In so far as the RAE acts as a stimulus to research, a reduction in the level of funding resulting from the outcomes would make the RAE less competitive and reduce the stimulus accordingly. A central message that has come across in the RAE has been that tradition and reputation should count for nothing, but this approach would appear to enshrine tradition and previous performance. It would also add considerably to the complexity of the Exercise. It is also not obvious why a group of researchers who have achieved international excellence should receive less or more than a similar group in another HEI because of the differential performance of their predecessors. Despite the assurances that "the councils recognise that such an approach could only be used in conjunction with another system", the central thrust of this model is to support the status quo, to give to those who already have, and to make it ever harder for institutions which are currently smaller, newer or marginalized, to make headway in terms of developing a research culture. Its main effect would be to safeguard the income of the already privileged.
Group 5: Crosscutting themes
a. What should/could an assessment of the research base be used for?
The primary purpose of research assessment should be the distribution of funding for research to HEIs. The production of management information should be a secondary consideration. Some management data on research is already available through HESA, e.g. research income and research students. It would be of help if statistical data in relation to the RAE were made more easily available, e.g. allowing us to compare research student numbers across departments. But there is no obvious need for substantial additional information in other fields.
The secondary purpose of research assessment is to promote research. By establishing de facto benchmarks for the production of outputs, the RAE has considerably enhanced the production of published outputs by academic staff, in addition to having a considerable impact on PGR recruitment and grant income generation. For as long as the resource continues to be distributed on a highly competitive basis, the incentive to the production of quality research will continue.
We hope that the Research Councils will continue to assess funding proposals on their own merits, and not make them contingent on the achievement of certain grades. There are already a number of restrictions, e.g. on studentships, associated with grades, and we would be reluctant to see these extended. However, the growth in the number of applications to the Research Councils may make it necessary for them to introduce further thresholds in order to limit the volume of applications. The AHRB has indicated that it sees its mission as distinct from the RAE, and it would be likely to oppose any linkage.
b. How often should research be assessed?
Research should continue to be assessed through one exercise taking place every few years. While a rolling approach that considers individual subjects or groups of subjects might be possible, the experience of the original cycle of TQAs suggests that (a) delays would be very likely to occur; (b) the partial data arising would be misused by the media to elevate the performance of certain subjects; (c) the administrative requirements placed on institutions would increase yet further; (d) gamesmanship would be encouraged as staff whose expertise lay between subject areas (e.g. political historians), transferred across institutions in time for the next round.
The desirable period of assessment for science is probably shorter than for the Arts and Humanities, but the attractiveness of establishing appropriate assessment periods for each subject is outweighed by the need to maintain one assessment cycle for institutions. Besides, even if a given cycle does not ideally fit a given subject area, all research within that area is still subjected to the one assessment process and all staff are therefore treated equally.
Since much work in the Social Sciences utilises the methods of the Arts and Humanities, the publication period for these UoAs should be extended to be the same as the Arts and Humanities.
c. What is excellence in research?
It is clear that some panels in the 2001 RAE addressed this issue, but others did not. We refer to the criteria for the Law panel (3.28.7), whose definition was clear and succinct:
“The Panel will assess international excellence on the basis that work can be regarded as of international excellence if it is a primary reference point in its field (in the sense that it is, or in the opinion of the Panel is likely to be, recognised as amongst the best in its field), and of national excellence if it is a reference point in its field (in the sense that it contributes to knowledge and understanding).”
The centrality attached to the advancement of knowledge in this definition is of crucial importance, and it is this that must be privileged over value added, commercial relevance, and practical applicability.
Panels must be able to define the terms ‘national’ and ‘international’ excellence if they are to apply the criteria. This is an issue that must be addressed in time for the next RAE.
There is some fear that the working definition of ‘international’, with limited exceptions, privileges the English-speaking world over the rest and that work that is shown to have a significant impact in the former will have more weight than work in the latter, although this is of course difficult to quantify.
In this respect, we are concerned about the lack of clarity with regard to the proportions of national and international staff that were required in order to achieve given grades. For a grade 5, between 10% and 50% had to fall within the international category: this was a very wide sweep and we suggest the review may wish to consider the introduction of a wider range of grades, together with narrower bandings for the proportions required to achieve those grades.
d. Should research assessment determine the proportion of the available funding directed towards each subject?
This is one of the most critical questions that the review must consider. The current pattern of distribution of funding by subjects closely reflects decisions taken nearly 20 years ago on the size of the subject pots. This has led to significant disparities between subject areas. For instance, a researcher in Agriculture generates 36% more funding than a researcher in Biological Sciences for the same RAE grade.
A number of algorithms could be used to determine a redistribution of funding, e.g. average staff costs, RAE performance, and success in generating external income, although these in themselves reflect the success of the subjects in generating QR income, and the overall profile of the subject in UK universities, e.g. the average grade in subjects such as Russian and Classics is very high since these subjects have no presence in post-1992 HEIs.
The main difficulty is establishing comparability across subject areas. We see no obvious way in which, say, performance in Russian can be compared with, say, Psychology.
There is also some danger of considerable disruption, and it may be that the best approach will be to seek to achieve a broader consistency and comparability across cognate subjects, e.g. medical sciences, performing arts, or foreign languages, which would be assisted by a reduction in the number of subject panels and the creation of much broader subject areas with one common unit of funding.
A related issue is the extent to which the subject pots should be adjusted to take account of the average grades in a given subject. The problem here would be that grade inflation would be encouraged, and additional money for a given subject could only come at the expense of other subjects, since the overall pot is inevitably finite.
e. Should each institution be assessed in the same way?
The same assessment regime should be used for all HEIs for the next RAE. However, if QR is to be focused on the highest performing departments and institutions, there will still need to be developmental funding available to promote research elsewhere. A research base limited to a certain number of HEIs will undermine creativity and deter innovation and diversity.
f. Should each subject or group of cognate subjects be assessed in the same way?
Subjects are not all assessed in the same way at present. There were significant differences in the 2001 RAE, even in closely related subject areas, in how research was assessed, including the base unit of assessment (output, individual or group), the weight given to the data in RA3-6, and the extent to which the panels read the outputs (ranging from c. 10% through to virtually everything). The two main challenges that this review should face are:
(a) the need for consistency between cognate disciplines: Russian must be assessed in the same way as Italian.
(b) the need for greater transparency in how the panels will operate.
A reduction in the number of subject panels would go some way to address these problems.
The current framework already allows for a measure of discretion and differences between subjects. The question is to what extent the current rules and guidelines were actually applied.
g. How much discretion should institutions have in putting together their submissions?
We believe that the current level of discretion is about right. Greater discretion would make it more difficult for research to be assessed and for the results across the sector to be compared. The evaluation process is already long and time-consuming. The critical issue here is whether subject panels should be allowed greater discretion in determining the appropriate criteria for their field.
The dichotomy between institutions and individuals is not a real one. In practice, for this institution at any rate, the submissions were very much constructs that arose out of constructive and extended dialogue between the senior officers of the institution and members of staff in each department. These discussions included decisions on balancing volume against quality.