INSTITUTIONAL AND SUBJECT REPOSITORIES –
an update on developments
April 2010
This review for ICSTI looks not only at Repositories but also at the wider context within which they operate. Institutional and Subject-based Repositories are part of a broad movement which offers access to information for all users with fewer financial restrictions attached than hitherto. As such, part of the review also looks at milestones which have occurred in the emergence of the open access and open source movements generally, many of which were highlighted in the earlier ICSTI Insight on Open Access (January 2009). However, this particular ICSTI Insight has a more specific focus than the earlier Insight.
A key aspect of this review is to see what the emergence of repositories means for ICSTI membership, and in particular whether it will create scope for organisations to benefit from the advantages which ICSTI membership offers. There is already some indication that repositories are part of the ICSTI club, with the British Library and NRC-CISTI being active in operating a subject-based repository, but is there scope for the more numerous institutional repositories to benefit from being part of ICSTI?
There are strategic as well as operational issues to consider in looking at repositories, both of which have relevance for ICSTI as a global membership organisation.
Background
For the purposes of this review, the definition of an institutional repository proposed by Dr Clifford Lynch (Coalition for Networked Information) in 2003 will be adopted: an institutional repository (IR) is “… a set of services that a university offers to the members of its community for the management and dissemination of digital materials created by the institution and its community members” (Lynch, 2003). It is in most cases dedicated to the long-term preservation, organisation, access and distribution of such materials.
Subject repositories are slightly different. They are not restricted to any one institution for their input, in some cases taking on a global mantle. Though it is not always the case, there tends to be more of a ‘buy in’ from the relevant community. In one notable case – in high energy physics – the subject repository grew out of a voluntary effort by a few dedicated individuals who were keen to see a free exchange of e-preprint information within their discipline. More generally, subject repositories adopt a policy of seeking to provide a comprehensive collection of freely accessible material relevant to a particular subject or research-focused discipline.
One of the key challenges facing both institutional and subject-based repositories is to secure the engagement of authors of research output as depositors of material into an established repository. The claim has been “build it and it will come”, but authors have so far largely proven unresponsive. As a result, mandates have emerged as the stick with which to promote deposition. The jury is still out on whether mandates are supported by the research community to a significant extent, though there are indications that they are having some impact, particularly in the biomedical sector, and rates of deposition in relevant subject-based repositories are growing.
In the meantime, the growth of the repository movement is not without its challenges.
Current Controversies
Effect of repositories on Journal Subscriptions
ICSTI members are well aware that the ways in which scientific and scholarly knowledge is created and disseminated are undergoing radical change in light of new digital technologies, though the extent and pace of this change are not uniform across disciplines. Something akin to a paradigm shift in the overall scholarly communication system has also been underpinned by initiatives such as open access, open science and data sharing – and the repository movement worldwide.
Increasingly, funding agencies are implementing mandatory deposit of the outputs from publicly funded research, and whilst knowledge as a national public good is a compelling case for creating repositories, in practice it does not always sit well with institutional structures, disciplinary practices or incentive systems. Even where mandates do not exist, funding agencies, and increasingly higher education institutions, have set up policies based on voluntary self-archiving.
There is an intensity, and in some instances a ferocity, in the debate about the role of repositories in scholarly communications. There is, however, a lack of evidence about the potential impact that the systematic archiving of research outputs in open access repositories might have on the ecology of scientific research, and about the most efficacious model for doing so. There is a suspicion that journal subscriptions are being threatened by the support given to repositories, even though the reasons for journal cutbacks may lie elsewhere. For example, the emergence of alternative media, the rise of workflow processes and/or the current economic/financial crisis could be equally significant in dictating a cutback in traditional journal subscriptions. Nevertheless, the overall claim is that the fundamental “Record” or “Minutes of Science” is being put in jeopardy, and Institutional and Subject-based repositories make convenient whipping boys.
There is some evidence that systematic deposit of authors’ final versions (i.e., including amendments after peer review – stage-two manuscripts) might eventually lead libraries to cancel some of their journal subscriptions (as suggested by Beckett and Inger, 2007), as librarians might choose to attach little importance to whether it is the publisher’s version or the author’s final version that is accessed by their readers. The two versions supposedly have the same content, but critical differences may occur, as pointed out by Wates and Campbell (2007). However, it should be stressed that currently there is no evidence of librarians actually cancelling subscription-based journals due to the ready availability of open access materials. It nevertheless remains a cauldron of controversy about which ICSTI members will need to take a view.
To look at it more positively, authors such as Pinfield believe open access repositories can co-exist harmoniously with scholarly publishing, as “it has recently become apparent that there is a potential for repositories and journals to interact with each other on an ongoing basis and between them form a coherent OA scholarly communication system” (Pinfield, 2009, p.165). Although publishers claim that their dissemination role often eclipses other value-added functions they perform in the article lifecycle and scholarly communication more generally, the impact of electronic publishing, open access and repositories has nevertheless disrupted publishers’ monopoly over the dissemination of scholarly works.
A study is underway, partially funded by the European Commission with both publishers and research libraries participating, to see whether the business plan underlying repositories will threaten the journal subscription/licensing business. However, the results from this study will not emerge for several years, and the indications are that by the time they do appear the situation will have largely been resolved by prevailing and emerging market conditions.
As a specific instance of the controversy and emotion which surrounds the repository movement it may be useful to look at one case in point – the physics community. This brings to the fore some of the concerns which exist about the repository movement in general.
The Physics Community
The main controversy in the summer of 2009 revolved around whether physics journal subscriptions in particular have been affected by the existence, since 1991, of the subject-based repository (arXiv) which caters for physicists’ need for immediacy in information updates. The controversy centred on information contained in a report completed for JISC by Key Perspectives Ltd (Swan and Brown, 2005) which postulated that there was no relationship, a conclusion based on some email exchanges and personal contacts with representatives from two leading physics societies. In the absence of hard evidence, and the presence of strong personal agendas, the debate became emotional to the extent that reputations of industry experts were being seriously impugned. It exemplified the paucity of hard data in the current literature about the real impact which institutional and subject-based repositories are having on the journal subscription business. Opinions, even enlightened opinions, are no substitute for solid evidence, and such evidence of a relationship is still lacking. Emotions ran high.
The Swan and Brown report included the view, obtained during the course of email exchanges with representatives from the Institute of Physics Publishing (IoPP) and the American Physical Society (APS), that self-archiving (of physics articles) had no impact on the sale of journal subscriptions. A representative from the Institute of Physics Publishing was quoted as saying:
“Our authors and editors tell us that they value publishing in a peer-reviewed journal because this continues as an essential requirement for establishing reputation and authority of the research they publish. Whilst posting a pre-print or post-print is becoming more of an essential in some areas of the physics community for immediate and wide dissemination we do not see the arXiv or repositories threatening our business” (Swan, 2009).
Some from the publishing industry, notably Sally Morris, ex-CEO of the ALPSP publishers association, expressed reservations about the interpretations which have been made of this.
More recently, the debate about the role of subject-based repositories and journals has taken on a new life as a result of a report entitled “Citing and Reading Behaviours in High-Energy Physics: How a Community Stopped Worrying about Journals and Learned to Love Repositories” (Gentil-Beccot et al., 2009). This report raised a number of questions. For example, is there an advantage for scientists in making their work available through repositories, often in preliminary form? Do scientists still read journals or do they use digital repositories? According to the authors, the analysis of citation data demonstrated that free and immediate online dissemination of preprints in repositories created a significant citation advantage in high energy physics (HEP) – a five-fold citation advantage – whereas publication in (gold, or author-paid) open access journals gave no discernible advantage. In addition, the analysis of clickstreams in the leading digital library of the field shows that HEP scientists seldom read journals, preferring preprints instead.
Again, this prompted a response in the listservs. In July 2009, on the American-Scientists-Open-Access-Forum, Gene D. Sprouse, Editor in Chief, American Physical Society, likened the arXiv service to being “the newspaper” for the physics community whereas the journal provided the validation, and argued that both services are necessary (Sprouse, 2009).
Anne Gentil-Beccot et al. (2009) discussed the large number of citations that papers posted to arXiv receive, compared with papers in HEP that are not posted to arXiv. The authors of the paper found two “striking pieces of data”:
· Papers on arXiv are cited before they are published in journals. In fact 20 per cent of the citations that articles receive in their first two years occur during the time before publication.
· Another large set of users use arXiv directly, and thus more than 80 per cent of readers in HEP prefer arXiv versions to published versions, when given a choice.
Furthermore, the authors state that:
“Together, these points make it clear that researchers in HEP don’t use journals to communicate scientific ideas. They may notice a paper is published, and they certainly value the peer-review and other functions provided by the journals, but they don’t communicate using the journals; instead they use arXiv, which is much faster.”
Physics may be considered a unique discipline given its strong cultural tradition in favour of printed preprint distribution of articles even before the arrival of the Web. Nevertheless, there are similar experiences in several other disciplines as well – such as mathematics and economics – which have built up strong community support over the years for the dissemination of Stage Two manuscripts (post-prints, but prior to publication in journals).
This suggests that Repositories are creating a new role for themselves as a preferred dissemination source for (hard science) information, possibly at the expense of traditional communication systems such as Journals. This does not suggest that Repositories are replacing Journals as the formal record of science, but they may offer a more topical and timely additional service. This is without taking such issues as business models into account.
If this proves to be the case, and the argument for complementarity between formal and informal publishing of research results can be made, it could be inferred that the repository movement – involving over 1,000 organisations worldwide – could become an important constituency for ICSTI. But there are still some perceptual hurdles to overcome.
Alternative approaches to providing Repositories of Scholarship
As was described in detail in the January 2009 ICSTI Insight on ‘Open Access’, there are two main "Roads" to open access. These are the Golden Road (open-access journals) and the Green Road (open-access archives).
There is a further distinction which can be made between "free access" (removal of price barriers) and "open access" (removal of both price and permission barriers). Those who favour the Golden Road to open access (via open access journals) prefer a minimum of permission barriers, so that text and data-mining of a complete collection of the openly-accessible research literature can be undertaken. Those who favour the Green Road to open access (via open access archives) are willing to tolerate more permission barriers, in order to increase the likelihood that publishers of toll-access journals will permit authors to self-archive copies of published articles in openly-accessible archives.
Thus, these two complementary "High Road" strategies for fostering open access may lead to different perspectives about permission barriers, while agreeing on the need to remove price barriers.
There is also a "Low Road" (or Grey Road) strategy in which researchers self-archive on the surface web, on their own websites. This permits free access, but lacks a coherent infrastructure for identification of the openly-accessible research literature of the kind that both of the "High Roads" to open access are designed to provide. Nevertheless, there is evidence that many of the more eminent researchers are creating their own websites and blogs which they use to communicate their latest ideas, views and research results to the community. Anecdotal reports from recent contact with many UK-based researchers indicate that the informal practice of depositing onto one’s own website, and accessing colleagues’ websites, is a substantial activity and one which is largely being ignored in debates about the future of scholarly communication.
As far as the organised repositories are concerned, there is a lack of agreement on the strategic implementation of their aims between the exponents of the Green and Gold Roads, and hostility seems to break out occasionally between the main advocates of each on the listservs. This is particularly the case between Professor Stevan Harnad (advocate for the Green Road) and Professor Jean-Claude Guedon (Gold Road). The invective is sometimes extreme, and does little justice to the movements.