Institutional Repositories: Preparing for the Future

Keynote presented at the International Conference on Developing Digital Institutional Repositories: Experiences and Challenges, Dec. 9, 2004, Hong Kong

Kimberly Douglas

University Librarian, California Institute of Technology

December 2004

Abstract:

Much has happened in the last year to advance the cause of open access to scholarly papers. As significant as this is, change can come in unexpected ways. Through Institutional Repositories, the academic library can position itself for change that is desired and also for future unknown developments.

This has been a very significant year for Institutional Repositories. Since Samson Soong and I discussed this conference last year, open access, a movement in which institutional repositories play a critical role, has become a major political topic worldwide and has garnered a great deal of publicity. It is no small thing that in January Mark Walport, Director of the Wellcome Trust, told how he had tried to read a report in the Journal of Infectious Diseases on malaria research in the Gambia, work funded by the Wellcome Trust, but when he navigated to the journal issue over the Web from his office he got the message "access denied"[1]. Apparently the Wellcome Trust did not subscribe to this journal. By October, within ten months of that story, the Wellcome Trust had issued a research report and a policy position supporting open access to scholarly, and particularly scientific, information. Since then other funding and public agencies, including the U.S. National Institutes of Health, the U.K.’s Science and Technology Committee, Danish, German and other continental European agencies, and the National Scholarly Communications Forum in Australia, have all, in one way or another, announced that the results of publicly funded research must be made openly available to the public. There must be open access.

Discussions of what exactly open access is, or how it will be attained, range from the categorization of journal author permissions, as in Stevan Harnad’s color coding of ‘green’ and ‘gold’ policies, to a general cultural outlook shaped by the sharing culture of the Internet and of research. The changes in the ten years since Harnad published his Subversive Proposal[2] for universal self-archiving of all scholarly manuscripts certainly underscore the tenacious power of that sharing culture. By Internet standards a decade is an eternity, but from the perspective of human culture and society this evolution is progressing at lightning speed.

The journal article or peer-reviewed paper, the epitome of scholarly publishing, is the focus of most discussions about open access, and there have been plenty of these: most notably in two major journals, American Scientist and Nature, but also in a dedicated newsletter sponsored by SPARC <http://www.earlham.edu/~peters/fos/> and on numerous listservs. All this effort to implement a general policy of open access to scholarly works is, of course, driven by the ever-rising subscription and licensing costs of scholarly journals, which the academic sector finds increasingly burdensome, unaffordable and even, at certain pricing levels, arguably unethical.

Those who fund research are accountable either directly to the public, whose taxes pay for most government research, or to the members of a philanthropic board, whose general intent is to benefit the public. A number of these government and philanthropic bodies, as well as the universities and authors they fund, have made statements in support of open access. The Australian Group of Eight top research universities, Nobel laureates and 32 Italian university rectors, for example, have all signed the Berlin Declaration, the product of the last of the three big symposia on open access.

Many non-profit publishers have also become proactive: 48 non-profit publishers issued the Washington D.C. Principles for Free Access to Science, and the Association of Learned and Professional Society Publishers released its Principles of Scholarship-Friendly Journal Publishing Practice. According to Stevan Harnad, a full 92% of scholarly publishers have signed on to support the “green” practice of author self-archiving. Even Elsevier has had to issue a policy for authors that includes some self-archiving provisions. A critical-mass consensus seems to be building: the sharing culture of the Web and of research will be concretely acknowledged, and for the foreseeable future Peter Suber will be kept very busy updating the Open Access Newsletter as the movement continues to expand and evolve.

Meanwhile, academic librarians, taking a higher-level, bird’s-eye view of the situation, have consistently used the phrase “scholarly communication”[3] to describe their interest in the journal pricing controversy, and this has led them to adopt an agenda of making all locally produced intellectual output of the university freely available. Thus was born the Institutional Repository effort in libraries. Librarians and other content curators in the scholarly environment know that content comes in many formats and genres; scholarly peer-reviewed papers are only one format, albeit an important one.

If you are here expecting to find a solution to the library’s serials budget crisis, I hope that your participation today and tomorrow will lead you to view institutional repositories in a much broader perspective. Change is here, and it is not just the change we would envision now to solve our immediate problems.

The barriers created by the high cost of scholarly journals and by the initially tight copyright controls clash with the Web’s values of sharing and of deploying computational solutions. Open source computer code led the way to open source applications, and open access to content has now become a force to be reckoned with in finally realizing some of that earlier promise. The disruptive nature of the Internet and the Web cannot be overstated. Despite all the publishers’ protestations about copyright just a few years ago, even they have found that they must adapt to what authors want and will do.

In his paper last year, Clifford Lynch[4] identified the fall of 2002 as the point when institutional repositories emerged as a strategy for addressing changes in scholarly communication. Online storage costs had dropped, making digital repositories affordable, and standards such as the OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting) had been agreed upon. Harnad’s E-prints software, developed in 1998, served as a proof of concept for an institutional repository. As a result, a critical mass of individuals and groups with the expertise, resources and commitment to address digital preservation issues has emerged, and we can now be reasonably confident that at least the technical problems will be solved and their solutions broadly implemented.

Currently, our communities are not unlike the blind men of the parable describing the elephant: each participant describes an institutional repository according to his or her own experience, and there may be as many descriptions as there are creators, authors, and designers. For those working on the institutional repository, perhaps like the owner of the elephant, it is important to have a clear overall concept of the repository’s position in the organization. Lynch says “a university-based institutional repository is a set of services that a university offers to the members of its community for the management and dissemination of digital materials created by the institution and its community members.” He goes on to say that an institutional repository draws on librarians, information technologists, archivists, records managers, faculty and administrators, and so represents a gathering of competencies and responsibilities that can collaborate in the stewardship of the university’s output, whatever form that takes. Above all, an institutional repository represents a commitment to the oversight of digital materials, which requires managing technological change over time. Benign neglect will not do, and so far there is no turnkey institutional repository platform, though such products may be forthcoming[5].

Setting aside the controversies over journal costs and pricing models, the open access discussions are framed primarily around how to transfer the print medium to the digital and add the tracking and linking functionality so obviously possible on the Web. Yet it is not so straightforward, because there can be many versions. An article can vary along three dimensions: editing, formatting and functionality. Even if we allow only two states within each, draft and complete, there are potentially eight combinations (2 × 2 × 2)! Imagine all these versions of every paper being available on the Web.
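To make the arithmetic concrete, the short Python sketch that follows enumerates every version of a single article, assuming the three dimensions are independent and each has only the two states named above. The dimension and state labels are illustrative only and are not drawn from any repository standard.

```python
from itertools import product

# The three dimensions along which a single article may vary (from the text).
DIMENSIONS = ("editing", "formatting", "functionality")
# Illustrative assumption: only two states per dimension.
STATES = ("draft", "complete")


def version_combinations(dimensions=DIMENSIONS, states=STATES):
    """Return one dict per possible version, mapping each dimension to a state."""
    return [
        dict(zip(dimensions, combo))
        for combo in product(states, repeat=len(dimensions))
    ]


if __name__ == "__main__":
    versions = version_combinations()
    print(len(versions))  # 8, i.e. 2 ** 3
    for v in versions:
        print(v)
```

The count grows multiplicatively: allowing three states per dimension would already yield 3 × 3 × 3 = 27 versions of one paper.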

A 2003 study in the United States found that 44% of Web users have contributed material to the Internet, which translates to roughly 53 million individuals. Among the groups of users the report identifies are the power content creators, with an average age of 25; this age group is the most likely to use new Web technologies and to integrate them quickly into their lives. Another study found that roughly 10 percent (or 2 million) of American children with Internet access at home have their own personal Web sites, a threefold increase between 2000 and 2003, and there are no doubt comparable numbers in other countries. The number of children in the U.S. with personal sites is expected to rise to more than 6 million by 2005. These sites will mostly be of interest as primary material for social scientists. Yet with this experience, the current generation comes into the university already generating content and disseminating it to the world with no help from a third party. In some instances the content may be appalling. We cannot, however, be complacent and assume that their experience will have no impact on our work. Within a few short years some of these students will be faculty.

Legacy publishers are already competing with the fast response of this youngest group. Newspaper publishers are evolving as they create print and Web-delivered materials that extend the reach of their product and minimize competition between formats. PricewaterhouseCoopers[6] reported that 25 million people logged on to the Internet in 2000 while watching television, up from 18 million in 1999, and television station Web sites serve as adjuncts that provide supplementary information to these viewers. Each format thus plays to its own strengths: print is fixed and portable, while the Web can be continuously updated with breaking news.[7] The Web can be expanded beyond print to provide as comprehensive an information tool as desired. It can be exceedingly responsive and changeable, yet still be archivable and retrievable.

Like it or not, we are preparing for that generational wave of 22-year-olds and younger, those born in 1982 and after, for whom a multi-faceted online presence is simply an expected and essential part of their daily work and lives.

Given these demographic, technical and business trends, it is quite reasonable to expect the content of an institutional repository to differ considerably from that of print-era legacy formats. You will hear today and tomorrow about each organization’s strategies to acquire content. Once a repository is implemented, the path for adding content will also vary, depending on how individuals and organizations respond. Two things are certain: the content will be in digital format, and it is intended, for the most part, to be globally shared. Perhaps a third is also certain: there is likely to be more of it than print-based legacy processes can manage.

Therefore, libraries must engage in the institutional repository effort not only to position their operations for alternatives to costly journals but also to prepare for the unknown and to collaborate with the emerging generation of new scholars, for whom digital media are the norm rather than the exception. No matter what form the scholarly output takes, stewardship is necessary to protect the scholarly record and to provide access to it.

Opportunities abound. Grey literature such as symposium and workshop materials, performances, demonstrations, presentations, syllabi, data sets and perhaps even blogs are all related to the intellectual life of universities. There may well be formats, as yet unknown, that will need to be retained. To foster new developments that allow for creativity, not only on the part of authors but also on the part of institutional repository (IR) stewards, Lynch warns against policies that are too prescriptive, confining or rigid.

The Web is about sharing and computational tools. Sharing metadata and content leads to advances in what networked computing can achieve. We have certainly seen this with Google and other public search engines, and now with Google Scholar, the latest iteration of Web development. Our role, as institutional repository developers and stewards, is to make content available over time in as reliable, consistent and well-structured a manner as possible, so that computational approaches to discovery, such as data mining and visualization, can also flourish. At its best, the institutional repository provides an infrastructure standard for the stewardship of scholarly output that an institution can reasonably be expected to maintain. Therein lies the future.
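Machine-driven discovery of this kind depends on repositories exposing well-structured metadata. As one illustration, the Python sketch that follows builds an OAI-PMH ListRecords request (the harvesting protocol mentioned earlier) and extracts Dublin Core titles and creators from a response. The endpoint URL and the sample record are hypothetical, the sketch handles only a single response page (it ignores resumption tokens), and it does not describe how any particular repository or harvester is implemented.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces defined by OAI-PMH 2.0 and its unqualified Dublin Core (oai_dc) format.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build an OAI-PMH ListRecords request URL for a repository endpoint."""
    return base_url + "?" + urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})


def parse_records(xml_text):
    """Extract identifier, titles and creators from one ListRecords response page."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        identifier = rec.findtext("oai:header/oai:identifier", default="", namespaces=NS)
        dc = rec.find("oai:metadata/oai_dc:dc", NS)
        titles = [t.text for t in dc.iterfind("dc:title", NS)] if dc is not None else []
        creators = [c.text for c in dc.iterfind("dc:creator", NS)] if dc is not None else []
        records.append({"identifier": identifier, "titles": titles, "creators": creators})
    return records


# Hypothetical response from a hypothetical repository, for illustration only.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repository.example.edu:1234</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An Example Paper</dc:title>
          <dc:creator>Doe, Jane</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    print(list_records_url("https://repository.example.edu/oai"))
    print(parse_records(SAMPLE))
```

Because every compliant repository answers the same requests with the same record structure, a single harvester can aggregate metadata from many institutions, which is the kind of consistency the paragraph above calls for.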

So you see, this is not just about scholarly publishing, open access or alternatives to scholarly publishing. The institutional repository has the potential for a much higher-level responsibility and has an opportunity to play an important part in the much deeper and more comprehensive changes that are taking place in the modes of scholarly communication. The ultimate purpose of libraries, after all, is to provide better access to all academic information and generally to facilitate scholarly communication and research and only secondarily to help protect our institutions against the debilitating effects of upward spiraling costs for obtaining the very knowledge that we originally produced but can no longer afford to retrieve.

A number of our speakers today were also at last month’s SPARC Conference on Institutional Repositories in Washington, DC, a meeting with participants from all over the world. There were speakers from Japan, China, Australia and India, as well as from Canada, the U.S. and European countries. As we can see, the international scope of this work is inspiring, and it is now attracting a critical mass of countries, institutions, authors, disciplines and content that assures its ultimate utility. It is very exciting to be part of this worldwide collaborative enterprise.

Today’s program starts with a walk through the achievements of six institutional repositories over the last five years, beginning with the Caltech E-prints implementation in 1999/2000. We will follow a chronological order to help illustrate the evolution of the institutional repository and to acknowledge differences.

We hope that, as you listen to each speaker tell the story of their repository, the development of strategies, issues and solutions will emerge, giving you a practical perspective on your own situation and on what is possible.

Please do ask questions to clarify points. Ample time has been allowed.

Tomorrow, more time will be given to standards, recruiting content, policies, and gatekeeping. A high degree of interaction is our goal for this conference so that you can take away both needed specifics and inspiration. Speakers are available during breaks for questions and demonstrations.

Throughout the sessions, the speakers will clarify those aspects or decisions that are unique to each institution and those issues on which institutions differ, and, finally, will contrast these with the features that are necessary and need to be consistent, or at least consistently addressed, for an effective institutional repository.

Your presence here today is a show of commitment, the first and main goal, even if the scope is modest. Success will not come all at once; it will be measured incrementally. Not all questions can be answered and problems solved before embarking on an institutional repository. Trust and optimism that problems will be resolved and solutions found are among the reasons we gather at conferences. No one is alone, and our work is interdependent. Let us all pledge to provide communal help and support until worldwide utility is achieved.

Chinese saying: “Be not afraid of growing slowly, be only afraid of standing still.”

[1] Watts, Geoff. “Crusaders for a truly free flow of ideas.” Times Higher Education Supplement. January 5, 2004. Excerpted by Peter Suber, January 10, 2004, in the Newsletter for Open Access. http://www.earlham.edu/~peters/fos/2004_01_04_fosblogarchive.html

[2] Harnad, Stevan. “Scholarly Journals at the Crossroads: A Subversive Proposal for Electronic Publishing.” June 27, 1994.

[3] This phrase was formally and deliberately applied in the 1997 conference of that name at Caltech.

[4] Lynch, Clifford. “Institutional Repositories: Essential Infrastructure for Scholarship in the Digital Age.” ARL Bimonthly Report 226. February 2003.

[5] Northeastern University is investigating the development of an IR within its Integrated Library System. ProQuest has also launched a new service, Digital Commons.

[6] Winkler, Peter. “Convergence: Past, Present, Future.” PricewaterhouseCoopers Global. Executive Perspectives. July 2002.

[7] Greenspan, Robyn. “No threat to newspapers.” ClickZ Network. January 16, 2004.