British Library Research & Innovation Report 103

The Bradford OPAC 2

Managing and Displaying Retrievals from a Distributed Search in Z39.50

F.H.Ayres, L.P.S.Nielsen, M.J.Ridley

Department of Computing

University of Bradford

British Library Research and Innovation Centre 1998

The opinions expressed in this report are those of the authors and not necessarily those of the British Library.

RIC/G/342

ISBN 0 7213 9710 8

ISSN 1366-8218

British Library Research and Innovation Reports are published by the British Library Research and Innovation Centre and may be purchased as photocopies or microfiche from the British Thesis Service, British Library Document Supply Centre, Boston Spa, Wetherby, West Yorkshire LS23 7BQ, UK.

Abstract

This report describes work on the BOPAC2 project, funded by the British Library Research and Innovation Centre, from September 1996 to January 1998. The system is a World Wide Web front end that allows simultaneous access to a number of library catalogues via Z39.50. The system is designed to make access to large and complex retrievals simpler: similar records are clustered together, and retrievals may be sorted in a number of ways and by different criteria. The design, development and evaluation of the system are described, along with suggestions for future work.

Acknowledgements

We would like to acknowledge the help of the following people:

· Martin Nail, our project officer at British Library Research and Innovation Centre

· Our Advisory Committee: Micheline Beaulieu (City University), Rodney Brunt (Leeds Metropolitan University), Chris Dance (British Library), Ashley Sanders (COPAC).

· Adam Dickmeiss at Index Data for his help with the installation of the Europagate WWW-Z39.50 Gateway system.

· Barbara Tillett, Library of Congress, for her continued support.

· All those libraries that allowed us access to their Z39.50 targets, and their system librarians for help in getting them configured properly, especially Polly Dawes, Systems Librarian, Bradford University.

· And finally, the many people who provided feedback on the system, particularly: Allyson Carlyle, Mike Heaney, Shelly Vellucci, Martha Yee.

Table of Contents

Abstract

Acknowledgements

1. Introduction

1.1. Management Summary

2. Background

2.1. Related Items and Bibliographic Relationships

2.2. Large retrievals and Complex retrievals

2.3. The BOPAC2 Project

3. BOPAC2 Design and Development

3.1. Use of Existing Z39.50 Packages

3.2. Bradford OPAC 1

3.3. Overall System Architecture

3.4. The Europagate WWW-to-Z39.50 Gateway

3.5. Availability and Capability of Z39.50 Targets

3.6. Java Applet Design and Development

3.7. HCI issues

3.8. Testing and Refinement

4. Large and Complex Sets in Z39.50

4.1. The Networked Environment

4.2. The Z39.50 Approach

4.3. Changing the Query

4.4. Choosing Specific Records with Z39.50

4.5. Other Facilities for Reducing the Quantity of Material Transmitted

4.6. Modelling the BOPAC2 approach in Z39.50

5. Using BOPAC2

5.1. Search and Retrieval (Europagate software)

5.2. Viewing Retrievals with the Java applet

6. Evaluation and User Feedback

6.1. Comparison with other OPAC’s

6.2. Usage

6.3. Evaluation with Testers and Feedback Questionnaire

7. Conclusions

7.1. Clumps

7.2. Further Work

8. References and Links

9. Appendices

1. Introduction

The aim of the BOPAC2 project was to investigate the issues in managing large and complex retrievals received via Z39.50 searches, including searches of multiple databases. The Project aimed to build on the work done in the BOPAC1 project. In BOPAC1 retrievals were organised as "manifestations" of particular works; that is, all the different versions of "Bleak House" by Charles Dickens or "Chemical Engineering" by Coulson and Richardson would be grouped together. In BOPAC1 such related records were grouped together in the system's test database. In BOPAC2, since the records would be received from databases as a result of queries sent using the Z39.50 protocol, it would not be possible to group records in advance as had been done in BOPAC1. Records would instead have to be "clustered" together as they were received. This meant that the BOPAC2 system would have to deal with the consequences of differences in cataloguing practices between Z39.50 servers. It also meant that the system would have to be more flexible than BOPAC1, since it was clear that not all differences between records could be overcome automatically.
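The idea of clustering records as they are received can be sketched as follows. This is only a minimal illustration, not the BOPAC2 implementation: the record fields (`target`, `author`, `title`, `year`) and the matching rule in `match_key` (surname plus normalised title) are assumptions made for the example, and a real system would need a far more tolerant comparison to cope with differences in cataloguing practice.

```python
import re
import unicodedata
from collections import defaultdict


def match_key(author, title):
    """Build a crude author/title key (an assumed rule, not BOPAC2's own)."""

    def normalise(text):
        # Strip accents, lower-case, and reduce punctuation to single spaces.
        text = unicodedata.normalize("NFKD", text)
        text = "".join(ch for ch in text if not unicodedata.combining(ch))
        return " ".join(re.sub(r"[^a-z0-9]+", " ", text.lower()).split())

    # Use only the surname (text before the first comma) so that
    # "Dickens, Charles" and "Dickens, Charles, 1812-1870" agree.
    surname = author.split(",")[0]
    return normalise(surname), normalise(title)


def cluster(records):
    """Group records one at a time, in the order they arrive."""
    clusters = defaultdict(list)
    for record in records:
        clusters[match_key(record["author"], record["title"])].append(record)
    return dict(clusters)


# Hypothetical records as they might arrive from two different targets.
records = [
    {"target": "A", "author": "Dickens, Charles", "title": "Bleak House", "year": 1853},
    {"target": "B", "author": "Dickens, Charles, 1812-1870", "title": "Bleak house.", "year": 1971},
    {"target": "B", "author": "Dickens, Charles", "title": "Hard Times", "year": 1854},
]
clusters = cluster(records)
for key, members in clusters.items():
    print(key, [m["target"] for m in members])
```

Because the key is computed per record, no prior knowledge of the databases is needed; the cost is that any difference the normalisation does not absorb (for example a variant title) leaves related records in separate clusters.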

The Z39.50 protocol provides a uniform means of querying remote databases. It is a general purpose protocol, but its use with bibliographic databases for the transfer of MARC records has been the leading application area in the protocol's development. It has commonly been seen as a means of querying a remote database through a more familiar local interface. Z39.50 server software is now often provided with modern library database systems.

This means that many catalogues are now much more easily accessible. The problem remains of finding an effective way to search them.

The system was originally planned to be a PC based Z39.50 client with a graphical front end, developed from that built for BOPAC1. The growth of the World Wide Web and Java as a powerful programming language for Web applications led the Project to utilise the general design features and lessons of BOPAC1 but use them within a Web based application to make it more widely available.

Making BOPAC2 a World Wide Web application has opened it up to a wider audience than is possible with many research projects. The system has been announced on a number of mailing lists and via a number of different Web sites. It has also been demonstrated to a number of groups. The system will remain available for use after the completion of the Project and a version of this document will also be available on the WWW.

Both the system and the document are available as links from the BOPAC Home Page [1].

1.1. Management Summary

The BOPAC2 Project was initially funded by BL RIC for a 12 month period from 1st September 1996 to 30th September 1997. The Project was later granted an extension to 31st January 1998. The Project team was M.J.Ridley (manager), F.H.Ayres (part-time research fellow) and L.P.S.Nielsen (research assistant, full time Sept 96 - Sept 97, part time Oct 97 - Jan 98).

The Project's aim was to investigate the issues of large and complex retrievals from Z39.50 searches. The Project's workplan was based on developing work and ideas from the BOPAC1 project. The original workplan envisaged a sequence of: surveying existing Z39.50 clients; system design and development for retrievals from a single target; testing and evaluation of single target retrievals; system design and development for retrievals from multiple targets; testing and evaluation of multiple target retrievals. The original workplan was aimed at the development of a PC based client system with a graphical user interface, on the lines of that from BOPAC1, although a subsidiary project task was the investigation of alternative architectures for the system.

In the first months of the Project, whilst evaluating existing Z39.50 software, it became clear that a number of important developments were taking place that the Project needed to take account of. These were the growth of the WWW and, in particular, the availability of library catalogues and other databases (such as search engines) on the Web. Allied to this was the appearance of Java as a powerful tool for developing Web based applications. A specific development was the release of the Z39.50 - Web gateway software from the Europagate project, which supported retrievals from multiple targets. Previously multiple target support had not been easily available, and this had dictated the workplan.

In light of the developments above, which are explained in detail in other sections of the report, a revised workplan was developed. This entailed modification of the Europagate software, so that single and multiple targets would be supported from the start. The system was to be built in Java, which would enable a system with the functionality originally envisaged to be available via the WWW. This would mean that the system could be made much more widely available for testing. These revised plans and progress on them were presented to the advisory committee in early 1997.

By Spring 1997 a working version of the system was made available on the WWW, but its URL was not made public. This enabled the Project to elicit feedback from a number of interested experts with particular library and cataloguing expertise. This was input into the system development process until Autumn 1997. From Autumn 1997 till the end of the Project, further system development, apart from bug fixes, was suspended to provide a fixed basis for testing and evaluation. In this phase, the system was made public by announcing it on a number of mailing lists, newsgroups and Web sites, and an online questionnaire was provided.

During the course of the Project we had been experimenting with and monitoring a number of Z39.50 targets. The set of targets was also kept constant during the test period. One important factor in the original workplan had been testing with users at Bradford. To allow this to take place, Z39.50 software had been installed on the Bradford University Library system at the end of 1996. However, this system was not in a satisfactory operational state till Summer 1997. The Project extension allowed us to have an extended period of testing with Bradford users. This was done with a tailored front end making the Bradford and Leeds University libraries and British Library Document Supply Centre catalogues available. A similar front end was also installed for Leeds users.

The system has been widely used in the course of the Project, with user feedback from the USA, Australia and Japan as well as across Europe, and will remain in place and operational after the end of the Project. In the course of the Project, the system was demonstrated at a number of events including the Conference on Principles and Development of AACR in Toronto. The Project team also met and corresponded with a number of other projects working in similar areas, such as the ONE, UNIVerse and Riding projects.

2. Background

The Bradford OPAC 2 (referred to as BOPAC2) is the successor to Bradford OPAC 1 [2]. BOPAC1 used a small demonstrator catalogue of records obtained from the bibliographic utilities to illustrate a new kind of bibliographic control based on the idea of “manifestation sets”. BOPAC2 is concerned with how information retrieved from remote catalogues via the Z39.50 Information Retrieval Protocol can be managed and displayed.

2.1. Related Items and Bibliographic Relationships

The Project is concerned with the bringing together of related items in the catalogue display. Since the days of Lubetzky and the Paris Principles it has been argued that catalogues should bring together items related to the same work or the same author [3]. However, the question of what actually constitutes a work is a difficult one and has been discussed at length over the years. The Anglo-American cataloguing codes [56] have never formally defined what a work is, but have incorporated various rules to deal with specific cases. A working definition of a work can be stated in terms of the main entry, but the rules about the main entry are ambiguous, which makes it difficult for OPACs to fulfil the second objective of the catalogue as defined by the Paris Principles and to collocate related items. Revisions to cataloguing practice have made the situation worse by emphasising the title main entry over uniform title [5], [6] and leaving the use of uniform title optional.

There is also the argument that the general idea of the second objective (i.e. the bringing together of related items) is even more important now than it was when it was first posited. The ever-growing size of catalogues, the globalisation and combining of union catalogues, new media, and the networked information infrastructure which facilitates distributed electronic documents: all these factors make it all the more important for the user to be able to identify related items [8]. A number of different bibliographic relationships have been identified [9], any of which may be of relevance to the end-user looking for related items. These relationships have been expressed within the cataloguing rules in various different ways [10]. In addition, there is a great deal of bibliographic relationship information buried in the general notes tag (500). Thus the problem of machine-extraction of bibliographic relationship information from the current stock of MARC records is a difficult one. This has led to calls for a fundamental re-appraisal of the catalogue and the cataloguing process with a view to the incorporation of effective linking information into the catalogue record ([11], [12], [13], [15], [16] and [17]). The ambiguities and changes in cataloguing practice with respect to collocation make it difficult for existing catalogues to fulfil the second objective as it was surely intended [14].
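To illustrate why relationship information held in free-text notes is hard to extract by machine, the sketch below applies a few phrase patterns to example note strings. Everything in it is hypothetical: the patterns, the relationship labels and the sample notes are invented for illustration and are not drawn from any cataloguing rule or from the BOPAC2 system.

```python
import re

# Hypothetical phrase patterns; real 500 notes are free text and vary widely.
PATTERNS = {
    "translation": re.compile(r"\btranslat(?:ion|ed)\s+(?:of|from)\b", re.I),
    "previous edition": re.compile(r"\bprevious\s+ed(?:ition|\.)", re.I),
    "sequel": re.compile(r"\bsequel\s+to\b", re.I),
}


def guess_relationships(note):
    """Return the labels of every pattern that matches the note text."""
    return [label for label, pattern in PATTERNS.items() if pattern.search(note)]


# Invented example notes, as they might appear in a 500 field.
notes = [
    "Translation of: Les misérables.",
    "Previous ed.: 1985.",
    "Continues the story begun in the author's earlier novel.",
]
results = [guess_relationships(note) for note in notes]
for note, labels in zip(notes, results):
    print(labels, "<-", note)
```

The third note describes a sequel but uses none of the anticipated wording, so the heuristic returns nothing for it. Each such miss would need another hand-written pattern, which is the kind of difficulty that motivates explicit linking information in the record.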

Meanwhile the ground beneath our feet is shifting as the Z39.50 information retrieval protocol begins to encroach into the world of library catalogues. Until recently Europe (including the UK) has lagged behind the USA in the implementation of Z39.50 [53], but it is beginning to catch up now and there are a number of projects taking place [55]. In many cases Z39.50 is being installed as a means of inter-operating catalogues to create a distributed library in which the constituent catalogues can be searched in tandem [54]. The current generation of Z39.50 targets operating in the UK has few, if any, facilities for collocating retrieval sets. Version 3 of Z39.50 does allow the retrieval to be sorted, which at least brings it up to roughly the same level as most conventional OPACs, but no further.
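When several targets are searched in tandem, each returns its own retrieval set, so any ordering across the combined set has to be produced by whatever receives the records. The following sketch shows one way this might be done on the client side. It does not use the Z39.50 Sort service; the target names, record fields and helper functions are assumptions made for the example.

```python
from operator import itemgetter

# Hypothetical retrievals, keyed by an invented target name.
retrievals = {
    "target_a": [
        {"author": "Dickens, Charles", "title": "Bleak House", "year": 1971},
        {"author": "Austen, Jane", "title": "Emma", "year": 1966},
    ],
    "target_b": [
        {"author": "Dickens, Charles", "title": "Bleak House", "year": 1853},
    ],
}


def merge(retrievals):
    """Combine all retrieval sets, recording where each record came from."""
    merged = []
    for target, records in retrievals.items():
        for record in records:
            merged.append({**record, "target": target})
    return merged


def sort_records(records, *fields):
    """Sort by one or more fields; earlier fields take precedence."""
    return sorted(records, key=itemgetter(*fields))


merged = merge(retrievals)
by_author_year = sort_records(merged, "author", "year")
by_year = sort_records(merged, "year")

for record in by_author_year:
    print(record["author"], record["year"], record["target"])
```

Sorting alone interleaves records from different targets but does not group different versions of the same work, which is why it brings the display only to the level of a conventional OPAC.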

2.2. Large retrievals and Complex retrievals

Another problem with OPAC displays is that of large retrievals, a problem likely to be exacerbated by the encroachment of the Z39.50 information retrieval protocol. In widening access to remote catalogues, Z39.50 will inevitably lead to larger and larger retrievals. The solution to the problem of large retrievals is obvious: the retrieval must be reduced or organised in such a way that extraneous material can be disregarded. The big question is how. In addition, in a system where the “front-end” and the “back-end” are separated (as with Z39.50), another question is where: at the back-end or at the front-end.