PCC SCS/LDAC Working Group on the Work

Interim Report October 25, 2016

Outline

Introduction

Use Cases:

Facilitation of cataloging

Enhancing authority control

Collocation

Providing a work-level display for users

Finding among lending partners (Interlibrary loan)

Creating bibliographic relationships in the catalog

Enabling integration with the Semantic Web

Enabling a researcher to relate works to University faculty

Conclusion

Next Steps

Task Group Members

Introduction

Both the FRBR conceptual model and RDA (the most direct implementation of IFLA’s FRBR final report, and the rules adopted for use by PCC in 2013) define the work in abstract terms:

  • FRBR: “a distinct intellectual or artistic creation.”
  • RDA: “A distinct intellectual or artistic creation, that is, the intellectual or artistic content.”

These definitions tell us little about the expected nature of the FRBR work entity. More detailed information about the work is implicit in the data elements that are included in each of these standards as attributes of the work. These attributes show that the work is defined through creators, literary and artistic forms, and the subject matter of the creation.

What seems to be missing from both FRBR and RDA[1] is an analysis of the actual uses that will be made of the entities and their attributes in the course of library and user workflows. This preliminary report describes use cases that were developed by the working group; these are neither exhaustive nor complete, but are intended to stimulate a discussion of these and other cases that we can confidently use for the remainder of our analysis.

Use cases

Use case: Facilitation of cataloging

Story 1: A cataloger has in hand yet another edition of Moby Dick. She knows that she will find the Work information that she needs in the cataloging copy database (read: OCLC). She can import all of the Work statements into her local bibliographic description, including the authoritative entities for author, subjects, and genre. This simplifies cataloging and increases consistency across libraries.

Story 2: A cataloger works in a linked data cataloging environment and has in hand yet another edition of Moby Dick to catalog. Rather than reinvent the wheel, she searches the Web for existing Work data for Moby Dick and finds Moby Dick (the work) on a library RDF hub service (e.g., hypothetical LD4L Work Entity Service). She looks at the <is described by> relationships, which are translated into labels for human readability, and reviews the descriptive triples and authority data related to the Work from a library she trusts. In her local cataloging interface she creates an <is instance of> relationship between the version of Moby Dick in hand and the Work data in the data store. Alternatively, the cataloger may find that none of the work entities available to her are complete because there is one subject heading that she feels must be included. She creates a local work entity that she links to the best work entity available. In her local system she adds the elements that she feels complete the work description. The remainder of the elements are available from the shared data store.

Discussion: While this scenario is very appealing for an oft-re-issued work, it has to be weighed against the fact that most manifestations (~70-85%) represent a work with a single expression. On the other hand, works with multiple expressions are more likely to be widely held, and therefore cataloged repeatedly. In addition, for many cataloging situations, the work will be used in conjunction with the expression and may not have utility as a separate entity. The statistical analyses that we have of the incidence of works[2] examine bibliographic databases, but they neither address ongoing cataloging activity nor provide an analysis of expressions.

Using works as entities in cataloging may require development of a new feature for shared authorities development as this information is not covered by current NACO and NAF activities.

Although works may be entities in the cataloging workflow, this does not necessarily mean that they will have an existence in other workflow use cases that are covered in this document. Note also that the work description may not be a “record” that exists as a physical unit but could be a cluster of descriptive elements from which a work description can be selected. Also, within a specific system environment work descriptions may not need to be copied or replicated but could be utilized through linking.
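The linking approach in Story 2 can be sketched with plain subject-predicate-object triples. This is a minimal illustration, not a proposed implementation: the URIs, the property names (instanceOf, derivedFrom, subject), the sample subject headings, and the helper describe_work are all hypothetical, and a real system would use an RDF library and a published vocabulary.

```python
# Hypothetical identifiers; none of these URIs or property names come from a real service.
SHARED_WORK = "http://hub.example.org/work/moby-dick"
LOCAL_WORK = "http://library.example.edu/work/moby-dick"
INSTANCE = "http://library.example.edu/instance/12345"

# Triples published by the shared data store (assumed content).
shared_store = {
    (SHARED_WORK, "title", "Moby Dick"),
    (SHARED_WORK, "creator", "Melville, Herman, 1819-1891"),
    (SHARED_WORK, "subject", "Whaling"),
}

# Triples held locally: the instance points at the local work, and the local
# work points at the shared work and carries only the locally added element.
local_store = {
    (INSTANCE, "instanceOf", LOCAL_WORK),
    (LOCAL_WORK, "derivedFrom", SHARED_WORK),
    (LOCAL_WORK, "subject", "Ship captains"),
}


def describe_work(work, local, shared):
    """Merge local statements with those of the linked shared work, without copying them."""
    description = {(p, o) for s, p, o in local if s == work and p != "derivedFrom"}
    for s, p, o in local:
        if s == work and p == "derivedFrom":
            description |= {(p2, o2) for s2, p2, o2 in shared if s2 == o}
    return description


full = describe_work(LOCAL_WORK, local_store, shared_store)
subjects = sorted(o for p, o in full if p == "subject")
print(subjects)
```

The local store holds only three statements; everything else in the work description is read through the link at display time, which is the "utilized through linking" option mentioned above.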

Use case: Enhancing authority control

Story: A cataloger has in hand a book with the author Michael Fitzgerald. A glance at the library’s list of authors shows that there are many authors with that name, some of whom have been disambiguated with birth dates. The book in hand does not include that information. However, the authoritative name display includes a link to all previously cataloged titles, and this provides a starting point for the determination of the correct person entity.

Discussion: A change in the model of bibliographic information creates an opportunity for full integration of authority control and bibliographic description. There might no longer be separate “records” for bibliographic and authority data; the identified entity, such as a person entity, would exist only once in the data store linked both to bibliographic descriptions and authority-related data. Relating an agent to all works in which it played a creative role should generate a full ‘bibliography’ that would improve disambiguation. This could reduce the human effort required to make decisions.

Use case: Collocation

Story: A music student comes to the library looking for a copy of Beethoven’s Eroica symphony. The library has a number of different expressions of this work, and numerous manifestations. These have been published under different names in multiple languages, such as “Beethoven's Third symphony (Eroica)”, “Dritte Symphonie, Op. 55 : Sinfonia eroica”, and “Sinfonie no. 3 (Es-dur), Op. 55”. Collocation in the catalog is required to bring these together.

Discussion: Unlike the majority of books, musical works are published/released in endless editions, by numerous performers, in their original orchestration and multiple arrangements and score permutations. Titles for classical music are often weak—usually a genre/form followed by a serial number, and maybe opus numbers. Calling something Symphony no. 5 is like calling a book Novel no. 2. Given a group of manifestations of Beethoven's 5th symphony there are various types of scores (study, parts, score and parts), sound recordings and videos; various editions by various editors, arrangements by various arrangers; and individual movements recorded separately (or removed from a full recording and republished separately), even one that is turned into a song. All these instances are related to each other in that they are instances of Beethoven's 5th symphony, either in part or in entirety.

One of the primary goals of cataloging is the collocation of works. Works and their relationships can create a collocation based on the work that has some of the same functions as the headings in the card catalog (but not the online catalog). When I retrieve items in the catalog, each is shown in relationship to other items: works to expressions, works to related works, etc.
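Collocation by work can be sketched as a simple grouping step. The titles below are the ones quoted in the story; the record layout, the manifestation and work identifiers, and the function name collocate are assumptions made for illustration only.

```python
from collections import defaultdict

# Illustrative records: titles are those quoted in the story; the work and
# manifestation identifiers are hypothetical.
manifestations = [
    {"id": "m1", "title": "Beethoven's Third symphony (Eroica)", "work": "w:beethoven-op55"},
    {"id": "m2", "title": "Dritte Symphonie, Op. 55 : Sinfonia eroica", "work": "w:beethoven-op55"},
    {"id": "m3", "title": "Sinfonie no. 3 (Es-dur), Op. 55", "work": "w:beethoven-op55"},
    {"id": "m4", "title": "Moby Dick", "work": "w:moby-dick"},
]


def collocate(records):
    """Group manifestation records under the work each one is linked to."""
    by_work = defaultdict(list)
    for record in records:
        by_work[record["work"]].append(record["id"])
    return dict(by_work)


groups = collocate(manifestations)
for work_id, members in groups.items():
    print(work_id, members)
```

The grouping relies entirely on the work link, not on the transcribed titles, which is why the three differently titled manifestations end up together.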

Use case: Providing a work-level display for users

Story: In medium and large catalogs, searches often bring up many entries for the same work, although these may be scattered throughout a search result display depending on the order used. A work-level display would have less redundancy and may make it easier for users to understand what content is being offered.

Discussion: In this use case, works could serve as an organizing unit for user displays, allowing users to “drill down” for more specific library holdings. However, the work itself may not always be the entity level that best serves users; this will vary by content type and by user needs. For texts, an expression level may serve the user better because of the need to specify a language of text and to show the user a title in the user’s preferred language (the “War and Peace” expression rather than the “Война́ и мир” work). For reference books and textbooks, more recent expressions are generally preferred over earlier ones. The studies done for the music catalog Variations[3] indicate that for music materials (which are expected to have a music-specific uniform title), a work-level display was generally preferred by users. In a catalog allowing searches that include a preferred physical format, information about the manifestation would need to be included. For other materials, such as archival collections of photographs, a collection-level display may be preferred to a display of individual works.

As shown in the use case on bibliographic relationships, below, there are key work relationships that could be made evident in user displays, such as relationships between continuing resources, whole/part relationships, and relationships of creative transformation. Each of these relationship types presents different challenges for display and as yet there are no system design models that propose concrete solutions for such displays. That a user display could make use of work-to-work relationships does not necessarily mean that the user will see a work itself as the bibliographic display. How displays and relationships should interact is as yet unknown.

In all cases outlined here, the information from the work entity provides important aspects of the presentation even when the work entity on its own is not the primary user display.

Use case: Finding among lending partners (Interlibrary loan)

Story: A user comes to your library looking for a book that you own, but it is checked out (or missing). You want to borrow from another library any manifestation from an equivalent expression that will satisfy the user. Knowing which libraries have the same work will help your search. In an automated ILL system, searching on a work identifier could pull up options for selection.

Discussion: The definition of equivalent depends on the user and the context. For example, the presence of illustrations may or may not be critical in defining an equivalent expression of a primarily textual work. Selection will undoubtedly make use of information describing the work, but it may also often require the expression description to fulfill the user’s need. However, the work-level information will likely be important in guiding the search.
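A work-identifier search followed by an expression-level filter might look like the following sketch. The holdings data, the attribute names (language, illustrated), and the function find_lenders are hypothetical; they only illustrate how the work narrows the search while the expression description decides what counts as equivalent for a given user.

```python
# Hypothetical partner holdings; each entry links a held manifestation to a
# work and to expression attributes (reduced here to language and illustration).
holdings = [
    {"library": "Partner A", "work": "w:1", "language": "eng", "illustrated": True},
    {"library": "Partner B", "work": "w:1", "language": "eng", "illustrated": False},
    {"library": "Partner C", "work": "w:1", "language": "fre", "illustrated": True},
    {"library": "Partner D", "work": "w:2", "language": "eng", "illustrated": False},
]


def find_lenders(work_id, records, **expression_needs):
    """Return libraries holding the work, keeping only expressions that meet the stated needs."""
    candidates = [r for r in records if r["work"] == work_id]
    return [
        r["library"]
        for r in candidates
        if all(r.get(key) == value for key, value in expression_needs.items())
    ]


any_english = find_lenders("w:1", holdings, language="eng")
illustrated_english = find_lenders("w:1", holdings, language="eng", illustrated=True)
print(any_english)
print(illustrated_english)
```

Passing the expression requirements as optional filters reflects the point that "equivalent" depends on the user: the same work-level search serves both requests.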

Use case: Creating bibliographic relationships in the catalog

Story: (Providing context to enable better information literacy) A user comes to your library looking for works on a particular topic. After the user performs a keyword or subject search, links between works enable the user to better understand which works influenced or were influenced by other works, including works citing and cited by other works, derivative works, etc.

Discussion: FRBR encourages the creation of relationships between bibliographic entities, such as work-to-work and expression-to-expression, as well as expression-to-work relationships. These are in addition to the primary relationships between FRBR entities, as shown in the three entity-relation diagrams. Work-to-work relationships include those describing changes in continuing resources (preceding/succeeding work) and bibliographic supplements. Existing continuing resource records routinely include relationship data, recorded at the manifestation level, that can potentially be extrapolated to the expression and/or work level. Other relationships are between adapted and transformed works. Finally, there are whole-part relationships for each FRBR group 1 entity. This has proven to be a very difficult area to model and both the FRBR and RDA groups have issued analyses of this under the topic of “bibliographic aggregates.”[4]

The working group acknowledges that many desired bibliographic relationships are not currently coded in such a way that they can be algorithmically transformed into FRBR relationships as bibliographic data is converted to RDA or BIBFRAME linked data. This does not, however, deny the potential utility of realizing such relationships in the catalog.

Use case: Enabling integration with the Semantic Web

Story 1: While browsing Wikipedia, I learn about a play by Shakespeare that I’ve never read, Cardenio. I’d like to know whether a local library has a copy, without leaving Wikipedia.

Story 2: I’m doing a search in the library catalog and one of the books that is retrieved is Lewis Mumford’s The Myth of the Machine. I’ve heard the name, but I realize I don’t know much about the book. I see links to reviews as well as to the Wikipedia article on the work. Clicking on the Wikipedia article opens a side panel where I can see the Wikipedia article without leaving the library catalog.

Story 3: [A future use case] I run a Google search on Marie Curie. In addition to surfacing Wikipedia data and several films about Marie Curie, the Google Knowledge Graph also collocates rich resources from across the cultural heritage community by and about her. To generate this display, Google crawls linked data properties of Works (e.g., <hasSubject> Marie Curie or <hasAuthor> Marie Curie) finding archival materials held by the Institut Curie (including manuscripts, audiovisual interviews with colleagues, and photographs), photographs from Nobelprize.org, and books and articles from library collections around the world. Minimal but shared data relating to Works facilitates display in the Knowledge Graph and provides entry for users to find more specific Instance and Item data about these resources, such as web accessibility, physical location, format, and more detailed description.

Discussion: Libraries want to be visible to users as they use the web for research. The semantic web promises to make the necessary connections between information resources and library holdings. Cultural domains on the semantic web (Wikipedia, MusicBrainz, IMDb, etc.) present creative output at an abstract level, using models that vary in how closely they resemble the FRBR work. To integrate catalog data with this larger environment, libraries need to be able to express the relationship between what they hold and these entities. This sort of integration will require contribution to the set of work descriptions in some cases, such as the linking of VIAF entities to Wikipedia pages, as well as the ability to express relationships between GLAM metadata and data using different models.

Although no mechanism to make these connections exists today, the effort to integrate VIAF identifiers with Wikipedia pages provides an early testing ground. Links are established algorithmically using information available in both systems, resulting in a link between system identifiers. This shows the importance of the metadata for establishing relationships, as identifiers carry no information themselves and can only be used once relationships have been determined.
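The dependence of identifier links on descriptive metadata can be illustrated with a small matching sketch. This is not the algorithm used for VIAF and Wikipedia, which the report does not describe; the identifiers, the Fitzgerald birth years, and the choice of name plus birth year as the matching key are invented for illustration.

```python
import unicodedata

# Hypothetical records from two systems; identifiers and the Fitzgerald birth
# years are invented for illustration.
authority_records = [
    {"id": "auth:001", "name": "Curie, Marie", "birth_year": 1867},
    {"id": "auth:002", "name": "Fitzgerald, Michael", "birth_year": 1937},
    {"id": "auth:003", "name": "Fitzgerald, Michael", "birth_year": 1957},
]
encyclopedia_pages = [
    {"id": "page:Marie_Curie", "name": "Marie Curie", "birth_year": 1867},
    {"id": "page:Michael_Fitzgerald", "name": "Michael Fitzgerald", "birth_year": 1957},
]


def match_key(record):
    """Build a comparison key from descriptive metadata, not from the identifier."""
    name = unicodedata.normalize("NFKD", record["name"])
    name = "".join(ch for ch in name if not unicodedata.combining(ch)).lower()
    # Treat "Surname, Forename" and "Forename Surname" as the same name.
    parts = sorted(name.replace(",", " ").split())
    return (" ".join(parts), record["birth_year"])


def link(left, right):
    """Pair identifiers whose metadata keys agree; unmatched records stay unlinked."""
    index = {match_key(r): r["id"] for r in right}
    return {r["id"]: index[match_key(r)] for r in left if match_key(r) in index}


links = link(authority_records, encyclopedia_pages)
print(links)
```

Nothing in the identifier strings themselves contributes to the match; the link between identifiers is only the recorded outcome of comparing names and dates.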

Use case: Enabling a researcher to relate works to University faculty

Story: (This is taken from a May 2015 LD4L presentation at Cornell.) A researcher would like to see/search on works by | about | cited by | collected | taught by University faculty in an OPAC or profiles system, to discover works of interest based on connections between people, and to understand people based on their relation to works.

Discussion: This use case anticipates an ability to exploit Linked Data to identify patterns of interest within a given set of individuals (in this case University faculty) and an undefined set of works. The challenges are at least twofold. First, researchers like their datasets to be comprehensive, and while there can be high confidence that the set of University faculty will be comprehensive, at least for a given point in time, there can be less confidence that all their relationships to works will have been tracked down and recorded. Even citations can be notoriously hard to track down and identify unambiguously. Second, mechanisms for linking will often be provided at levels other than that of the work: an ISBN may identify a manifestation, a DOI may identify an article in print and online, but neither will identify the work, and the former will not even identify the expression. However, there is no reason these challenges cannot be overcome.
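The second challenge, that common identifiers attach below the work level, can be sketched as a resolution step that walks upward to the work. All identifiers and link tables here are placeholders, and the function resolve_work is a hypothetical helper; the sketch assumes the links from manifestation to expression and from expression to work have already been recorded somewhere.

```python
# Hypothetical link tables; the ISBN-like and DOI-like strings are placeholders,
# not real identifiers.
expression_of_manifestation = {"m:1": "e:english-text", "m:2": "e:english-text"}
work_of_expression = {"e:english-text": "w:novel", "e:article-text": "w:article"}

# Each external identifier is recorded with the entity level it attaches to.
identifier_targets = {
    "isbn:placeholder-1": ("manifestation", "m:1"),
    "isbn:placeholder-2": ("manifestation", "m:2"),
    "doi:placeholder-1": ("expression", "e:article-text"),
}


def resolve_work(identifier):
    """Walk from an identifier up to its work; return None if any link is missing."""
    level, entity = identifier_targets.get(identifier, (None, None))
    if level == "manifestation":
        level, entity = "expression", expression_of_manifestation.get(entity)
    if level == "expression":
        return work_of_expression.get(entity)
    return None


for key in ("isbn:placeholder-1", "isbn:placeholder-2", "doi:placeholder-1", "isbn:unknown"):
    print(key, resolve_work(key))
```

Two different ISBN placeholders resolve to the same work here, which is the behavior a faculty-to-works search would need in order to avoid counting each edition separately.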

Summary

The working group sees uses for the bibliographic work. However, in most cases the work alone does not suffice to complete user-serving functions. The work can be seen as the keystone of the bibliographic description, providing essential identification of the intellectual and creative content of the items held by or available through the library. The data provided in the work entity will be used in nearly all catalog functions, and in most functions it will be used in concert with other bibliographic data. It is recommended that studies of the work include the bibliographic context that it supports.

Next Steps

The following is work we hope to report on at the PCC Operations Committee Meeting in spring 2017, addressing the use cases described above where they are relevant.