FRBR IN RDF

A proof-of-concept model

Casey Mullin

Lala Hajibayova

Scott McCaulay

S363 – Semantic Web

December 13, 2008

Introduction

FRBR (Functional Requirements for Bibliographic Records[i]), developed by IFLA in 1997, is a conceptual model for the bibliographic universe. It models bibliographic metadata as a set of ten entities, their properties, and their relationships. The entities include the products of intellectual and artistic creation and their physical embodiments (Work, Expression, Manifestation and Item), those responsible for their creation or realization (Person, Corporate Body), and those things which are the subjects of works (Concept, Object, Event and Place). Though this model has prompted much discussion, debate, and some research within the library metadata community since its release, it has seen no widespread implementation. Its terminology is incorporated within RDA (Resource Description and Access[ii]), the forthcoming metadata content standard which is slated to supersede AACR2 (Anglo-American Cataloguing Rules, 2nd ed.); still, it has yet to be determined how this model will inform the continuing development of metadata structure standards.

FRBR, as an entity-relationship model, lends itself quite well to ontology representation. Indeed, the inspiration for this project was an existing RDF schema developed in 2005 by Ian Davis and Richard Newman[iii]. This schema defines FRBR entities as both RDF and OWL classes, and includes superclasses to represent the groupings in the original report (for example, Endeavour is the superclass of the entities Work, Expression, Manifestation and Item). Relationships between entities, as defined in the FRBR report, are represented as object properties. Data properties are conspicuously absent, as are examples of instances. We took these lacunae as the warrant for the present project.

Schema Development and Instance Creation

Casey A. Mullin

To model a small corpus of bibliographic data with a manageable ontology, we needed to derive a simpler schema that would allow coherent demonstration and streamlined validation. This required eliminating several major aspects of the Davis/Newman schema. The OWL- and SKOS-defined elements were removed, as they added a level of complexity which we did not deem necessary for our proof-of-concept model. For the same reason, we also omitted the classes representing “Group 3” entities (Concept, Object, Event and Place) from our derived schema[iv]. To compensate for the absence of FRBR-defined attributes, we added them as data properties. Fortunately, these have already been registered in the NSDL Registry by the DCMI RDA Task Group[v]. Thus, we were able to incorporate the URIs from this registry in our property definitions. Finally, we defined a few data properties locally, to accommodate our domain-specific corpus of data (described below). These used the namespace

To encode instances, we used a small set of MARC bibliographic records representing musical sound recordings. Such resources are well served by the FRBR model, as they often include multiple musical works, by multiple creators, realized by multiple performers, and possessing properties that attach to different entities. Current cataloging practice conflates all such metadata into one surrogate record. The FRBR model, as expressed in RDF, allowed us to make more granular statements about these distinct entities (e.g. a Work has a creator, a date of composition and a form; an Expression has a date and realizers [performers]; a Manifestation has a date of publication and a publisher; etc.). The primary object properties link these entities together in a logical way (e.g. a Work is realized in an Expression, which is embodied in a Manifestation, which is exemplified by an Item). Figure 1 shows a set of instances which demonstrate these properties[vi].

Figure 1 – Instance Examples

<rdf:Description rdf:about="
<rdf:type rdf:resource="&frbr;Work"/>
<rda:preferredTitleForTheWork>Sonatas, piano, D. 960, B♭major</rda:preferredTitleForTheWork>
<rda:variantTitleForTheWork>Sonatas, piano, no. 21, B♭major</rda:variantTitleForTheWork>
<rda:formOfTheWork>Sonata</rda:formOfTheWork>
<rda:dateOfWork>1828</rda:dateOfWork>
<rda:mediumOfPerformance>piano</rda:mediumOfPerformance>
<rda:numericDesignation>D. 960</rda:numericDesignation>
<rda:numericDesignation>no. 21</rda:numericDesignation>
<rda:key>B♭major</rda:key>
<frbr:realization rdf:resource="&frbr;Expression/3"/>
<frbr:creator rdf:resource="
</rdf:Description>

<rdf:Description rdf:about="&frbr;Expression/3">
<rdf:type rdf:resource="&frbr;Expression"/>
<rda:dateOfExpression>1986</rda:dateOfExpression>
<rda:contentType>Recorded sound</rda:contentType>
<frbr:realizer rdf:resource="
<frbr:embodiment rdf:resource="
</rdf:Description>

<rdf:Description rdf:about="
<rdf:type rdf:resource="&frbr;Manifestation"/>
<rda:title>Sonate en si-bémol majeur, D 960 [sound recording] ; 3 piéces pour piano, D 946</rda:title>
<rda:placeOfProduction>Paris, France</rda:placeOfProduction>
<rda:publisher>Harmonic Records</rda:publisher>
<rda:dateOfCopyright>1986</rda:dateOfCopyright>
<rda:identifierForTheManifestation>H/CD 8610</rda:identifierForTheManifestation>
<rda:carrierType>Compact Disc</rda:carrierType>
<frbr:exemplar rdf:resource="
</rdf:Description>

<rdf:Description rdf:about="
<rdf:type rdf:resource="&frbr;Item"/>
</rdf:Description>

Once the bibliographic data is represented in this way, it can be queried and reasoned over in complex ways (see the section on SPARQL queries below). We contend that RDF is well suited to expressing this data precisely because bibliographic metadata is so complex. Indeed, using selected metadata elements from just five MARC records, we created an RDF graph containing 325 triples. Admittedly, encoding new metadata in this way, let alone migrating legacy metadata, is time-consuming and costly. Further exploration of Semantic Web technologies by the library metadata community, however, may help mitigate this cost. Eventually, perhaps all bibliographic entities can be represented by URIs, which can be used to make statements in a streamlined fashion.
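
As an illustration of how descriptions like those in Figure 1 decompose into triples, the following Python sketch stores a handful of them as plain tuples and follows the primary object properties from a Work down to an Item. The short identifiers (work/1, expression/3, manifestation/1, item/1) are placeholders assumed for this example, since the actual URIs are not reproduced here. The helper names objects and chain_from_work are likewise hypothetical, and a real application would use an RDF library rather than a Python list.

```python
# Placeholder identifiers stand in for the real URIs (assumption).
TRIPLES = [
    ("work/1", "rdf:type", "frbr:Work"),
    ("work/1", "rda:formOfTheWork", "Sonata"),
    ("work/1", "rda:dateOfWork", "1828"),
    ("work/1", "frbr:realization", "expression/3"),
    ("expression/3", "rdf:type", "frbr:Expression"),
    ("expression/3", "rda:dateOfExpression", "1986"),
    ("expression/3", "frbr:embodiment", "manifestation/1"),
    ("manifestation/1", "rdf:type", "frbr:Manifestation"),
    ("manifestation/1", "rda:publisher", "Harmonic Records"),
    ("manifestation/1", "frbr:exemplar", "item/1"),
    ("item/1", "rdf:type", "frbr:Item"),
]


def objects(subject, predicate):
    """Return every object of the triples matching (subject, predicate, ?)."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def chain_from_work(work):
    """Follow realization -> embodiment -> exemplar links from a Work."""
    chain = [work]
    for link in ("frbr:realization", "frbr:embodiment", "frbr:exemplar"):
        targets = objects(chain[-1], link)
        if not targets:
            break  # the chain stops where a link is missing
        chain.append(targets[0])  # follow only the first link, for brevity
    return chain


print(chain_from_work("work/1"))
```

Each rdf:Description element contributes one triple per child element, which is why a few records quickly yield hundreds of triples.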

SPARQL Queries

Lala Hajibayova

The following queries were designed to demonstrate the kinds of data that can be retrieved using this model:

Figure 2 – SPARQL Queries and Result Sets in Jena

Retrieving formOfTheWork “Sonata”

SELECT ?formOfTheWork

WHERE

{ ?formOfTheWork < "Sonata" .

}

Retrieving dateOfWork with mediumOfPerformance “piano”

SELECT ?dateOfWork

WHERE

{ ?x < ?dateOfWork .

?x < "piano" .

}

Filtering works whose date matches “1800”; the "i" flag requests a case-insensitive pattern match.

SELECT ?date

WHERE

{

?date < ?dateOfWork .

FILTER regex (?dateOfWork, "1800", "i") }

Retrieving preferredTitleForTheWork together with variantTitleForTheWork (only works that have both)

SELECT ?preferredTitleForTheWork ?variantTitleForTheWork

WHERE

{

?title < ?preferredTitleForTheWork .

?title < ?variantTitleForTheWork .

}

Retrieving preferredTitleForTheWork, with variantTitleForTheWork retrieved only where present (OPTIONAL)

SELECT ?preferredTitleForTheWork ?variantTitleForTheWork

WHERE

{

?title < ?preferredTitleForTheWork .

OPTIONAL { ?title < ?variantTitleForTheWork }

}
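
The difference between the last two queries can be sketched in plain Python. This is only an emulation of the matching behaviour, not of SPARQL itself: it assumes each work has at most one value per property, and the second record is an invented example of a work that lacks a variant title. The dictionary keys and function names are hypothetical.

```python
# Two sample works; the second is invented to show a missing variant title.
WORKS = {
    "work/1": {
        "preferredTitle": "Sonatas, piano, D. 960",
        "variantTitle": "Sonatas, piano, no. 21",
    },
    "work/2": {
        "preferredTitle": "Example work without a variant title",
    },
}


def required_match(works):
    """Both patterns required: works lacking either title are dropped."""
    return [
        (w["preferredTitle"], w["variantTitle"])
        for w in works.values()
        if "preferredTitle" in w and "variantTitle" in w
    ]


def optional_match(works):
    """Variant title optional: the work is kept, with None for 'unbound'."""
    return [
        (w["preferredTitle"], w.get("variantTitle"))
        for w in works.values()
        if "preferredTitle" in w
    ]


print(len(required_match(WORKS)), len(optional_match(WORKS)))
```

With the required form, a work with no variant title disappears from the result set entirely. With OPTIONAL it is still returned, with the variant title left unbound.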

Web Service

Scott McCaulay

One of the goals of our project was to produce an end-to-end solution for the deployment of FRBR bibliographic data in RDF form. This would require a user interface to allow the selection of criteria and the construction and execution of queries. Ideally this web application would be fully automated, and could build query screens dynamically based on the current RDF data. The available values would be presented to the user in some simple and easily understood format, such as drop-down boxes. The application should be smart enough to maintain consistency within the query: when a selection is made, choices for other fields should be limited to those which co-occur with the current selection.
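
The consistency requirement could be met along the following lines. This is a minimal sketch, not the prototype's code: it assumes the RDF data has already been flattened into one dictionary per work, and all records other than the first are invented for illustration. The function name choices is hypothetical.

```python
# Flattened sample records; only the first reflects the data in Figure 1.
RECORDS = [
    {"formOfTheWork": "Sonata", "mediumOfPerformance": "piano", "dateOfWork": "1828"},
    {"formOfTheWork": "Sonata", "mediumOfPerformance": "violin", "dateOfWork": "1802"},
    {"formOfTheWork": "Symphony", "mediumOfPerformance": "orchestra", "dateOfWork": "1808"},
]


def choices(records, field, selected):
    """Return the values of `field` that remain possible given `selected`.

    A field's own current selection is ignored, so the user can still
    change it without first clearing the drop-down box.
    """
    matching = [
        r for r in records
        if all(r.get(k) == v for k, v in selected.items() if k != field)
    ]
    return sorted({r[field] for r in matching if field in r})


# With no selection, every medium is offered; after choosing "Sonata",
# only the media that occur with sonatas remain.
print(choices(RECORDS, "mediumOfPerformance", {}))
print(choices(RECORDS, "mediumOfPerformance", {"formOfTheWork": "Sonata"}))
```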

User selections from the query screen would be used to create SPARQL queries which would be executed and results returned. The results of the query should be available for display and should also be available for reuse in some form, if only copy and paste. While this vision proved slightly too ambitious for the time available for the project, we were able to build and demonstrate a prototype web application with some of the desired functionality, using a rapid application development tool called IntraWeb. The application was populated with the RDF test data, and showed how the query and results screens could function.

Figure 3 – Example Query Screen

Figure 4 – Example Results Screen

While the prototype application is fairly primitive, it does serve to illustrate how RDF and SPARQL could be used to make the bibliographic data usable and accessible. Two significant tasks remain before the application is usable: developing a routine that parses the current RDF data to extract the elements needed to build the query screen dynamically, and generating the SPARQL query from user-selected criteria. Neither of these would be particularly onerous, and the development of this concept into a full working application is quite feasible.
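
To suggest how modest the query-generation task is, the following Python sketch assembles a SELECT query from a dictionary of user-selected criteria. It is an assumption-laden illustration rather than part of the prototype: the namespace http://example.org/rda/ is a stand-in for the registered property URIs, the function names are hypothetical, and property names are assumed to come from a fixed list offered by the query screen rather than from free text.

```python
RDA_NS = "http://example.org/rda/"  # hypothetical namespace, not the real one


def escape_literal(value):
    """Escape backslashes and double quotes for a SPARQL string literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def build_query(criteria, select):
    """Build a SPARQL SELECT query.

    criteria: property name -> literal value the work must have.
    select:   property names whose values should be returned.
    """
    patterns = []
    for prop, value in criteria.items():
        patterns.append(f'  ?x <{RDA_NS}{prop}> "{escape_literal(value)}" .')
    for prop in select:
        patterns.append(f"  ?x <{RDA_NS}{prop}> ?{prop} .")
    variables = " ".join(f"?{prop}" for prop in select)
    return f"SELECT {variables}\nWHERE {{\n" + "\n".join(patterns) + "\n}"


# Corresponds to "retrieving dateOfWork with mediumOfPerformance piano".
print(build_query({"mediumOfPerformance": "piano"}, ["dateOfWork"]))
```

Escaping the literal values matters even in a sketch, since selections are ultimately drawn from catalog data that may itself contain quotation marks.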

[i] Final report at:

[ii] Full draft for constituency review at:

[iii] Schema located at:

[iv] Our derived schema located at:

[v] Registered properties at:

[vi] Full instance document located at: