
Metadata on the Web: on the integration of RDF and Topic Maps

R. Gentilucci

Department of Computer Science
University of Bologna
Mura A. Zamboni, 7
40127 Bologna BO

M. Pirruccio

Department of Computer Science
University of Bologna
Mura A. Zamboni, 7
40127 Bologna BO

F. Vitali

Department of Computer Science
University of Bologna
Mura A. Zamboni, 7
40127 Bologna BO

WWW 2003, May 20-24, 2003, Budapest, Hungary.


ABSTRACT

Metainformation provides an additional layer of abstraction over web documents that can be used by sophisticated applications relying on the precise semantic characterization of their content. Two leading standards, RDF and Topic Maps, compete as the model through which to express metadata. These two models are sufficiently different to make back-and-forth conversion a difficult and imprecise task. In this paper we introduce META, a set of integrated tools that help in editing, navigating and converting metadata expressed in either language.

Categories and Subject Descriptors

H.5.4 [Information Interfaces and Presentation]: Hypertext/Hypermedia: Navigation

General Terms

Languages

Keywords

RDF, Topic Maps, metadata conversion tools, metadata editing tools, metadata navigation tools

1. INTRODUCTION

The Semantic Web [1] is an on-going large-scale effort to improve the current architecture of the World Wide Web by adding a semantic infrastructure to web resources that can be used for sophisticated data-oriented applications. At its basis we can identify metadata, or information about information, that unambiguously specify machine-understandable facts about web resources.

Through metadata we can select something (most often a web-accessible resource such as a document or process) and associate some information with it, e.g. a description, classification data or any other attribute. With metadata we can also establish explicit relationships between the resources we describe. This allows us to explicitly define the nature of attributes and relationships in terms of the metadata language itself, so that it is possible to add semantics to the things being described. Furthermore, since the semantic layer that characterizes the Semantic Web is built apart from the information resources that are linked by metadata, semantic content can be added without any editing of the original resources.

Even though it is possible to create metadata documents semi-automatically, a more precise approach requires human intervention. Much of the information that can be usefully specified for a resource simply cannot be extracted without some kind of human interpretation. Also, the metadata to be recorded about a resource is often derived from a vocabulary of interesting categories that are relevant for subsequent processes. These vocabularies, called ontologies, can be required to adhere to standards that can only be applied by humans. Once this necessarily intelligent work has been performed, one of the biggest claims of the Semantic Web is that, by formalizing the expression of the semantics of the information, every automatic application will be able to manage, understand and reason about it.

There are two competing models by which we can express metadata: RDF (Resource Description Framework) [2] is a W3C recommendation and by design is meant to lie at the basis of the W3C's vision of the Semantic Web; Topic Maps [5] is the ISO 13250 standard, and although developed independently of the W3C it has several properties that make it an interesting alternative to RDF. An important characteristic of both models is that they allow a serialization in XML, which makes metadata specifications web resources themselves, amenable to further commenting and description by additional metadata layers. The model is therefore repeatable and stackable, so that a complex web of descriptions, descriptions of descriptions and so on can be fruitfully generated.

The two languages are rather different even in their basic concepts, and the choice of one model can have far-reaching consequences both on the kind of statements that can be expressed about a resource and, more importantly, on the long-term usefulness of these statements. Very few tools exist so far that provide conversion from and to each model, and most of them suffer from serious drawbacks. Furthermore, the abstractness and complexity of the topic have not favored the creation of easy-to-use tools for the generation and examination of metadata documents. For instance, editors for metadata collections are mostly either limited in scope (to a single vocabulary, for instance) or do nothing to hide the intrinsic complexity of the syntax of the chosen language.

To partially overcome these difficulties, we set out to develop META, three integrated tools for the coherent management of metadata in both RDF and Topic Maps. META is composed of a metadata editor, a metadata navigator, and a bidirectional converter from RDF to Topic Maps and vice versa.

In this paper we present our approach to the bidirectional conversion of RDF and Topic Maps, and show how the use of schemas and the adoption of Published Subject Identifiers (PSIs) in Topic Maps and standard predicates in RDF can lead to a painless integration of the two languages. This integration is also instrumental in the creation of a single editing tool and a single navigation tool that can be used for metadata collections expressed in both languages.

In the next section we summarize the characteristics of both RDF and Topic Maps, and review a few related works in the field of conversion and editing of metadata collections. We then introduce our approach to the conversion of RDF into Topic Maps and vice versa, followed by subsections dedicated to the description of the tools. We close with some conclusions and a possible evolution path of our research.

2. METADATA STANDARDS

2.1 RDF and RDFS: an introduction

RDF (Resource Description Framework) is a W3C recommendation for the expression of metadata about any kind of target, from real-life objects to abstract entities, and it is particularly useful for Web resources such as documents or server-side processes. The fundamental model of RDF is composed of three concepts: Resources, Properties and Statements. A resource is what is being described through metadata, and it is identified by a URI. A property is an attribute we want to associate with the resource. A statement is a triple composed of the resource we want to describe, the property and its value. Property values can either be literals (i.e., strings) or other resources, which can in turn introduce further and more abstract levels of indirection.

For instance, the sentence “Mario is the author of http://www.mario.org” is a statement in which “http://www.mario.org” is the resource that we are describing, “being the author of” is the property we are associating to it and “Mario” is the literal value of this property. We can graphically express this statement as shown in Figure 1.

Figure 1. A simple RDF model

This graph can be serialized in XML. Two equivalent syntaxes are presented in [3], an abbreviated and a full syntax, although in this paper we will not make this distinction.

<rdf:Description rdf:about="http://www.mario.org">
  <s:author>Mario</s:author>
</rdf:Description>
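As a minimal sketch of how such a serialization maps back onto the triple model, the following Python fragment reads a description and lists its statements as (resource, property, value) triples. The namespace bound to the s: prefix is not given in the text, so the URI used here is a placeholder, and the enclosing rdf:RDF element is added only to obtain a well-formed document.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
S_NS = "http://example.org/schema#"  # placeholder: the text does not define the s: prefix

DOC = f"""
<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:s="{S_NS}">
  <rdf:Description rdf:about="http://www.mario.org">
    <s:author>Mario</s:author>
  </rdf:Description>
</rdf:RDF>
""".strip()


def statements(xml_text):
    """Yield (resource, property, value) triples for literal-valued properties."""
    root = ET.fromstring(xml_text)
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        resource = desc.get(f"{{{RDF_NS}}}about")
        for prop in desc:
            # ElementTree spells qualified names as "{namespace}localname"
            yield (resource, prop.tag, (prop.text or "").strip())


triples = list(statements(DOC))
```

Only literal values are handled; properties whose value is another resource would need an additional case.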

Other important syntax elements in RDF are containers, i.e., structures containing collections of resources. Statements can be made to refer to collections, or collections can be used as values for properties. The RDF model defines three types of containers: bags, i.e., unordered collections in which elements may be repeated; sequences, i.e., collections with a specified order on their elements; and alternatives, i.e., collections from which one value is to be selected as the relevant one.
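The difference between the three container types can be sketched in terms of how two containers are compared and how a value is chosen. The helper names below are hypothetical, and treating the first member of an alternative as the default choice is an assumption of this sketch.

```python
from collections import Counter


def same_bag(a, b):
    """Bags: order is irrelevant and duplicates count, so compare as multisets."""
    return Counter(a) == Counter(b)


def same_seq(a, b):
    """Sequences: order is significant, so compare element by element."""
    return list(a) == list(b)


def pick_alt(members, acceptable=None):
    """Alternatives: return one member, preferring an acceptable one if given.

    Falling back to the first member is an assumption of this sketch.
    """
    if acceptable is not None:
        for member in members:
            if member in acceptable:
                return member
    return members[0]
```

For instance, the lists ["a", "b", "a"] and ["a", "a", "b"] are the same bag but different sequences.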

RDF Schema (or RDFS for short) is a W3C working draft [2] aimed at defining a description language for vocabularies in RDF. An RDF schema (not to be confused with XML Schema) defines classes of resources and types of relationships that can be used in RDF statements. RDF Schema makes it possible, for example, to validate the value of a property or to constrain its range and domain of applicability.

Using RDF Schema we can add further information to the metadata defined in the previous example, for instance by defining the web page as a class, adding some constraints on the author property (its values need to be literals, while its domain is the class of web pages) and applying this class as the type of the resource http://www.mario.org.

<rdf:Description rdf:ID="webpage">
  <rdf:type rdf:resource="http://www.w3.org/2000/01/rdf-schema#Class"/>
</rdf:Description>

<rdf:Property rdf:ID="author">
  <rdfs:domain rdf:resource="#webpage"/>
  <rdfs:range rdf:resource="http://www.w3.org/2000/01/rdf-schema#Literal"/>
</rdf:Property>

<rdf:Description rdf:about="http://www.mario.org">
  <rdf:type rdf:resource="#webpage"/>
  <s:author>Mario</s:author>
</rdf:Description>
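Reading domain and range as validation constraints, as done in the text, a check of a single statement against this schema can be sketched as follows. The dictionaries and the check function are illustrative names only, and the schema is hard-coded rather than parsed from the RDF document.

```python
RDFS_LITERAL = "http://www.w3.org/2000/01/rdf-schema#Literal"

# property -> (domain class, range class), mirroring the schema fragment
SCHEMA = {"author": ("#webpage", RDFS_LITERAL)}
# resource -> declared rdf:type
TYPES = {"http://www.mario.org": "#webpage"}


def check(resource, prop, value, value_is_literal=True):
    """Return the list of constraint violations for one statement (empty if valid)."""
    if prop not in SCHEMA:
        return [f"unknown property {prop!r}"]
    problems = []
    domain, rng = SCHEMA[prop]
    if TYPES.get(resource) != domain:
        problems.append(f"{resource} is not an instance of {domain}")
    if rng == RDFS_LITERAL and not value_is_literal:
        problems.append(f"value of {prop!r} must be a literal")
    return problems
```

The statement about http://www.mario.org passes both checks, while the same property applied to an untyped resource violates the domain constraint.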

2.2 Topic Maps: an introduction

Topic Maps, defined by the ISO 13250 standard, is a model for describing knowledge structures and associating them with any kind of information resource. The most important concepts of Topic Maps are Topics, Occurrences and Associations.

Topics represent any kind of thing we are interested in describing: in order to associate metadata with some entity (a web page, a book, a person, etc.), we create a topic and we specify as its subject a URN suitable to identify this entity. In order to associate attributes with our subjects, we use occurrence structures to attach name-value pairs to the topics. Through associations we can define relationships among different topics. Associations are inherently bidirectional: in order to distinguish the different roles of each member in the association we need to create topics whose sole purpose is to represent a so-called association role type.

In fact, the real means through which we add semantics in Topic Maps is the type system. For instance, we can make any of the previous concepts (topics, occurrences and associations) instances of classes, which are nothing but other topics purposely created for typing. We can also define entire hierarchies and therefore define implicit relationships among topics (or associations, occurrences) that are instances of related classes.
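The idea that classes are themselves topics, and that hierarchies induce implicit typing, can be sketched as follows. The Topic structure is a simplification introduced for illustration (in particular, the class hierarchy is stored as a plain field), and the topic tt-agent is a hypothetical superclass that does not appear in the paper.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    id: str
    instance_of: list = field(default_factory=list)   # ids of class topics
    superclasses: list = field(default_factory=list)  # ids of broader class topics


def is_instance(topics, topic_id, class_id):
    """True if topic_id is an instance of class_id, directly or through the hierarchy."""
    pending = list(topics[topic_id].instance_of)
    seen = set()
    while pending:
        current = pending.pop()
        if current == class_id:
            return True
        if current in seen:
            continue
        seen.add(current)
        pending.extend(topics[current].superclasses)
    return False


topics = {
    "tt-agent": Topic("tt-agent"),  # hypothetical superclass
    "tt-person": Topic("tt-person", superclasses=["tt-agent"]),
    "tt-webpage": Topic("tt-webpage"),
    "mario": Topic("mario", instance_of=["tt-person"]),
}
```

Here mario is explicitly an instance of tt-person only, yet is_instance also reports it as an instance of tt-agent through the hierarchy.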

To separate classes from instances, although there are no formal rules to enforce this distinction yet, we can use Published Subject Identifiers (PSIs): these are subjects whose semantics, established by the organizations promoting the standard, is well known and suitable for typing topics with class roles. So, for example, besides defining the hierarchy of topic types, we can also define any occurrence type needed to specify the attributes that can be applied.

Topic Maps does not have an officially specified XML serialization language. One candidate, though, XML Topic Maps (or XTM), seems the most likely to become it. The sentence “Mario is the author of http://www.mario.org” can be written in XTM as follows.

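<topic id="tt-person">
  <baseName>
    <baseNameString>Person</baseNameString>
  </baseName>
</topic>

<topic id="tt-webpage">
  <baseName>
    <baseNameString>Web page</baseNameString>
  </baseName>
</topic>

<topic id="mario">
  <instanceOf><topicRef xlink:href="#tt-person"/></instanceOf>
  <baseName>
    <baseNameString>Mario</baseNameString>
  </baseName>
</topic>

<topic id="page">
  <instanceOf><topicRef xlink:href="#tt-webpage"/></instanceOf>
  <subjectIdentity>
    <resourceRef xlink:href="http://www.mario.org"/>
  </subjectIdentity>
</topic>

<topic id="at-author">
  <baseName>
    <baseNameString>Author</baseNameString>
  </baseName>
</topic>

<association>
  <instanceOf><topicRef xlink:href="#at-author"/></instanceOf>
  <member>
    <roleSpec><topicRef xlink:href="#tt-person"/></roleSpec>
    <topicRef xlink:href="#mario"/>
  </member>
  <member>
    <roleSpec><topicRef xlink:href="#tt-webpage"/></roleSpec>
    <topicRef xlink:href="#page"/>
  </member>
</association>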

We have defined two topic types, tt-person and tt-webpage, used to type the topics mario and page: by this we state that the topic corresponding to the string Mario describes a person and the topic whose subject is http://www.mario.org represents a web page. Then we define the association type at-author, used to type the association between mario and page. Every member of this association specifies the role it assumes. In this case the role and the type of the topic are the same.
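The structure of this association can be sketched with two small data types. The class and function names are introduced here for illustration only; the identifiers tt-person, tt-webpage, mario, page and at-author are those of the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Member:
    role: str    # id of the topic acting as association role type
    player: str  # id of the topic playing that role


@dataclass(frozen=True)
class Association:
    type: str       # id of the topic acting as association type
    members: tuple  # Member instances, in no significant order


authorship = Association(
    type="at-author",
    members=(Member("tt-person", "mario"), Member("tt-webpage", "page")),
)


def players(assoc, role):
    """Return the topics playing the given role; no member is privileged as the 'subject'."""
    return [m.player for m in assoc.members if m.role == role]
```

Unlike an RDF statement, the association has no built-in direction: the only way to tell the author from the page is to look at the role of each member.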

2.3 Converting between RDF and Topic Maps

Both models are suitable for solving the knowledge management problem, but the ideas that inspired them were different. RDF has been developed with the Semantic Web in mind, while Topic Maps was born as a practical way to build indexes of information resources. Nonetheless, both standards try to achieve a practical and compact way to describe and relate generic entities.

In particular, both RDF and Topic Maps:

· Allow the definition of abstract and concrete entities.

· Have a type system to build hierarchies.

· Allow the definition of semantic relationships between entities (more precisely, of typed relationships).

· Allow a labeled-arc graph representation of relationships and classes.

The most significant difference between the two is the expressive power of their respective constructs: while Topic Maps offers a more sophisticated set of basic constructs to use in the definition of metadata, RDF is easily extensible. The choice is therefore between using a predefined set of predicates and having each user define new ones for the same purposes.

The problem of finding a mechanism that allows us to go back and forth between the two languages is therefore an important one, and needs to be studied carefully. In the next section we provide a short introduction to our approach, while here we list a few existing attempts in the literature.

[19] is the first attempt to offer integration between the two languages. The strategy followed is very straightforward, as it is based on the definition of PSIs to represent RDF concepts for the RDF to Topic Maps conversion, and the definition of RDF predicates to represent Topic Maps constructs in the other direction. The mapping is not completely defined, as it leaves unanswered some issues of importance for a practical use of the mechanism.
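To give a flavour of what a PSI-based strategy can look like, the following toy sketch turns an RDF statement into a binary association whose roles are typed by two PSIs, and back. It is not the mapping defined in [19]: the PSI URIs and function names are hypothetical, and the reverse direction simply inverts the forward one instead of introducing dedicated RDF predicates for Topic Maps constructs.

```python
# Hypothetical PSIs standing in for the RDF notions of subject and object.
PSI_SUBJECT = "http://example.org/psi/rdf#subject"
PSI_OBJECT = "http://example.org/psi/rdf#object"


def triple_to_association(subject, predicate, obj):
    """RDF -> Topic Maps: the predicate becomes the association type."""
    return {
        "type": predicate,
        "members": [
            {"role": PSI_SUBJECT, "player": subject},
            {"role": PSI_OBJECT, "player": obj},
        ],
    }


def association_to_triple(assoc):
    """Topic Maps -> RDF: only binary subject/object associations can be mapped back."""
    by_role = {m["role"]: m["player"] for m in assoc["members"]}
    if len(assoc["members"]) != 2 or set(by_role) != {PSI_SUBJECT, PSI_OBJECT}:
        raise ValueError("association does not have the subject/object shape")
    return (by_role[PSI_SUBJECT], assoc["type"], by_role[PSI_OBJECT])
```

Even this toy version shows one of the open issues: associations with more than two members, or with other role types, have no counterpart in a single RDF statement and are rejected here.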

In [20] the authors define a mapping model from Topic Maps to RDF based on the “Topicmaps.net's Processing Model for XTM 1.0” presented in [23], which defines a set of rules for processing Topic Map documents in order to reconstitute the meaning of the information they are intended to convey to their recipients. Basically [20] defines an RDF schema that represents the Topic Map model with RDF statements. The approach, as claimed by the authors, is complete and reversible. The paper does not mention anything about the other direction of conversion.

[21] is also based on [23], and it introduces some of the ideas we have exploited in our work, such as the possible representation of Topic Map associations, scoped names and occurrences as resources, instead of as properties as they are usually conceived. This work, too, does not take into account the other direction of the mapping.

The common drawback of all the works listed here is, in our opinion, the rather awkward appearance of the documents that come out of the conversion. Even if preserving information sometimes requires the applications managing these documents to have some prior knowledge of the conversion schema in order to work correctly, we consider the readability of the documents produced an important aspect of the conversion process.