A Normalization Framework for Multimedia Databases

S.K. Chang1, V. Deufemia2, G. Polese2

(1) Department of Computer Science, 6101 Sennott Building,

University of Pittsburgh, Pittsburgh, PA, USA, 15260

(2) Dipartimento di Matematica e Informatica, Università di Salerno,

Via Ponte don Melillo, 84084 Fisciano, Salerno, ITALY

{deufemia, gpolese}@unisa.it

Abstract

We present a normalization framework for the design of multimedia database schemes with reduced manipulation anomalies. To this end, we first discuss how to describe the semantics of multimedia attributes based upon the concept of generalized icons, already used in the modeling of multimedia languages. Then, we introduce new extended dependencies involving different types of multimedia data. Such dependencies are based on distance functions that are used to detect semantic relationships between complex data types. Based upon these new dependencies, we define five multimedia normal forms. We then show how to apply some of them to normalize multimedia databases used in e-learning applications, for which we also analyze the impact of normalization on performance. Finally, we report a simulation on a large image dataset to analyze the impact of the proposed framework in the context of content-based retrieval applications.

Index terms --- multimedia database management systems (MMDBMS), anomalies, data dependencies, content-based retrieval.

1. Introduction

In the last decade multimedia databases have been used in many application fields. The Internet boom has reinforced this trend, introducing many new issues related to the storage and management of distributed multimedia data. For these reasons, data models and database management systems (DBMSs) have been extended to enable the modeling and management of complex data types, including multimedia data. Researchers in this field have agreed on several characteristics on which to base the classification of multimedia DBMSs (MMDBMSs). Some of them are [Nar96]:

The data model for representing multimedia information. This should provide effective means to represent relationships among media types.

The indexing techniques used to enable content based retrieval of multimedia information.

The query language. This should allow users to express efficiently the complex characteristics of the data to be retrieved, such as multimedia data.

The clustering techniques on multimedia information to enhance its retrieval.

Support for distributed multimedia information management.

Flexible architectures.

The latter is a critical point, because multimedia information systems are in continuous evolution. Thus, it is important that the architectures of MMDBMSs be flexible enough to be extensible and adaptable to future needs and standards. A considerable number of MMDBMS products have been developed. Examples include CORE [WNM95], OVID [OT93], VODAK [LR95], QBIC [FSN95], ATLAS [SKR95], etc., each providing enhanced support for one or more media domains among text, sound, image, and video. At the beginning, many DBMS producers preferred to rely on the object-oriented data model to face the complexity of multimedia data, but there have also been examples of MMDBMSs based on the relational data model and on specific, non-standard data models. However, in order to facilitate the diffusion of multimedia databases within industrial environments, researchers have been seeking solutions based on the relational data model, possibly associated with some standard design paradigm, like those used with traditional relational DBMSs (RDBMSs). Extensible relational DBMSs have been an attempt in this direction. In recent years DBMS vendors have produced extended versions of relational DBMSs [Ren97], with added capabilities to manage complex data types, including multimedia. In particular, these new products extend traditional RDBMSs with mechanisms for implementing the concept of object/relational universal server. In other words, they provide means to enable the construction of user-defined data types (UDTs), and user-defined functions (UDFs) for manipulating them. New standards for SQL have been created, and SQL3 [EN03,SQL96] has become the standard for relational DBMSs extended with object-oriented capabilities. The standard includes UDTs, UDFs, LOBs (a variant of BLOBs), and type checking on user-defined data types, which are accessed through SQL statements. Early examples of extensible RDBMSs include Postgres [SK95], IBM/DB2 version 5 [Dav00], Informix [Ren97], and ORACLE 8 [Ora99].

As MMDBMS technology has started becoming more mature, the research community has been seeking new methodologies for multimedia software engineering. Independently of the data model underlying the chosen MMDBMS, multimedia software engineering methodologies should include techniques for database design, embedding guidelines and normal forms to prevent anomalies that might arise while manipulating multimedia data. In the literature we find some new normalization techniques extending traditional normal forms for relational databases [SG02,AL04,VLL04]. However, many of them focus on specific domains, and no general-purpose normalization framework for multimedia is provided. In particular, the technique in [SG02] focuses on the normalization of image databases by partitioning images so as to enhance search and retrieval operations. To this end, the technique defines dependencies among image features, which suggest to the designer how to efficiently map them into a database schema. The techniques in [AL04,VLL04] focus on the normalization of XML documents.

The normalization of multimedia databases needs to account for many new issues as opposed to alphanumeric databases. Many different types of complex data need to be analyzed. In this paper we describe a general-purpose framework to define normal forms in multimedia databases. The framework applies seamlessly to images as well as to all other media types. The semantics of multimedia attributes is defined by means of generalized icons [Cha96], previously used to model multimedia languages. In particular, generalized icons are here used to derive extended dependencies, which are parameterized upon the similarity measure used to compare multimedia data. Based on these new dependencies, we define five normal forms aiming to reach a suitable partitioning of multimedia data, and to improve the quality of database schemes so as to prevent possible manipulation anomalies. As a practical application, in the paper we show how to use the proposed framework for normalizing multimedia databases used in e-learning applications. In particular, we have applied our normal forms to transform the database schema in order to make it suitably accessible from different client devices, including PDAs and other mobile devices, each with different multimedia and bandwidth capabilities. Based on this example, we provide some experimental data to show the impact of the normalization process on access performance. Moreover, we have performed a simulation on a large image dataset in the context of content-based retrieval applications, aiming to analyze the impact of the proposed framework on retrieval performance and errors.

The paper is organized as follows. In Section 2 we describe the semantics of attributes in multimedia databases. In Section 3 we present the extended dependencies of our framework. In Section 4 we propose new normal forms based upon the extended dependencies. The e-learning case study is presented in Section 5, whereas the experiments for evaluating the framework are presented in Section 6. Finally, a discussion is provided in Section 7.

2. Semantics of Multimedia Attributes

Before discussing dependencies and normal forms for multimedia databases we need to define the semantics of multimedia attributes.

A dependency is a property of the semantics of attributes. Database designers use their understanding of the semantics of attributes to specify functional dependencies among them [EN03]. As we know, besides alphanumeric data, multimedia databases can store complex data types, such as text, sound, image, and video. Early versions of MMDBMSs provided a new data type for storing complex data, namely the binary large object (BLOB). This storage technique yielded some problems, because BLOBs are non-interpreted byte streams, hence the DBMS has no knowledge concerning their structure and the content of their components. It can only see them as atomic data types. Although inefficient, the simple structure of the BLOB data type makes it general enough to model most of the complex data types used in multimedia databases. Thus, a better solution should at least allow us to model and manipulate all the different complex data types that might be needed in multimedia computing. This is why some RDBMS vendors have aimed to develop extensible RDBMSs, that is, RDBMSs extended with new complex data types, together with added capabilities to let developers define and implement their own data types, and the corresponding manipulation functions.

Beyond the storage strategy used for the different media types, we define a framework to model their semantics so as to be able to derive functional dependencies between them. To this end, we have exploited the framework of generalized icons [Cha96], which are dual objects (xm, xi), with a logical part xm, and a physical part xi. They can be used to describe multimedia objects such as images, sounds, texts, motions, and videos. A generalized icon for modeling images is like a traditional icon, whereas those for modeling sounds, texts, motions, and videos are earcons, ticons, micons, and vicons, respectively. For all of them we denote the logical part with xm, whereas the physical parts will be denoted with xi for icons, xe for earcons, xt for ticons, xs for micons, and xv for vicons. The logical part xm always describes semantics, whereas xi represents an image, xe a sound, xt a text, xs a motion, and xv a video. Furthermore, a multicon is a generalized icon representing composite multimedia objects [ACG97]. Thus, a multicon will also be a dual object, where the logical part represents the semantics of the combined media. Generalized icons can be combined by means of special icon operators. The latter are dual objects themselves, where the logical part is used to combine the logical parts xm of the operand icons, whereas the physical part is used to combine their physical parts xi. For instance, by applying a temporal operator to several icons and an earcon we might obtain a vicon, with the physical part representing a video, and the logical part describing the video semantics.

In our framework we associate a generalized icon to each complex attribute, using the logical part to describe its semantics, and the physical part to describe the physical appearance based upon a given storage strategy. The logical parts of generalized icons will have to be expressed through a semantic model. Conceptual graphs are an example of a formal semantic model that can be used to describe logical parts of generalized icons [Cha96]. Alternatively, the designer can use frames, semantic networks, or visual CD forms [CPO94].

As an example, choosing a frame-based representation, an image icon representing the face of a person may be described by a frame with attributes describing the name of the person, the colors of the picture, and the objects appearing in it, including their spatial relationships. A vicon will contain semantic attributes describing the images of the video frames, the title of the video, the topic, the duration, temporal relationships, etc. Similarly, semantic attributes in an earcon might be used to describe the title of the sound, if any, the genre, the singer (in case it is a song), the sampling rate, etc.
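The frame-based description above can be sketched in code. This is a minimal illustration, not an implementation from the paper: the class and attribute names (`GeneralizedIcon`, `logical`, `physical`) and all the frame slots are hypothetical, chosen to mirror the face-icon example.

```python
# Sketch of a generalized icon as a dual object: the logical part is a
# frame (attribute/value slots) describing semantics, and the physical
# part is a reference to the stored media. All names are illustrative.

from dataclasses import dataclass

@dataclass
class GeneralizedIcon:
    logical: dict   # frame describing semantics (the xm part)
    physical: str   # e.g. a path or BLOB reference (xi, xe, xt, xs, or xv)

# Frame for an image icon showing a person's face, as in the example above.
face_icon = GeneralizedIcon(
    logical={
        "person_name": "Alice",
        "colors": ["red", "white"],
        "objects": ["face", "hat"],
        "spatial_relations": [("hat", "above", "face")],
    },
    physical="images/alice_face.jpg",
)

print(face_icon.logical["person_name"])  # -> Alice
```

An earcon or vicon would follow the same pattern, with its frame holding the audio- or video-specific semantic attributes listed above.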

Based on the specific domain of the multimedia database being constructed, the designer will have to specify the semantics of simple and complex attributes according to the chosen semantic model. Once s/he has accomplished this task, the generalized icons for the multimedia database are completely specified, which provides a semantic specification of the tuples in the database.

As an example, to describe semantics in a database of singers we might use name, birthday, and genre as alphanumeric attributes, an icon representing the singer's picture, one or more earcons to represent some of his/her songs, and one or more vicons to represent his/her video clips. A tuple in this database describes information about a specific singer, including his/her songs and video clips. This provides a complete semantic specification of the tuple.

3. Extended Dependencies

In traditional relational databases a functional dependency is defined as a constraint between two sets of attributes from the database [Cod72]. Given two sets of attributes X and Y, a functional dependency between them is denoted by X → Y. The constraint says that, for any two tuples t1 and t2, if t1[X] = t2[X], then t1[Y] = t2[Y]. This concept cannot be immediately applied to multimedia databases, since we do not have similar simple and efficient methods to compare multimedia attributes. In other words, we need a method for defining equalities between groups of attributes involving complex data types.

Generally speaking, the matching of complex attributes needs to be an approximate match. In fact, the problem of comparing complex data for equality resembles the problem of retrieval from multimedia databases, where the query is entered by sketching or composing an object belonging to the same media type as the one being retrieved. Then, by using a specific indexing and retrieval technique [DCL97], it is possible to retrieve a set of objects from the database that are similar to the query objects. Thus, it is also possible to select the most similar object, since the objects in the answer set are ranked according to a certain degree of similarity.

We follow a similar criterion to define extended functional dependencies for multimedia databases. This means that we can extend the definition of dependency by selecting a specific similarity function and thresholds to perform approximate comparisons of complex data types. Thus, the functional dependencies change if we use different similarity functions. As a consequence, we enrich the notation used for functional dependencies to include symbols representing the chosen similarity function. Before doing so, in what follows we introduce some basic concepts of the relational model and of similarity theory [SJ99].

In the relational model the database is viewed as a set of relations of time-varying content. A multimedia database is formed by one or more relations of the form R(x1: X1,…, xn: Xn), where X1,…, Xn are data types, and xi is the name of the i-th field or column of the relation R. An instance of R, that is, its content at a given time, is defined as a subset of the Cartesian product X1×…×Xn. This instance can be represented as a relation having as rows (named tuples) the elements of the subset of X1×…×Xn, and as columns the attributes of R. If R = {X1,…, Xn} is a database scheme, then we write attr(R) for ∪i=1..n Xi. If t is a tuple of this relation (i.e., an element in an instance of R), then t[A] denotes the value of this tuple in the A-column; t[A] is called the A-value of t. A schema consists of a set of relations, where each relation is defined by its attribute set and some semantic constraints. In this paper we restrict our attention to constraints which can be expressed as multimedia functional dependencies (MFDs).

Tuples of a relation can be compared by means of a set F of relevant features. For instance, images can be compared using features like color, texture, shape, etc.; audio data can be compared using loudness, pitch, brightness, bandwidth, and harmonicity. The values of each feature F ∈ F belong to a domain D = dom(F).

The similarity between two elements x and y in a tuple is based on distance measures or, equivalently, on similarity functions, defined on feature spaces that we assume to be metric spaces. In the following, we will always refer to metric distance functions, but it should be understood that the same considerations apply to similarity functions, given the symmetry between distance and similarity functions [San01].

In particular, given two elements x and y belonging to a given data type X, we consider distance functions of type d: D2 → [0,1], such that for each vx, vy ∈ D, d(vx, vy) returns the distance of vx from vy with respect to the feature space D. In what follows we indicate such a distance with d(x, y) for any two elements x, y: X. Moreover, given a data type X, we denote with D(X) the set of distance functions defined on X, and with Xd the feature space on which the distance function d is defined.
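As a concrete instance of a distance function d: D2 → [0,1], the following sketch compares two images through their normalized color histograms. The choice of half the L1 distance is an assumption for illustration; it is a metric that stays within [0,1] on histograms that sum to 1, as the definition requires.

```python
# One concrete distance d: D^2 -> [0,1] on a feature space D, where D is
# the space of normalized color histograms (components summing to 1).
# Half the L1 distance is used so the result always lies in [0,1].

def histogram_distance(hx, hy):
    """Return half the L1 distance between two normalized histograms."""
    assert len(hx) == len(hy)
    return 0.5 * sum(abs(a - b) for a, b in zip(hx, hy))

d = histogram_distance([0.5, 0.5, 0.0], [0.0, 0.5, 0.5])
print(d)  # -> 0.5
```

Any other metric normalized to [0,1] on the chosen feature space would fit the definition equally well.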

In order to evaluate the similarity between the multimedia objects of two tuples we introduce a tuple distance function, which summarizes the results produced by the different distance functions applied to the elements of the tuples. In particular, given a relation R(z1:Z1,…, zn:Zn), if x=(x1,…, xn) and y=(y1,…, yn) are two tuples of R, then Δ(x, y) = g(d1(x1, y1),…, dn(xn, yn)) measures the distance between x and y, where di ∈ D(Zi) and g: [0,1]n → [0,1] is a monotone aggregation function that combines the n scores to derive an overall score. Aggregation functions should satisfy the triangular co-norm (t-conorm) properties, that is, the zero identity, monotonicity, commutativity, and associativity properties. There are several t-conorm aggregation functions defined in the fuzzy logic literature [Fag96,Zad65], among which the max function is the most commonly used. Notice that if n=1 then Δ(x, y) = d1(x1, y1). Given a set of data types X, we denote with TD(X) the set of tuple distance functions defined on X.
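A tuple distance function built with the max aggregation mentioned above can be sketched as follows. The two per-attribute distances (a normalized numeric gap and a Jaccard set distance) are illustrative assumptions, not distances prescribed by the paper.

```python
# Sketch of a tuple distance function Delta(x, y) = g(d1(x1,y1), ..., dn(xn,yn))
# using the max t-conorm as the aggregation function g. All scores are in [0,1].

def tuple_distance(x, y, distances, g=max):
    """Combine per-attribute distance scores with aggregation function g."""
    scores = [d(xi, yi) for d, xi, yi in zip(distances, x, y)]
    return g(scores)

# Two illustrative single-attribute distances:
d_num = lambda a, b: min(abs(a - b) / 100.0, 1.0)   # normalized numeric gap
d_set = lambda a, b: 1 - len(a & b) / len(a | b)    # Jaccard distance on sets

x = (30, {"rock", "pop"})
y = (50, {"rock"})
print(tuple_distance(x, y, [d_num, d_set]))  # -> 0.5 (= max(0.2, 0.5))
```

Replacing `g=max` with any other t-conorm (e.g. the probabilistic sum) keeps the sketch within the definition.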

Definition 3.1. Let R(z1:Z1,…, zn:Zn) be a relation, Δ be a tuple distance function on R, t be a maximum distance threshold, and x=(x1,…, xn) and y=(y1,…, yn) be two tuples in R. We say that x is similar within t to y with respect to Δ, denoted with x ≅Δ(t) y, iff Δ(x, y) ≤ t.
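Definition 3.1 reduces to a single threshold test once Δ is fixed; a direct sketch (with an assumed toy Δ, since the definition is parametric in the tuple distance function):

```python
# Definition 3.1 as a predicate: x is similar within t to y with respect
# to Delta iff Delta(x, y) <= t. Delta is any tuple distance function.

def similar_within(x, y, delta, t):
    return delta(x, y) <= t

# Toy single-attribute Delta (n = 1, so Delta coincides with d1).
delta = lambda x, y: abs(x[0] - y[0])

print(similar_within((0.3,), (0.5,), delta, 0.25))  # -> True  (0.2 <= 0.25)
```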

Now we are ready to introduce the notion of functional dependency for multimedia databases, named type-M functional dependency.

Definition 3.2 (type-M functional dependency)

Let R be a relation with attribute set U, and X, Y ⊆ U. Xg1(t′) → Yg2(t″) is a type-M functional dependency (MFD) if and only if for any two tuples t1 and t2 in R such that t1[X] ≅g1(t′) t2[X], we also have t1[Y] ≅g2(t″) t2[Y], where g1 ∈ TD(X) and g2 ∈ TD(Y), whereas t′ and t″ ∈ [0,1] are thresholds.

This means that the features used by g2 on Y depend on the features used by g1 on X; or, alternatively, the values of the features used by g1 on the X component constrain the range of values of the features used by g2 on the Y component. Notice that, given a distance function d1 and a threshold t, x1 ≅d1(t) x2 and x2 ≅d1(t) x3 do not imply x1 ≅d1(t) x3, as shown in Fig. 3.1. However, by the triangle inequality we can state that x1 ≅d1(2t) x3. In general, if Xd1(t′) → Yd2(t″) holds, then for any two tuples t1 and t2 such that t1[X] ≅d1(kt′) t2[X], we have t1[Y] ≅d2(kt″) t2[Y], with k ∈ ℕ.
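On a finite relation instance, Definition 3.2 can be checked exhaustively by testing every pair of tuples. The sketch below does this for the n=1 case with a toy absolute-difference distance; the function name, projections, and data are illustrative assumptions.

```python
# Sketch: checking whether an MFD X_g1(t') -> Y_g2(t'') holds on a finite
# relation instance by testing all tuple pairs (Definition 3.2).
# g1, g2 are tuple distance functions on the X- and Y-projections.

from itertools import combinations

def mfd_holds(rows, proj_x, proj_y, g1, g2, t1, t2):
    for r1, r2 in combinations(rows, 2):
        # A violating pair: X-similar within t1 but not Y-similar within t2.
        if g1(proj_x(r1), proj_x(r2)) <= t1 and g2(proj_y(r1), proj_y(r2)) > t2:
            return False
    return True

rows = [(0.10, 0.20), (0.12, 0.22), (0.90, 0.95)]
d = lambda a, b: abs(a - b)  # toy distance; n = 1, so Delta = d1
print(mfd_holds(rows, lambda r: r[0], lambda r: r[1], d, d, 0.05, 0.05))  # -> True
```

This pairwise check is quadratic in the instance size; it illustrates the semantics of the definition rather than an efficient verification algorithm.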

Thus, given an element x belonging to a data type X, the equivalence class of x with respect to a distance function d ∈ D(X) and a threshold t ∈ [0,1] is defined as [x]d(t) = {y ∈ X | y ≅d(t) x}. Similarly, for a tuple x=(x1,…, xn) of a relation R(z1:Z1,…, zn:Zn), the equivalence class of x with respect to a tuple distance function Δ is defined as [x]Δ(t) = {(y1,…, yn) | yi ≅di(ti) xi, with yi ∈ Zi, di ∈ D(Zi), ti ∈ [0,1] for 1 ≤ i ≤ n, and Δ(x, y) = g(d1(x1, y1),…, dn(xn, yn)) ≤ t}.
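Over a finite set of stored values, the class [x]d(t) is simply the set of values within distance t of x. A minimal sketch, with an assumed toy distance and data:

```python
# Sketch: the equivalence class [x]_d(t) of an element x with respect to a
# distance function d and threshold t, computed over a finite value set.

def equivalence_class(x, values, d, t):
    return [y for y in values if d(x, y) <= t]

d = lambda a, b: abs(a - b)  # toy distance on numeric values
print(equivalence_class(0.5, [0.46, 0.52, 0.9, 0.56], d, 0.05))  # -> [0.46, 0.52]
```

The tuple-level class [x]Δ(t) is obtained the same way by filtering tuples with Δ(x, y) ≤ t instead of a single-attribute distance.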