Lecture Twelve – Other Data Models

Readings:
  • Required: Connolly and Begg, sections 23.1 and 23.4.
  • Optional: Connolly and Begg, sections 23.2, 23.3, and 23.6.

It has been shown that the relational data model is very powerful when applied to traditional business database applications. Besides its simplicity, the concept of a table has offered just the right representation for business data. However, Relational data models have certain limitations when it comes to data whose complex structure is too difficult or cumbersome to be represented using tables.

These applications involve data whose complex structure is too difficult or cumbersome to be represented using "flat" tables, or 2-D table.

Problem:

(1) Multi-Valued Attributes

When mapping the composite and multi-valued attributes in an E-R schema to a relational schema, we had to break the composite attributes into their simple, atomic components and to map the multi-valued ones on a separate table.

As a result, the information about the structure of the attributes was lost as the data was scattered over many relations.

Furthermore, recursive queries cannot be expressed with a single SQL SELECT statement.

For example, try to express the query "Find all librarians who have grandchildren" using the LIBRARIAN and DEPENDENT tables.

(2) IsA Hierarchy

Given that we cannot directly express hierarchies in the relational model, we cannot use inheritance to simplify queries.

For example, we cannot define the LIFE_MEMBER table as

LIFE_MEMBER (MemNo, Year)

and then use the query

SELECT Fname, MI, Lname FROM LIFE_MEMBER;

to retrieve the names of all life members without specifying a join with the MEMBER table.

(3) Binary Large Objects and Character Large Objects

In general, it is not capable of supporting binary large objects (blobs) such as video clips with sizes in the hundreds of Megabytes.

And there is no way to query it using a SELECT statement, e.g. we cannot retrieve frames 13 and 17 from a video. This is also a problem to clobs.

Solution: User-Defined Data Types and Objects

The problems of data with complex structures and of data that need to be manipulated in a specific way can both be solved by encapsulating them as abstract data types (ADTs).

An ADT consists of

  • A specification of its visible data structures and permissible operations,
  • An implementation of its state, its relationships with other ADTs, and its operations.

If an ADT is used to define the domain of a column, then the operations defined on the ADT can be used in queries and transactions to manipulate the values of the column.

Furthermore, an ADT can be thought of as the type of an object; or in other words, an object can be thought of as an instance of an abstract data type. In such a case, the idea of inheritance of state and operations from object-oriented programming languages can be applied to objects and ADTs in databases.

The introduction of ADTs and objects to support advanced database application has taken several forms, two of which—object-relational databases and object-oriented databases—are in the process of standardization.

The first approach extends the relational model with ADTs and objects and is expected to be standardized by SQL3, the next version of SQL.

The second approach extends an object-oriented language, typically C++, with database concepts. An object-oriented data model standard is proposed by the Object Database Management Group (ODMG) and the Object Management Group (OMG), and this initiative pushes for a standard object-oriented client-server model called CORBA (Common Object Requester Broker Architecture).

Object-Relational Databases and SQL3

The Object-Relational (OR) model extends the relational model with object-oriented features, while maintaining the declarative access to data. The SQL3 is being designed to standardize the support for ADTs and objects, while remaining backward compatible with SQL2.

The new object-oriented features of SQL3 include:

  • Object identifiers and reference types.
  • Complex types and non-1NF representation (nested relations).
  • Inheritance of type and table.
  • Complex queries and functions.

SQL3 also attempts to standardize stored procedures, multimedia extensions, transaction-definition language, and an application programming interface similar to Microsoft's ODBC.

Collection Types

The OR model allows the definition of nested relations that have multi-valued columns. A multi-valued column is declared, using one of the collection type constructors:

  • ARRAY: a fixed-size one-dimensional array
  • SET: an unordered collection without duplicates
  • LIST: an ordered collection with duplicates
  • MULTISET: an unordered collection with duplicates

Example:

For example:

SELECT title, revision

FROM DOCUMENT

WHERE title in SET ('DBS', 'DDBS', 'MDBS');

User-Defined Types

Users can introduce new types using the CREATE TYPE command. For example, we can use the following two statements to define the distinct type String and the structured type MyDate.

CREATE TYPE String AS VARCHAR(20);

CREATE TYPE MyDate (

day integer,

month char(3),

year integer );

User-defined types can be used in the definition of other types and the definition of columns in a CREATE TABLE statement, as we will see below.

Row Types

A special structured type is the row type that can represent the types of rows in a table. By defining the rows of a table as types, we can assign them to variables and use them as arguments in functions. Let us define again the DOCUMENT table, based on a row type.

CREATE ROW TYPE Doc_row (

title String,

revision MyDate,

keyword SET(String),

authors LIST (String) );

CREATE TABLE Document OF TYPE Doc_row (

PRIMARY KEY (title));

Type Inheritance

The OR model supports both single inheritance and multiple inheritance. Inheritance can be specified using the keyword UNDER.

Example:

CREATE TYPE Person ( name String, HKid integer);

CREATE TYPE Student (degree String, dept String) UNDER Person;

CREATE TYPE Teacher (salary integer, dept String) UNDER Person;

As an example of multiple inheritance, consider teaching assistants (TAs) who are both students and teachers.

CREATE TYPE TA

UNDER Student WITH (dept as student-dept),

Teacher WITH (dept as teacher-dept);

We use the WITH clause in order to rename and hence disambiguate common names.

It should be pointed out that an entity can have only one type, the most specific one. For example, an entity cannot be a TA and a Teacher at the same time.

Table Inheritance

As opposed to type hierarchy, table hierarchy allows an object to belong to multiple tables.

Single inheritance example.

CREATE TABLE Person (name String, HKid integer);

CREATE TABLE Student (degree String, dept String) UNDER Person;

CREATE TABLE Teacher (salary integer, dept String) UNDER Person;

Note that every subtable inherits every column of its super-table. Given the above schema, this means that a row corresponding to a teaching assistant belongs to both Student and Teacher tables.

A constraint on single inheritance is that each row in a super-table (Person) can correspond to one tuple in a subtable (Student and Teacher) and vice-versa. INSERT, DELETE, and UPDATE operations are appropriately propagated.

A multiple inheritance example:

CREATE TABLE TA

UNDER Student WITH (dept as student-dept),

Teacher WITH (dept as teacher-dept);

An entity in TA table is also present in tables Teacher, Student, and Person.

Object IDs and Reference Types

Recall that keys are properties of tables and can be used to uniquely identify a row only within a given table. When a row is assigned to variables or passed in and out of a function, the key is meaningless. On the other hand, object IDs or OIDare system unique identifiers that can be used to reference a row (or an object) in any context.

In OR model, rows can be associated with OIDs using the WITH OID option,

CREATE ROW TYPE people_row (

Name String,

Birthday DATE )

WITH OID;

CREATE TABLE Person of TYPE people_row;

and referenced using the keyword REF:

CREATE TABLE Document1 (

title String,

revision DATE,

keyword SET(String),

authors LIST(REF(Person)),

PRIMARY KEY (title) );

In the above example, we reference the table in authors. Alternatively, we could reference the row type. If several tables of the same row type exist, we can restrict a reference to a specific table using a SCOPE FOR clause.

As an example let us rewrite the above CREATE TABLE statement

CREATE TABLE Document1 (

title String,

revision DATE,

keyword SET(String),

authors LIST(REF(people_row)),

SCOPE FOR authors IS Person,

PRIMARY KEY (title) );

Querying with Complex Types

We can refer to the components of components of a row by building a path expression using the double dot notation. For example, consider the query that lists the titles and the authors' names of all documents associated with the keyword database. The path expression here is Document1.authors..Name.

SELECT Document1.title, Document1.authors..Name

FROM Document1

WHERE'database' in keyword;

Path expressions can also be expressed using the dereferencing operator (->). Dereferencing works in the same way as in C.

SELECT Document.title, Document1.authors->Name

FROM Document1

WHERE 'database' in keyword;

A collection type, such as the result of a path expression, can appear in the FROM clause. For example the above query can be rewritten using the collection type B.authors in the FROM clause as follows:

SELECT B.title, Y.name

FROM Document1 as B, B.authors as Y

WHERE 'database' in keyword;

User-Defined Functions

A user-defined function can be written in either SQL or in a high-level programming language like C or C++. In the latter case, the function is compiled externally but is loaded and executed within the DBMS.

The general structure of user-defined functions is

CREATE FUNCTION () RETURNS ;

As an example of a function in SQL, consider the function that counts the authors of a document.

CREATE FUNCTION author-count (ADOC Document)

RETURNS integer AS

SELECT COUNT(authors) FROM ADOC;

This function can be used in any other query. For example,

SELECT title

FROM Document d

WHERE author-count(d) > 1;

Inserting Complex Values

DELETE and UPDATE commands in the OR model are the same as in the traditional relational model. Inserts are also very similar, but they require using the SET type constructor when inserting multi-valued attributes, such as authors and and keywords.

INSERT INTO Document (title,revision,authors,keyword)

VALUES ('DBS', (29,'Feb',2000), SET('Suri','Smith'),

SET('Data Models','Databases', 'Transactions'));

Varying Data Types

Multimedia Datatypes

They include images, such as pictures and graphs, video clips, such as movies, audio clips, such as songs and phone messages, and documents, such as books and HTML pages.

Two main concerns:

  1. How to store and access it efficiently,
  2. How to perform content-based retrieval.

E.g., locate all video clips that include a certain building, or a query might involve an activity, such as the photo finish of a car race.

In order to store multimedia data efficiently, DBMSs need to store them in a compressed form. The compression techniques used not need to be fast recall and play back of the multimedia source.

In order to support content-based retrieval, a model is needed that can organize and index multimedia data based on their contents. Identifying the content of a multimedia source is not easy. Certain type-specific mathematical characteristics can be extracted automatically. For example, in an audio clip, loudness, intensity, pitch, and clarity are characteristics that can be automatically extracted. Other features require human intervention. For example, the identification of a sound that is pleasing to the ear.

Time-Series Data

Time-series data represents a special collection type.

  • It is a set of values.
  • Each value is recorded at a predefined point in time.

Example, consider the closing daily stock prices of a particular company at the New York Stock Exchange.

Typical queries on time series include temporal aggregation over different temporal intervals. For example, find the average weekly closing stock price or the maximum monthly closing stock price from the daily time series.

Another example is the query that compares this year's maximum monthly closing stock price for a specific stock with that of the previous year. A more complex query is the one that, given a collection of monthly stock price movements, finds a time series that is similar to one specified, or to one that has specific characteristics.

Spatial Data

Spatial data are objects in a multi-dimensional space. Geographical coordinates, for example, namely latitude and longitude, are two-dimensional spatial descriptors; On the other hand, meteorological databases for weather information are three-dimensional because they also require the distance from the earth.

To store and query spatial data, spatial data models are needed. For example, in two-dimensional geometric space, we need to interpret points, lines, line segments, circles, polygons, and arcs.

Spatial characteristics can be static, such as the location of a building, or dynamic, such as a moving car.

Furthermore, in order to perform queries, we need operations that manipulate the spatial characteristics.

For example, we need operators that compute the distance between two objects in a range query(e.g., find all the gas stations within 5 miles), or that determine which objects are the closest objects from a given point in a nearest neighbor query (e.g., find the closest gas station).

Database Classifications and the Marketplace

Readings:
  • Elective: Connolly and Begg, appendices C, D, and E.

Network, Hierarchical (IBM-IMS), Relational, OO

(Prepared by BR-Cheung)12-1/21