Paul Murray, Claire Mcquade, Kashif Rafiq, David Miller

Federated Databases

Coursework

CO42009: Object Database Systems

Paul Murray, Claire McQuade, Kashif Rafiq, David Miller

May 2002

Introduction

Aims

Characteristics

Autonomy

Distribution

Heterogeneity

Characteristics of a FDBS

Architecture

Processors

Transforming Processor

Filtering Processor

Construction Processor

Accessing Processor

Schema

Schema Components

Local Schema

Component Schema

Export Schema

Federated Schema

External Schema

Customisation

Integrity Constraints

Access Control

Seven Layer Model

Schema Evolution

Coupling

Loosely coupled FDBS

Tightly Coupled FDBS

Views

OO Systems

Selection

Projection

Join

Relational Database Systems

Conclusion

References

Introduction

Given today’s heterogeneous environments and the need to share information between many different entities, a new area of database technology is being developed. Database Federation is concerned with the integration of two or more database systems to form a single view for an application or user.

Given that a Federation is a loosely coupled network of independent entities, each local database administrator maintains control over his own system, and shares the data and design with the rest of the federation. This local knowledge is then aggregated into a large schema or view that is used by applications or the end user. A local database can also take part in one or more federations. The local database can also fulfil its local operations while it is part of the federation.

The integration part of the DMS may be managed by the users of the federation, or by the local database administrators. The amount of integration depends on the needs of the federation.

Given the ability to export some or their entire schema, any kind of database could be part of a federation. Relational databases are prime candidates for inclusion in a federation. Their design has been around for many years and is used widely across many areas to store regular, ordered data. Almost all relational databases use SQL as their data manipulation language. Again this language is mature and is seen as computationally complete.

Another species of database that is beginning to emerge is the object database. Although Object Orientated Databases are still in their infancy, they have several advantages over the relation model, in addition to some weaknesses. In an object system classes are used to provide the means to store and process data. Each class can be thought of as a table, and each instance of a class a record. Built into the class are attributes (for storing values) and methods (for manipulating data). By accessing the methods defined in the class the database effectively becomes the application.

Objects also benefit from object orientation, as methods can be overloaded and redefined using inheritance and specialisation.

By using these object-orientated features, more complex real world object can be represented using an object system. These are the advantages of an object-orientated database, but there are some drawbacks.

Object Orientated databases are young and immature. They are at the same stage that relational databases were at 10 years ago. There are still no full query languages (such as SQL for relational) and many vendors have stopped at a half way house by adding some object features to relation databases (object relational).

Whilst the concepts of object databases is hard to understand, their usefulness in modelling real world problems sets them apart from relational databases.

Many object systems hold data about biological, or other scientific data. Academic research has become a global effort with institutions cooperating in a way never seen before. If each institution has some data, why not create a federation to share it? The federation might contain relational databases and or object orientated systems.

Aims

This report will concentrate on several key areas

Characteristics
Architecture
Processors
Schema
Coupling
Views

As a federation implies one or more databases, the data must be distributed over each entity in the federation. Some entities may have more data than others. In relation systems, the data could be partitioned horizontally or vertically. In an object system, the classes can also be fragmented. Horizontal fragmentation will group by extent, while vertical fragmentation will group by instances.

Views are very important to Federated Database Systems (FDBS). They allow data to be selected and projected from each entity to another layer in the schema where it is combined into a larger view. All the application and users that use the federation will connect to this layer of the schema to gather data.

Schema describes the design of the whole system. There will be schema that describes the local database (relational or object), any middleware that projects the local database to the federation will have to reference schema for data types. Often there is a global schema that describes the overall federation as well.

Characteristics

Autonomy

As each member of the Federation is a local Database Management System (DBMS), their characteristics are fundamental to the systems that are built on top of them. A local DBMS will typically host several databases on one machine. These databases could hold anything from accounting information to taxonomy data. The DBMS is the software that manages the database. It provides such features as

Transaction Control
Access Control
Query Processing

Each local DBMS has autonomy. One database administrator will control the day to day running of the DBMS. However in a Federation there has to be some distribution of control that defines how independently the local DBMS can operate. This will govern features like design, communication and execution in the Federation.

Distribution

Due to the nature of a federated database, data will be spread over multiple databases. These databases may be on the same computer, or on geographically distant systems. The only requirement of a FDS is that the systems can talk to each other over a network.

There are several advantages to data distribution. These include improved access times, availability and reliability. In a distributed database system the data may deliberately be distributed, but in a federated system data is generally distributed already and is governed by what data each local node already has, although this can be altered.

Heterogeneity

It will be likely that in a Federation there will be more than one type of Operating System and database combination, there must be some kind of interoperability in the system. Each local DBMS will have different ways of representing data models, semantics and system operations.

When looking at data models such things as structures and query languages must be taken into account. Every programming language represents structures differently and query languages will be different for both relational systems and also object systems. There are even differences between SQL implementations on the same operating system (T-SQL – Microsoft and PSQL – Oracle).

Semantics describe what something is. If two systems that are to be in a federation refer to a customer in different ways (one might use a customer number, the other a shortened version of the customer name), then some problems might be encountered. Although they both refer to a customer and both uniquely identify the customer, they are in fact two different attributes of a customer. If there are difficulties in the interpretation of attribute names or other vital information then there will be problems.

Using different operating systems and DBMS will also present problems. Every database system will handle core functionality differently. If grouping DBMS into one logical group, basic tasks like transaction handling and recovery could be tricky. In addition not all features might be available to the database, depending on the operating system it sits on top of.

Characteristics of a FDBS

The FDBS is a collection of cooperating but autonomous database management systems. Each of the DBMS participates to a varying degree in the Federation. The FDMS provides the software the gives application access to the underlying DBMS in a secure, controlled manner. There is no centralized control of a FDMS and all operations involve either the local DBMS or the whole federation.

The following diagram shows how different types of local databases can be grouped together in a federation.

Architecture

The FDBS is made up of many components like

Data
Local DBMS
Processors
Schemas

The first two points are self-explanatory. A database is useless without data, and there must be some software available to manage requests to the database and perform tasks like transaction handling.

The processors are application independent programs that will perform tasks like data conversion between systems.

Schemas are the designs of each DBMS and the FDBS. These contain details of each structure in a database, all its attributes and error checking.

Processors

Processors are perhaps the most important extension to the local DBMS. These programs allow such functions as data transformation, data filtering, data construction and data access. Example diagrams are shown after each processor description.

Transforming Processor

The transforming processor works to transform data from one language to another, translate data from one format to another and provides some kind of data independence as it helps hide differences in query languages and data formats.

Filtering Processor

Typically the filtering processor will constrain the commands and data that can be passed from one processor to another. The syntax and semantics of each command and data is checked for validity before being passed on. This reduces the risk of data corruption if data is passed unmodified to another process. If data or commands require to be modified, they are passed to a transforming processor.

Construction Processor

The construction processors enable the aggregation of data from many sources into one. They support location, distribution and replication operations. Construction processors also provide schema integration between local DBMS. This entails negotiation, command optimisation and decomposition and transactional management over a FDBS.

Accessing Processor

The accessing processors accept commands from other processors, execute them against a local DBMS and return data to the other processors.

Schema

While the three level schema architecture is adequate for a centralised DBMS it is not really an adequate architecture for FDBS. To support the system distribution, heterogeneity and autonomy associated with FDBS it is necessary to extend the three level schemas. There are various levels of extension used such as the four level, five or even seven level schema architectures, however the happiest medium seems to be the five level schemas.

The following diagram Fig. S1 shows the layout of the five level schemas.

Fig. S1

The next diagram Fig. S2 is of a system architecture consisting of processor and schema of an FDBS.

Fig. S2

Schema Components

Local Schema

The local schema is the conceptual schema of a component DBS. It is expressed in the native data model of the component DBMS and so different local schemas can be expressed in different data models.

Component Schema

A component schema is made by translating the local schemas into a data model called the canonical or common data model (CDM) of the FDBS. There are two reasons for defining the component schemas in a common data model. The first reason is each design feature will now be described in a common way over all component schemas. The second reason is that semantics that are missing in local schema can be added to its component schema. When developing a tightly coupled FDBS this facilitates negotiation and integration tasks performed during development. With loosely coupled FDBS they facilitate negotiation and the specification of views and multi-database queries.

Export Schema

The export schema represents a subset of a component schema that is available to the FDBS. It may include access control information regarding its use by other federation users, not all of the data components may be available to the federation. A filtering processor can be used to provide the access control as specified in an export schema by limiting the allowable operations that can be submitted on the corresponding component schema. The export schema and such filtering processes support the autonomy of an FDBS.

Federated Schema

A federated schema is an integration of multiple export schemas. A federated schema includes the data distribution information that is generated when integrating export schemas, although some systems use a separate schema called a distribution or allocation schema to store this information. A constructing processor transforms commands on the federated schema into commands on one or more export schemas. The constructing processors and federated schemas support the distribution feature of the FDBS. There can be multiple federated schemas in an FDBS, one for each class of federation users. A class of federation users is a group of users and/or applications performing a related set of activities. A simple example might be represented in a corporate type environment, managers may be one class of federation users and all employees and applications associated with the accounting department may be another class.

Similar concepts to the federated schema exist and are represented by terms such as import schema, global schema, global conceptual schema, unified schema, and enterprise schema. It should be noted however that such terms other than import schemas tend to be used only when there is only one of this type of schema in the system.

External Schema

An external schema defines a schema for a user and/or application or class of users/applications. There are several reasons for using external schemas and are as follow.

Customisation

Federated schema are sometimes difficult to change, an external schema can be used to specify a subset of information in a federated schema that is relevant to users of the external schema. This allows changes to be made more readily to meet users’ needs than would be normally possible with a federated schema. The data model for the external schema may be different than that of the federated schema.

Integrity Constraints

The external schema allows you to put in place additional integrity constraints on the system.

Access Control

The export schema provides access control with respect to the data managed by the component databases; they also provide access control with respect to the data managed by the FDBS. A filtering process analyses the commands on an external schema to make sure that they conform to the access control and integrity constraints of the federated schema. A transform process is used if the export schema has a different data model from the federated schema to transform the commands on the external schema to those on the federated schema

Seven Layer Model

While the most popular design model for database federations is the five layer model there have been some extensions to this. For some recent healthcare projects two extra layers were added to the design. This was due to the extra levels of security that were required. One of the layers dealt with schema authorisation. These schemas were subsets of federated schema, but only contained subsets of data.

Schema Evolution

One of the problems found with federated databases is the cumulative schema evolution rate. A client application cannot be tied to a specific integrated view of the federation, as all it takes is one member of the federation to change its schema without ensuring backward compatibility to break the client. One way for clients to retain schema independence is by requesting a copy of the schema from an object broker (OB) when necessary. If this is done the back-end schema can change without affecting the operation of the front-end views.

To support this dynamic schema exchange, the data sent between the client and OB cannot be compile-time defined. To support it a suite of generic object structures for building objects on the fly is needed at both ends. These would include abstractions for primitive types, extensible arrays, and extensible associative arrays. The schema data is also captured using these structures and validation tools are required that can check an instance data structure for schema compliance.

A common OB system is CORBA. It is maintained by the same organisation as the ODMG standards.

Coupling

Coupling describes the amount of autonomy that each local system has within a federation. There are two types of coupling, loose and tightly coupled. Each defines how much control the local administrator has over the schema that is exported from their database.

Loosely coupled FDBS

A loosely coupled system provides an interface to deal with multiple component DBMS directly. A loosely coupled approach may be better suited for integrating a large number of very autonomous read only databases. The User has to manage the federated schemas themselves, thus the FDBS can do very little to optimise queries, but the user is free to specify their own queries to achieve good performance. This implies that the user must have a good understanding of the components DBMS. A user of a loosely coupled FDBS has to be sophisticated to be able to find appropriate export schemas that can provide the required data and to define mappings between their federated schemas and export schemas. Lack of adequate semantics in the component schemas can make this task particularly difficult.