FEDERATED DATABASE MANAGEMENT SYSTEM

A collection of databases that are treated as one entity and viewed through a single user interface.

WHAT IS A FEDERATED DATABASE?

A federated database system is a type of meta-database management system (DBMS) that transparently integrates multiple autonomous database systems into a single federated database. The constituent databases are interconnected via a computer network and may be geographically decentralized. Because the constituent database systems remain autonomous, a federated database system is an alternative to the task of merging several disparate databases. A federated database (or virtual database) is the fully integrated, logical composite of all constituent databases in a federated database system.

Surveys of the field define a federated database as a collection of cooperating component systems that are autonomous and possibly heterogeneous. The three important characteristics of an FDBS are autonomy, heterogeneity, and distribution.

FDBS architecture: A DBMS can be classified as either centralized or distributed. A centralized system manages a single database, while a distributed system manages multiple databases. A component DBS in a DBMS may be centralized or distributed. A multiple DBS (MDBS) can be classified into two types, federated and nonfederated, depending on the autonomy of the component DBSs. A nonfederated database system is an integration of component DBMSs that are not autonomous. A federated database system consists of component DBSs that are autonomous yet participate in a federation to allow partial and controlled sharing of their data. Federated architectures differ in their level of integration with the component database systems and in the extent of the services offered by the federation.

An FDBS can be categorized as loosely or tightly coupled:

Loosely coupled systems require each component database to construct its own federated schema. A user will typically access other component database systems by using a multidatabase language, but this removes any level of location transparency, forcing the user to have direct knowledge of the federated schema.

Tightly coupled systems consist of component systems that use independent processes to construct and publish an integrated federated schema.

Data in an FDBS is distributed because multiple DBSs already exist before the FDBS is built. The data can be spread across multiple databases, which may be stored on a single computer or on several computers. These computers may be located in different places but are interconnected by a network. Data distribution brings increased availability and reliability as well as improved access times.

Fundamental to the difference between an MDBS and an FDBS is the concept of autonomy. It is important to understand the aspects of autonomy for component databases and how they can be addressed when a component DBS participates in an FDBS. Four kinds of autonomy are addressed: design, communication, execution, and association autonomy.

WHY ARE FEDERATED DATABASES BECOMING INCREASINGLY COMMON?

A federated database may be composed of a heterogeneous collection of databases, in which case it lets applications look at data in a more unified way without having to duplicate it across databases or make multiple queries and manually combine the results.

In a homogeneous environment, federated databases can help distribute the load of very large databases (VLDBs). In this configuration, each component database has an identical schema but only a subset of the total rows. The federated database system distributes queries to the appropriate component database; the goal of the system is to ensure that a typical query will need to use only one component, thus drastically reducing the number of rows that need to be searched. Microsoft SQL Server has supported this type of database federation since its 2000 edition.

When a federated database is used for load distribution, rows are distributed to its components based on a primary key. Picking this key isn't trivial -- it can make the difference between a successful configuration and an unsuccessful one. Ideally, most or all queries should end up hitting only one component database.

For instance, a bank may use a federated database in which transactions are split by year. Users will often only look at transactions in the past year and the system will only need to touch one or two component databases. On the other hand, splitting the databases by customer ID isn't likely to work well; a given set of transactions will involve a random distribution of customer IDs, meaning that the query will be sent out to many, or potentially all, of the component databases. This eliminates the benefit of the federated database -- nearly all of the rows end up being searched -- and will only increase the query's overall latency because of the query redirects.
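The effect of the partitioning key can be sketched in a few lines of Python. This is only an illustration under assumed data: the transaction tuples, the partition helper, and the components_holding helper are hypothetical and do not come from any particular product. It counts which component databases hold rows for a "transactions in 2024" query under each of the two splits discussed above.

```python
from datetime import date

# Illustrative transactions: (transaction_id, customer_id, date). Invented data.
TRANSACTIONS = [
    (1, 101, date(2022, 3, 1)),
    (2, 102, date(2023, 6, 9)),
    (3, 103, date(2023, 11, 30)),
    (4, 101, date(2024, 1, 15)),
    (5, 102, date(2024, 2, 2)),
    (6, 103, date(2024, 5, 20)),
]

def partition(rows, key_fn):
    """Distribute rows to component databases according to a partitioning key."""
    components = {}
    for row in rows:
        components.setdefault(key_fn(row), []).append(row)
    return components

def components_holding(components, predicate):
    """Return the keys of the components that hold at least one matching row."""
    return sorted(
        key for key, rows in components.items() if any(predicate(r) for r in rows)
    )

def in_2024(row):
    """The example query: transactions dated in 2024."""
    return row[2].year == 2024

by_year = partition(TRANSACTIONS, lambda r: r[2].year)
by_customer = partition(TRANSACTIONS, lambda r: r[1])

print(components_holding(by_year, in_2024))      # [2024]
print(components_holding(by_customer, in_2024))  # [101, 102, 103]
```

With the split by year, the query is answered by a single component. With the split by customer ID, every component holds some of the matching rows, so the query has to be sent to all of them.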

Through data abstraction, federated database systems can provide a uniform front-end user interface, enabling users and clients to store and retrieve data in multiple noncontiguous databases with a single query, even if the constituent databases are heterogeneous. To this end, a federated database system must be able to decompose the query into subqueries for submission to the relevant constituent DBMSs, after which the system must combine the result sets of the subqueries. Because various database management systems employ different query languages, federated database systems can apply wrappers to the subqueries to translate them into the appropriate query languages.
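The decompose, translate, and combine steps can be illustrated with a minimal Python sketch. Everything here is an assumption made for the example: the SqlWrapper and KeyValueWrapper classes, the accounts table, and the federated_query function are hypothetical names, and an in-memory SQLite database plus a plain dictionary stand in for two heterogeneous constituent systems.

```python
import sqlite3

class SqlWrapper:
    """Wrapper for a relational constituent: translates the subquery into SQL."""

    def __init__(self, rows):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE accounts (customer TEXT, balance REAL)")
        self.conn.executemany("INSERT INTO accounts VALUES (?, ?)", rows)

    def fetch(self, min_balance):
        cur = self.conn.execute(
            "SELECT customer, balance FROM accounts WHERE balance >= ?",
            (min_balance,),
        )
        return [{"customer": c, "balance": b} for c, b in cur.fetchall()]

class KeyValueWrapper:
    """Wrapper for a non-relational constituent: translates the subquery into a scan."""

    def __init__(self, store):
        self.store = store  # maps customer name -> balance

    def fetch(self, min_balance):
        return [
            {"customer": c, "balance": b}
            for c, b in self.store.items()
            if b >= min_balance
        ]

def federated_query(wrappers, min_balance):
    """Send one subquery to each constituent, then merge the result sets."""
    merged = []
    for wrapper in wrappers:
        merged.extend(wrapper.fetch(min_balance))
    return sorted(merged, key=lambda r: r["customer"])

wrappers = [
    SqlWrapper([("alice", 500.0), ("bob", 50.0)]),
    KeyValueWrapper({"carol": 900.0, "dave": 10.0}),
]
result = federated_query(wrappers, 100.0)
print([r["customer"] for r in result])  # ['alice', 'carol']
```

The caller issues one query and never sees which store answered which part. A real federated system would also push down joins and estimate costs, which this sketch leaves out.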

Some key features of federated systems include:

  • Autonomous data sources
  • Heterogeneity of data sources
  • Data sources often geographically distributed
  • Data sources controlled by independent administrative domains
  • Logical integration of distributed datasets
  • Coherent, unified, and integrated view of data from multiple resources
  • Largely vendor neutral

These database systems can therefore handle both homogeneous and heterogeneous databases and provide a uniform front-end user interface. Because the cost of building such a system is low, federated databases are becoming increasingly common.

EXAMPLES OF DATABASES IN A WORK ENVIRONMENT THAT COULD BE FEDERATED

In a large modern enterprise, it is almost inevitable that different portions of the organization will use different database management systems to store and search their critical data. Competition, evolving technology, mergers, acquisitions, geographic distribution, and the inevitable decentralization of growth all contribute to this diversity. Yet it is only by combining the information from these systems that the enterprise can realize the full value of the data they contain.

For example, in the finance industry, mergers are an almost commonplace occurrence. The newly created entity inherits the data stores of the original institutions. Many of those stores will be relational database management systems, but often from different manufacturers; for instance, one company may have used primarily Sybase, and another Informix IDS. They may both have one or more document management systems -- such as Documentum or IBM Content Manager -- for storing text documents such as copies of loans, etc. Each may have applications that compute important information (for example, the risk of a loan to a given customer), or mine for information about customers' buying patterns.

After the merger, they need to be able to access all customer information from both sets of stores, analyze their new portfolios using existing and new applications, and, in general, use the combined resources of both institutions through a common interface. They need to be able to identify common customers and consolidate their accounts, although the different companies may have referred to their customers using totally different identifying keys. Federation technologies can significantly ease the pain in these situations by providing a unified interface to diverse data.
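One way to picture the problem of differing customer keys is a cross-reference table that links the key used by one institution to the key used by the other. The Python sketch below is an assumption-laden illustration: the record layouts, the field names cust_no and client_id, and the rule of matching on normalized name plus date of birth are all invented for the example, and real customer matching is usually far more involved.

```python
def normalize(name):
    """Lower-case and collapse whitespace so trivially different spellings match."""
    return " ".join(name.lower().split())

# Hypothetical extracts from the two institutions; the key formats differ.
bank_a = [
    {"cust_no": 1001, "name": "Ann  Lee", "dob": "1980-02-14"},
    {"cust_no": 1002, "name": "Raj Patel", "dob": "1975-09-30"},
]
bank_b = [
    {"client_id": "B-77", "name": "ann lee", "dob": "1980-02-14"},
    {"client_id": "B-78", "name": "Maria Silva", "dob": "1990-01-05"},
]

def build_cross_reference(a_rows, b_rows):
    """Map each institution-A key to the institution-B key of the same person."""
    b_index = {(normalize(r["name"]), r["dob"]): r["client_id"] for r in b_rows}
    xref = {}
    for r in a_rows:
        match = b_index.get((normalize(r["name"]), r["dob"]))
        if match is not None:
            xref[r["cust_no"]] = match
    return xref

xref = build_cross_reference(bank_a, bank_b)
print(xref)  # {1001: 'B-77'}
```

A federation layer could consult such a table to present the two records for the same person as one customer, without changing the keys in either source system.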

IBM has made significant investments in federation technologies across its Data Management product portfolio. IBM states that its federation capabilities enable unified access to digital information in any format, structured or unstructured, in any information store. These capabilities are available through a variety of IBM products, including DB2 UDB, DB2 DataJoiner, and IBM Enterprise Information Portal (EIP), and IBM continues to enhance this set of federation technologies.

HOW DO THE AUTHORS PROPOSE THAT METADATA BE MANAGED FOR A FEDERATED DATABASE?

Metadata is "data about data" of any sort, in any medium. It can be text, voice, or image that describes what the audience wants or needs to see or experience.

The metadata of the database should be cataloged in such a way that it is replicated frequently, so that all systems stay similar and no one federated cell in the system is more valuable than another. Doing this ensures that all cells hold the same metadata and that the integrity of the system is preserved.

[A federated database system is a collection of autonomous database systems that cooperate to provide a combined view of individual data stores. One of the problems facing federated database management systems is how to handle change among entities in the federation. Change can occur when a new site is added to the federation, a new set of security privileges is introduced, or one of the schemas has been modified. One powerful mechanism for constructing an adaptive federated architecture is metadata.]
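The three kinds of change named above (a new site, new privileges, a modified schema) can be recorded in a replicated metadata catalog. The Python sketch below is a simplified illustration, not the authors' design: the MetadataCatalog class, its method names, and the version-counter rule used by sync are assumptions, and a real system would need proper conflict resolution when two peers change at once.

```python
import copy

class MetadataCatalog:
    """Versioned description of the federation: sites, schemas, and privileges."""

    def __init__(self):
        self.version = 0
        self.sites = {}  # site name -> {"schema": {...}, "privileges": set()}

    def add_site(self, name, schema):
        """Record a new site joining the federation."""
        self.sites[name] = {"schema": dict(schema), "privileges": set()}
        self.version += 1

    def modify_schema(self, name, column, col_type):
        """Record a schema change at one site."""
        self.sites[name]["schema"][column] = col_type
        self.version += 1

    def grant(self, name, privilege):
        """Record a new security privilege for one site."""
        self.sites[name]["privileges"].add(privilege)
        self.version += 1

def sync(a, b):
    """Copy the newer catalog over the older one so both peers end up identical."""
    newer, older = (a, b) if a.version >= b.version else (b, a)
    older.sites = copy.deepcopy(newer.sites)
    older.version = newer.version

cell_1, cell_2 = MetadataCatalog(), MetadataCatalog()
cell_1.add_site("branch_a", {"id": "INTEGER", "name": "TEXT"})
cell_1.modify_schema("branch_a", "email", "TEXT")
cell_1.grant("branch_a", "read")
sync(cell_2, cell_1)  # argument order does not matter; the newer copy wins
print(cell_2.version, cell_2.sites == cell_1.sites)  # 3 True
```

Because sync treats both arguments alike, neither cell is privileged over the other, which matches the goal of keeping every cell's metadata the same.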