Information Integration Using The Blackboard Technique

D. Chinthamalla, H. Muthyala, W. D. Potter

Department of Computer Science

University of Georgia, Athens, GA

{deepak, muthyala, potter}@cs.uga.edu

Abstract - It can be safely said that today's world is driven by information. A great deal of information is available today in various forms and from various sources, including databases, knowledge bases, flat file systems and the World Wide Web. The challenging task is to integrate the information available in these different formats. In this paper, we propose an architecture for the integration of information available from such varied sources, at the heart of which lies an AI problem-solving technique, the Blackboard technique. We model our query controller as a blackboard and provide the details involved. We also provide a brief overview of some of the other architectures and systems that integrate information.

1. Introduction

Today’s world is highly information-intensive. Information is available to the user from a variety of sources, including databases, knowledge bases, and the World Wide Web. A problem arises, however, when data from these myriad sources needs to be used together. Since each of the disparate sources uses different data models and technologies, it becomes difficult to view the information available in each of them in a uniform manner. It is becoming more and more important to integrate information from the various sources in order to make it more useful and effective.
An Intelligent Information System (IIS) is a system that intelligently integrates information from various underlying sources, each of which might use a different data model and query language. The most important factor, though, is that the integration of information from different sources should be transparent to the user: the user should not be forced to know how and when each of the background sources is used. The user should be able to query the system as a whole rather than each data source individually. This also frees the user from having to know the intricacies of each of the languages needed to access the individual data sources.
There are various issues involved in accessing information from different sources in a unified manner.
Firstly, since each of the sources might use a different data model, the information may need to be translated into a common data schema in order to be integrated with other data. Also, the information sources may use different technologies, which affects the way data is retrieved from each of them. Another important issue is how the various underlying sources are presented and accessed through the system – for example, whether all the sources should be visible to the user or kept transparent. There are many such issues, and the various architectures differ in how they handle them. Some of the IIS architectures are discussed in the following section.
2. Various IIS Architectures
Many systems exist today that provide intelligent information processing. Some of the architectures and technologies used to build such systems are described below; many existing systems combine, or are specific instances of, the architectures and technologies described.

2.1 Federated Database Systems

A federated database can be thought of as a collection of several other information sources, with two levels of operation: the individual components, which can be other databases or any structured sources, remain under the control of their local administrators, but each of them also contributes to a global view. Federated databases are of two types [12]: tightly coupled and loosely coupled. The difference between the two is that dynamic integration of the data from the individual sources is possible in loosely coupled systems, whereas only a static, pre-defined view of the data is available for querying in tightly coupled systems. That is, in a tightly coupled FDBS, all integration of the data is performed in advance and only a final global view that can be queried is presented to the user; in a loosely coupled system, the user can interact with the system and evolve a customized schema that can be queried, as sketched below.
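As a rough illustration of the two coupling styles, consider the following minimal Python sketch. The component sources, export schemas, and helper names are invented for this example and are not taken from any particular system: in the tightly coupled case the administrator fixes the global view in advance, while in the loosely coupled case the user composes a view from the export schemas at query time.

# Minimal sketch of tightly vs. loosely coupled federation (hypothetical sources).

# Each component source keeps local control but exports part of its schema.
EXPORT_SCHEMAS = {
    "inventory_db": {"part_id", "quantity"},
    "supplier_db": {"part_id", "supplier", "price"},
}

def tightly_coupled_view():
    # Integration is done in advance by the federation administrator;
    # users can only query this fixed global view.
    return {"part_id", "quantity", "supplier"}

def loosely_coupled_view(requested_attributes):
    # The user dynamically composes a customized schema from whatever
    # the component sources are willing to export.
    available = set().union(*EXPORT_SCHEMAS.values())
    return available & set(requested_attributes)

print(tightly_coupled_view())
print(loosely_coupled_view(["part_id", "price", "weight"]))  # "weight" is not exported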

2.2 Mediator-Wrapper Architecture

There are two main components in the Mediator-Wrapper architecture – Mediators and Wrappers. Wrappers are source-specific components that translate
the system query into the native language of the source and also transform the results returned by the source into a format understandable by the integration system. A wrapper is built for each different type of source. Mediators are the intelligent components that integrate and/or refine the data coming from the wrappers, or other mediators. A mediator may deal with one or more wrappers and/or mediators. It translates a query on the system to queries for the individual wrappers/mediators. The mediator also has knowledge on how the data should be combined, if necessary. It then combines the results obtained from the wrappers/mediators into an answer for the user query.
Some example systems that employ the mediator-wrapper architecture are TSIMMIS [6], COIN [4], KOMET [5], InterDB [22] and DISCO [23].
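To make the division of labour concrete, here is a minimal Python sketch; the source formats, class names, and the merge rule are illustrative assumptions, not taken from any of the systems cited above. Each wrapper translates a uniform query into its source's native access method, and the mediator fans the query out and combines the answers into one result.

# Illustrative mediator-wrapper sketch; sources and merge logic are hypothetical.

class SqlWrapper:
    """Wraps a relational source: translates the uniform query into SQL-like access."""
    def __init__(self, rows):
        self.rows = rows  # stand-in for a real database connection
    def query(self, city):
        return [r for r in self.rows if r["city"] == city]

class CsvWrapper:
    """Wraps a flat-file source: translates the uniform query into a file scan."""
    def __init__(self, lines):
        self.lines = lines  # e.g. "name,city" records
    def query(self, city):
        parsed = (dict(zip(["name", "city"], line.split(","))) for line in self.lines)
        return [r for r in parsed if r["city"] == city]

class Mediator:
    """Decomposes the user query, forwards it to each wrapper, merges the results."""
    def __init__(self, wrappers):
        self.wrappers = wrappers
    def query(self, city):
        merged = []
        for w in self.wrappers:
            merged.extend(w.query(city))   # each wrapper returns a common format
        return merged                      # further refinement could be done here

mediator = Mediator([
    SqlWrapper([{"name": "Acme", "city": "Athens"}]),
    CsvWrapper(["Zenith,Athens", "Orbit,Atlanta"]),
])
print(mediator.query("Athens"))  # results from both sources, in one format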
2.3 Ontology-based Architecture
For a program or user to make statements or ask queries about a domain, it becomes necessary to use a conceptualization of that domain. This domain conceptualization is simply a description of the entities that are part of the domain and the relationships that might exist between them. It therefore provides a platform for representing the domain and for making its knowledge available to others through a common vocabulary. An ontology can be said to be a specification of such a domain conceptualization.
Ontologies act as an interface between the system and the user by providing the vocabulary for representing the domain along with descriptions of these vocabulary terms. These terms can then be used by the developer or the user to talk to the system. Ontologies can also aid human understanding of, and communication about, a particular domain. Ontology-based systems use these ontologies to help them integrate the information from the various sources. OBSERVER [18] is an example of a typical ontology-based system.
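The following small Python sketch, in which the ontology terms and per-source mappings are invented purely for illustration, shows one common use of an ontology in integration: the user phrases a query in the shared vocabulary, and the system rewrites it into each source's local terms.

# Hypothetical shared vocabulary and per-source term mappings.
ONTOLOGY = {
    "Person": "an individual human being",
    "hasName": "the full name of a Person",
    "livesIn": "the city where a Person resides",
}

SOURCE_MAPPINGS = {
    "hr_database": {"Person": "employee", "hasName": "emp_name", "livesIn": "emp_city"},
    "alumni_file": {"Person": "alumnus", "hasName": "full_name", "livesIn": "home_city"},
}

def rewrite(query_terms, source):
    """Translate ontology terms in a user query into a source's local vocabulary."""
    mapping = SOURCE_MAPPINGS[source]
    return [mapping.get(term, term) for term in query_terms]

# A query expressed purely in ontology terms, rewritten for each source.
user_query = ["Person", "hasName", "livesIn"]
for source in SOURCE_MAPPINGS:
    print(source, "->", rewrite(user_query, source))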
2.4 Agent-based Architecture
There are many variations of this architecture. In one variation, each agent functions as a broker. Each agent involved in the information processing broadcasts some information about itself to all the other agents, including what kind of data it contains and what kind of queries it can answer. When the Query Processor receives a request or a query, it forwards it to the agent most capable of answering the request. That agent then evaluates the query and returns the result to the processor.
In another variation, there is exactly one agent that functions as a broker. This broker accepts the query details from the user. This information is passed on to a service-providing agent that obtains the necessary information from the other agents. The information is then evaluated and the relevant results are communicated back to the broker agent, which passes them on to the user. Some examples are InfoSleuth [2] and Information Broker [17].
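A minimal Python sketch of the broker idea follows; the agent names, their capability advertisements, and the exact-match routing rule are assumptions made for illustration only. Each agent advertises what it can answer, and the broker forwards a request to an agent whose advertisement matches it.

# Broker-style agent sketch; agents and the matching rule are hypothetical.

class Agent:
    def __init__(self, name, topics, data):
        self.name = name
        self.topics = set(topics)   # what this agent advertises it can answer
        self.data = data
    def answer(self, topic):
        return self.data.get(topic, [])

class Broker:
    def __init__(self, agents):
        self.agents = agents        # advertisements collected from each agent
    def handle(self, topic):
        # Route to an agent advertising the requested topic (simple exact match).
        for agent in self.agents:
            if topic in agent.topics:
                return agent.name, agent.answer(topic)
        return None, []

broker = Broker([
    Agent("weather_agent", ["forecast"], {"forecast": ["sunny"]}),
    Agent("traffic_agent", ["congestion"], {"congestion": ["heavy on I-85"]}),
])
print(broker.handle("congestion"))  # ('traffic_agent', ['heavy on I-85'])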
2.5 Alternative Approaches
Apart from the architectures mentioned above, there are several other approaches that are used in systems that involve information integration. Some of them are briefly described in the following paragraphs.
Description-logic-based systems are, as the name suggests, systems based on description logics. Description logics are knowledge representation languages used for expressing concepts. Their basic building blocks are concepts, roles and individuals; concepts can be thought of as properties defining a set of individuals. LOOM [16], a research project of the Artificial Intelligence research group at the University of Southern California's Information Sciences Institute, CLASSIC [3] and Information Manifold [13] use description logics for information integration.
Metadata repositories, as the name suggests, are collections of metadata. Queries are formulated by interacting with a global metadata dictionary, and all semantic and schematic heterogeneities are resolved by interacting with an intelligent interface. The intelligent interface also helps the user frame the queries themselves. There is no inherent underlying data model in these systems.
Another category of systems involves designing and developing a set of classes in a pre-decided application language. All users then use this set of classes and inherit their data structures and interfaces. The important classes can thus be compiled into a library and used by many other users at the application level. Such systems are referred to as Shared Class Libraries. This decreases the workload for most of the end users. The Image Understanding Environment (IUE) project [14] provides more details on how shared libraries are used for data integration.
A cooperative information system can be considered a collection of cooperating component systems that are physically and logically spread over complex computer networks but work together by sharing information and goals. The emphasis is on coordination and cooperation among the various systems rather than on the simple sharing of information.
In point-to-point-gateway-based systems [12], pairs of systems are integrated, eventually providing an overall integrated view. The systems in a pair are integrated by a common query interface, and this pair is then interfaced with another pair or system in a similar fashion. This gives rise to the problem of having to handle schematic conflicts individually for each pair of systems. ODBC [11] and JDBC [20] are effectively used for achieving this functionality.
Data warehouses contain data collected from a variety of other information sources. The data can come from databases, the World Wide Web or any other source. However, the data contained in a data warehouse is not updated when an update is made to the local information source from which the data was originally collected; the warehouse thus holds only a static copy of the collected data. On the other hand, it provides computational efficiency since all the necessary data is available in one centralized location.
The distributed object model has a major advantage when used in information integration: objects can interact with one another without knowing each other's actual locations. These individual objects interact using well-defined interfaces. Middleware technologies like COM/DCOM and CORBA enable this and can be effectively used in information integration projects.
The Trading concept [19] involves having the traders know what services the client requires and what services the providers offer. The trader then has to match the requirements with the services available. Depending on what is available, it is the duty of the trader to provide the client with the best (optimal) match. Hence this concept is thought of more as mediation on the part of the trader, rather than brokering. This is the trading and mediation strategy [BK99].
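A small Python sketch of this matching step is given below; the service descriptions, provider names, and the cheapest-match scoring rule are invented for illustration and are not part of the trading strategy cited above.

# Sketch of a trader matching client requirements against provider offers
# (the offers and the scoring rule are hypothetical).

PROVIDERS = {
    "print_service_A": {"color": True, "duplex": False, "cost": 5},
    "print_service_B": {"color": True, "duplex": True, "cost": 9},
}

def trade(requirements):
    """Return the cheapest provider that satisfies every required property."""
    candidates = [
        (offer["cost"], name)
        for name, offer in PROVIDERS.items()
        if all(offer.get(prop) == value for prop, value in requirements.items())
    ]
    return min(candidates)[1] if candidates else None

print(trade({"color": True}))                  # print_service_A (cheapest match)
print(trade({"color": True, "duplex": True}))  # print_service_B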
The above architectures are popularly used in systems involving information integration. We propose an IIS architecture that employs the Blackboard problem-solving technique from Artificial Intelligence. A brief description of this technique is given in the next section.
3. The Blackboard Technique
The Blackboard technique is a popular AI technique used for problem solving. The Blackboard model has three major components – a global database, called the blackboard; a set of logically independent knowledge sources (KSs); and a set of control data structures or control modules, used for monitoring changes on the blackboard and deciding what happens next. Each knowledge source contains some information, exclusive to it, that is used for solving the problem. This knowledge is separate and independent from the knowledge present in any other knowledge source. Another duty of a knowledge source is to collect the data that is currently available on the blackboard and store it after encoding it. All modifications to the blackboard are made by the knowledge sources, and all these modifications are explicit and visible. The following is the framework for a blackboard system [9]. In the figure, continuous lines represent data flow and dashed lines represent control flow.
Figure 1: Blackboard Framework
The knowledge sources are represented as procedures, sets of rules or logic assertions. Each knowledge source knows the conditions under which it can contribute to a solution: its pre-conditions indicate the state that must exist on the blackboard before the body of the knowledge source is triggered.
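As an illustration of this control cycle, here is a minimal Python sketch; the toy problem, the two knowledge sources, and their pre-conditions are invented for this example. The control loop repeatedly selects a knowledge source whose pre-condition holds on the blackboard and lets it post its contribution, until no knowledge source can contribute further.

# Minimal blackboard sketch with invented knowledge sources.

blackboard = {"raw_input": "hello world", "words": None, "greeting": None}

def splitter_ready(bb):
    return bb["raw_input"] is not None and bb["words"] is None
def splitter_act(bb):
    bb["words"] = bb["raw_input"].split()        # post an intermediate result

def greeter_ready(bb):
    return bb["words"] is not None and bb["greeting"] is None
def greeter_act(bb):
    bb["greeting"] = "Greeting detected" if "hello" in bb["words"] else "No greeting"

# Each knowledge source = (pre-condition, body); all blackboard changes are explicit.
knowledge_sources = [(splitter_ready, splitter_act), (greeter_ready, greeter_act)]

# Control module: monitor the blackboard and pick the next triggered knowledge source.
while True:
    triggered = [(pre, act) for pre, act in knowledge_sources if pre(blackboard)]
    if not triggered:
        break                                    # no knowledge source can contribute
    _, act = triggered[0]                        # a real scheduler would rank these
    act(blackboard)

print(blackboard["greeting"])                    # final result posted on the blackboard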
The global database, or blackboard, is at the heart of this architecture. It is a hierarchically organized database, whose main feature is that it contains all the intermediate (and eventually the final) results for the problem at hand. Each level of the hierarchy in the blackboard can be thought of as an abstraction level, and each level views the problem from a different perspective, in terms of a different set of concepts. For example, in the HEARSAY-II [10] system, which is based on the Blackboard architecture, the abstraction levels for the speech understanding task viewed the speech signal in terms of phonemes at one level, words at another level, and the phrases into which the words could be grouped at yet another level.