What: Proposal for Master of Science Thesis

The Observer Design Pattern and the Maintenance of Consistency Constraints in an Object-Oriented Database

Mark J. Tseytlin

Discipline: Computer Science

Revision: F

Date Created: 03/29/2000

Title of the Study

The Observer Design Pattern and the Maintenance of Consistency Constraints in an Object-Oriented Database

Statement of the Problem

The current trend in database theory is towards total object orientation [K95]. However, totally object-oriented systems are not currently as robust as relational systems. A common side effect of partitioning a system into a collection of cooperating classes is the need to maintain consistency between related objects [GHJVB, pg. 293]. A database system uses a set of rules called integrity or consistency constraints to maintain uniformity among objects. These constraints govern the procedural actions needed to maintain consistency in the database.

Consistency is less of a problem in relational data models. These models use different types of integrity constraints to ensure consistency within the system. The best known and most widely used are entity integrity, referential integrity, and foreign key constraints [EN94, pg. 147]. Unfortunately, relational applications must be designed and populated from the ground up. In other words, they allow little abstraction beyond flat tables, and therefore very little reuse of existing code is possible. Object-oriented systems can be abstracted to a higher degree. A system can house many levels of abstract classes. Classes that contain undefined methods are said to be abstract. They do not contain any objects of their own, that is, objects that do not come from a proper subclass [COOL, pg. 176]. Abstract classes are reusable because their methods can be specialized in concrete subclasses. However, because each object is believed to represent a real world entity, many programmers mistakenly choose very specialized and therefore non-reusable implementations. That is, many object-oriented systems bear the cost of being implemented as a set of concrete classes. Thus, OODMs (Object-Oriented Data Models) have not yet developed a coherent paradigm for dealing with consistency, due to the ad hoc nature of their implementations [PMD95].
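The reuse argument for abstract classes can be illustrated with a minimal JAVA sketch. The class names ValueRule, NonNegativeRule, and AtMostRule are hypothetical and are not taken from any cited system; they only show how a method left undefined in an abstract class is specialized in concrete subclasses.

```java
// Hypothetical abstract class: isSatisfied() is declared but left undefined.
abstract class ValueRule {
    abstract boolean isSatisfied(int value);

    // Reusable behaviour written once against the abstract method.
    String describe(int value) {
        return value + (isSatisfied(value) ? " accepted" : " rejected");
    }
}

// Concrete subclasses specialize the undefined method.
class NonNegativeRule extends ValueRule {
    boolean isSatisfied(int value) { return value >= 0; }
}

class AtMostRule extends ValueRule {
    private final int limit;
    AtMostRule(int limit) { this.limit = limit; }
    boolean isSatisfied(int value) { return value <= limit; }
}

class AbstractRuleDemo {
    public static void main(String[] args) {
        ValueRule[] rules = { new NonNegativeRule(), new AtMostRule(10) };
        for (ValueRule rule : rules) {
            System.out.println(rule.describe(12));
        }
    }
}
```

The describe method is written only once, at the abstract level, and works unchanged for every concrete rule.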

The Observer design pattern [GHJVB, pg. 293] is a software design pattern that creates a new outlook on the implementation of actions to maintain consistency constraints in object-oriented systems. It separates the actual data storage and manipulation (implemented in the subject objects) from the automatic notification and update of dependent objects (implemented as observer objects) by use of a one-to-many dependency between objects in the system. This allows a high level of abstraction on each end because subject and observer objects [GHJVB, pg. 293] are independent of each other.
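A minimal JAVA sketch of this separation is given here for illustration only. The names ValueObserver, ValueSubject, and DoubledView are assumptions made for the example and do not describe the system proposed in this study. The subject stores the data and knows its dependents only through an interface.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical observer contract: a dependent object that is told about changes.
interface ValueObserver {
    void update(int newValue);
}

// Hypothetical subject: stores the data and knows only the ValueObserver interface.
class ValueSubject {
    private final List<ValueObserver> observers = new ArrayList<>();
    private int value;

    void attach(ValueObserver o) { observers.add(o); }
    void detach(ValueObserver o) { observers.remove(o); }

    int getValue() { return value; }

    // Changing the state notifies every attached observer (one-to-many dependency).
    void setValue(int newValue) {
        value = newValue;
        for (ValueObserver o : observers) {
            o.update(newValue);
        }
    }
}

// Hypothetical observer that keeps a derived value consistent with the subject.
class DoubledView implements ValueObserver {
    private int doubled;
    public void update(int newValue) { doubled = 2 * newValue; }
    int getDoubled() { return doubled; }
}

class BasicObserverDemo {
    public static void main(String[] args) {
        ValueSubject subject = new ValueSubject();
        DoubledView view = new DoubledView();
        subject.attach(view);
        subject.setValue(21);
        System.out.println(view.getDoubled()); // prints 42
    }
}
```

New kinds of observers can be added without changing ValueSubject, which is the independence the pattern is meant to provide.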

The objective of this study is to implement a system that will illustrate the concept of the Observer pattern. In particular, it will focus on the application of the concepts of the Observer pattern to a concrete database system that will be kept as totally object-oriented as possible. The Observer pattern will be used to maintain database consistency. To the best of our knowledge, no one has done a study such as this before.

Justification for and Significance of the Study

A database is a collection of related data or known facts that can be recorded and that have implicit meaning [EN94, pg. 2]. Data collection and manipulation were very important in computer science from early on. In fact, these were the main reasons for the earliest computer applications, such as the Census [FW58, pg. 2314, 2315]. Many different data models have been introduced over the years (see Exhibit# 1) [EN00]. (In this proposal, we refer to models used for the design of schema as data models.)

For many years, data models were record-oriented, with data stored in the rows and columns of tables. This was due to the limited capabilities of the computer technology. The main obstacles to progress were slow processor speeds, limited memory sizes, and the small storage capabilities of earlier computers. From the very first data files of early spreadsheet programs to current relational systems like MS Access, the data was stored in records and all the database operations were dependent on manipulation of these records. Each database was concrete and implemented for a specific set of related facts. The problem with this approach, as we have already pointed out, was that no implementation could be reused as the set of related facts changed. This was because the records had to be populated with concrete instances from the base up. In fact, none of the first data models had abstract constructs in them.

The relational model was the first to introduce the concept of data independence [EN00]. In other words, it was the first to exemplify a model and a query language in which the layout of the data on a disk drive was not determined by the data model. The model was implemented by employing one level of abstraction with a mapping from the database schema to the physical layout of the data. Data was organized into normalized relations to prevent data anomalies caused by modification. Unlike the hierarchical and network data models, databases based on the relational model used foreign keys instead of physical links for data relations [JO98, pg. 7]. This allowed linking of data between different tables and even between different databases. Nevertheless, the first attempt to represent real world objects in a database system did not come about until the inception of semantic models [PM88].

Semantic data models brought in strong capabilities in relationship modeling. The first relationship was generalization. Generalization is a technique for describing a real world object by properties that are common to many real world objects of the same general class. For example, a square can be represented with sides and angles, properties that are universal among all rectangular shapes. The second was aggregation. Aggregation is a technique for describing a real world object as a composition of sub-objects. Although the concept of generalization is very close to the concept of data abstraction, the semantic model did not support abstract classes. Complex constraints that are very common to real world objects were not supported either. For example, active database constraints were not supported. Semantic models did permit the expression of constraints such as cardinality constraints. However, this data model did not permit the expression of the means by which constraints were to be maintained. If we were to create a 1 to N relationship between two objects (Professor and Advisee) [K97, pg. 83], the constraint would be violated as soon as the user associated an object of type Advisee with two objects of type Professor. The system could respond to such a violation in more than one way, and the model offered no way to state which response was intended. In short, semantic models did not support constraints with active rules to express the procedural means for maintaining consistency [PMD95].
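The Professor and Advisee example can be sketched in JAVA as follows. The two responses shown, rejecting the new association or replacing the old one, are our own illustrative assumptions about how a system might react. The class and enum names are hypothetical and are not taken from [K97] or [PMD95].

```java
// Hypothetical classes for the Professor/Advisee example.
class Professor {
    final String name;
    Professor(String name) { this.name = name; }
}

// Two assumed responses to a violation of the 1 to N constraint.
enum ViolationPolicy { REJECT, REPLACE }

class Advisee {
    final String name;
    private Professor advisor; // at most one advisor per advisee

    Advisee(String name) { this.name = name; }

    Professor getAdvisor() { return advisor; }

    // Returns true if the advisee ends up linked to the given professor.
    boolean assignAdvisor(Professor p, ViolationPolicy policy) {
        if (advisor != null && advisor != p && policy == ViolationPolicy.REJECT) {
            return false; // keep the existing link and refuse the new one
        }
        advisor = p; // first assignment, or REPLACE: the old link is dropped
        return true;
    }
}

class CardinalityDemo {
    public static void main(String[] args) {
        Professor first = new Professor("First");
        Professor second = new Professor("Second");
        Advisee student = new Advisee("Student");
        student.assignAdvisor(first, ViolationPolicy.REJECT);
        System.out.println(student.assignAdvisor(second, ViolationPolicy.REJECT));  // prints false
        System.out.println(student.assignAdvisor(second, ViolationPolicy.REPLACE)); // prints true
    }
}
```

The point of the sketch is that the cardinality constraint alone does not determine the behaviour. The maintenance policy has to be expressed somewhere, which is what active rules provide.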

The object-oriented paradigm allows a more complete representation of real world objects in computing [JO98, pg. 8]. The notion of encapsulation allows the incorporation of methods representing the behavior of a real world object into the classes that define such objects. The concepts of inheritance and specialization are natural to this data model. Because of specialization, database constraints can be defined at the highest possible level, the abstract class level. Unfortunately, it is very difficult to keep a collection of related objects consistent as the states of different objects change. Furthermore, even now the best OO systems lack many of the major database features present in RDBs (Relational Databases), such as a full nonprocedural query language, metadata management, views, and authorization [K95, pg. 6]. There have been a few attempts to fix the above problems. One approach was to extend the object-oriented model with a semantic model. Michael Doherty implemented an example at URI (University of Rhode Island). In the SORAC (Semantic Object Relationships And Constraints) data model he extended the object-oriented data model to allow the relationships between objects to be modeled within the object-oriented paradigm [MD92, pg. ii].

The SORAC model, as well as the work of other researchers in this area, raised some questions and uncovered some problems that had not been considered before. For example, one of the topics that has not been addressed in the SORAC data model is the issue of insertion constraints to define the required relationships between objects [MD92, pg. 54]. In an object-oriented database system, insertion is the creation of a new object. Consequently, insertion constraints are the rules that govern the creation process in order to keep the database consistent. The main logical dilemma in the SORAC project with regard to insertion constraints was whether relationships between classes define requirements on all instances of the related classes, or whether a relationship needs to be connected to an instance before the relationship constraints hold [MD92, pg. 56]. This and other problems were aggravated by the fact that the programming languages used to implement these models were not totally object-oriented. There was no mechanism to abstract these design and implementation issues. In fact, totally object-oriented languages were not available until recently. The use of such a language allows the representation of relationships as objects and therefore permits a high level of abstraction in their implementation. JAVA is such a language.

In JAVA everything has to be done inside a class or by a call to a method defined inside a predefined class [A96, pg. 2]. As James Gosling pointed out, it is also a portable programming language [GM99]. There are no explicit pointers in JAVA. Instead, objects are accessed and passed as arguments through references. This is very convenient for the purpose of creating a totally object-oriented database. In addition, JAVA does not use separate header files. Exceptions are first-class characteristics in JAVA. Therefore, in case of an error, execution does not resume in the code that called the method. Instead, the exception-handling mechanism begins its search for a handler that can address this particular error condition [CH96, pg. 419]. This is very useful if the error conditions are expected, as in the verification of existence constraints. The standard JAVA library has a predefined class, String, containing all of the utilities used to operate on strings of characters [CH96, pg. 63]. The advantage of this representation is that JAVA’s strings cannot be accidentally overwritten as character arrays can be [A96, pg. 3]. This is beneficial for data entry, manipulation, and storage programs like databases because each line is stored as a separate string in a list of strings. Also, instead of multiple inheritance, JAVA offers interfaces. An interface is a promise that a class will implement certain methods with certain signatures [CH96, pg. 156]. This is very useful in a complex object-oriented database environment where records are implemented as instances of compound objects.
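The following sketch shows how an exception and an interface might be combined to verify an existence constraint. All names in it (ExistenceConstraintException, Keyed, Department, DepartmentStore) are hypothetical and were chosen only for this example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical checked exception signalling a violated existence constraint.
class ExistenceConstraintException extends Exception {
    ExistenceConstraintException(String message) { super(message); }
}

// Hypothetical interface: a promise that implementing classes provide a key.
interface Keyed {
    String key();
}

class Department implements Keyed {
    private final String name;
    Department(String name) { this.name = name; }
    public String key() { return name; }
}

class DepartmentStore {
    private final Map<String, Department> byKey = new HashMap<>();

    void insert(Department d) { byKey.put(d.key(), d); }

    // Throws instead of returning null when the referenced object does not exist.
    Department require(String key) throws ExistenceConstraintException {
        Department d = byKey.get(key);
        if (d == null) {
            throw new ExistenceConstraintException("no department with key " + key);
        }
        return d;
    }
}

class ExistenceCheckDemo {
    public static void main(String[] args) {
        DepartmentStore store = new DepartmentStore();
        store.insert(new Department("CS"));
        try {
            store.require("Math");
            System.out.println("found"); // skipped: control jumps to the handler
        } catch (ExistenceConstraintException e) {
            System.out.println("constraint violated: " + e.getMessage());
        }
    }
}
```

Because the exception is checked, the compiler forces every caller of require to either handle the violation or declare it, so an expected error condition cannot be silently ignored.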

Another of JAVA’s features is its support for multithreading, which allows more than one operation to run at a time. This is useful when, for instance, two transactions make modifications to the same part of the database from different sites at the same time. In this situation one of the transactions should wait for the other to complete before it can proceed. The idea behind the concept of multithreading is further explored in the Observer design pattern.
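As a rough illustration, two concurrent modifications of the same record can be serialized with JAVA’s synchronized methods. The sketch below assumes a hypothetical SharedRecord class. A real database would need a full transaction mechanism, which this example does not attempt to model.

```java
// Hypothetical shared record; the name and field are assumptions for illustration.
class SharedRecord {
    private int total;

    // synchronized: a second thread calling this method waits until the first returns.
    synchronized void add(int amount) {
        total = total + amount;
    }

    synchronized int getTotal() { return total; }
}

class TwoTransactionsDemo {
    // Runs two "transactions" concurrently against the same record.
    static int runBoth(int updatesPerThread) throws InterruptedException {
        SharedRecord record = new SharedRecord();
        Runnable transaction = () -> {
            for (int i = 0; i < updatesPerThread; i++) {
                record.add(1);
            }
        };
        Thread t1 = new Thread(transaction);
        Thread t2 = new Thread(transaction);
        t1.start();
        t2.start();
        t1.join(); // wait for both transactions to finish
        t2.join();
        return record.getTotal();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runBoth(10000)); // prints 20000
    }
}
```

Without the synchronized keyword, the two threads could interleave their read and write steps and some updates could be lost.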

The Observer design pattern is intended to define a one-to-many relationship between objects so that when one object changes state, all its dependents are notified and updated automatically. For example, suppose we need to separate the user interface objects from the underlying application data [GHJVB, pg. 293] (see Exhibit# 2). This is an example of the use of the Observer pattern in one application. In the exhibit we can clearly see that the subject (the data object) is completely independent of the observers (the user interface objects). Furthermore, each observer object is completely independent of the other observers. As one of these observers changes its state through modification by an end user, it sends a request to the data object. This request contains the announcement that a change took place as well as information about what changed. The subject makes sure that the change is valid and propagates the change to all its other dependent observers. Then it sends an OK to the observer that initiated the change of state, allowing it to finalize its own change of state. This paradigm is very useful as a tool to enforce consistency constraints in a totally object-oriented environment. Every time an object changes its state in a database environment, it can affect the states of many other objects in the same database. However, unless the objects are tightly coupled, the object in question does not have any knowledge of how many and which objects need to change their states to preserve consistency. The subject object takes care of that. The system can support many subject objects to make sure that it does not get bogged down with all the updates to the observers. Another technique used to optimize the performance of a system based on the Observer pattern is to specify modifications of interest explicitly [GHJVB, pg. 298]. This technique allows observers to register with their respective subject for a specific event that is of interest to them. An observer then receives a change of state notification from the subject when such an event occurs, and only in that case.
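The request, validation, propagation, and OK steps described above, together with registration for specific modifications of interest, can be sketched as follows. The names FieldObserver, RecordSubject, and RecordingObserver, and the rule that values must be non-negative, are assumptions made only for this illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical observer contract: notified with the changed field and its new value.
interface FieldObserver {
    void fieldChanged(String field, int newValue);
}

// Hypothetical subject holding integer fields; observers register per field of interest.
class RecordSubject {
    private final Map<String, Integer> fields = new HashMap<>();
    private final Map<String, List<FieldObserver>> interested = new HashMap<>();

    // Register an observer for one specific field only.
    void attach(String field, FieldObserver o) {
        interested.computeIfAbsent(field, k -> new ArrayList<>()).add(o);
    }

    // An observer requests a change. Returns true (the "OK") if it was accepted.
    boolean requestChange(FieldObserver initiator, String field, int newValue) {
        if (newValue < 0) {
            return false; // assumed validity rule: values must be non-negative
        }
        fields.put(field, newValue);
        for (FieldObserver o : interested.getOrDefault(field, new ArrayList<>())) {
            if (o != initiator) {
                o.fieldChanged(field, newValue); // propagate to the other dependents
            }
        }
        return true;
    }
}

// Hypothetical observer that records the notifications it receives.
class RecordingObserver implements FieldObserver {
    final List<String> log = new ArrayList<>();
    public void fieldChanged(String field, int newValue) {
        log.add(field + "=" + newValue);
    }
}

class InterestDemo {
    public static void main(String[] args) {
        RecordSubject subject = new RecordSubject();
        RecordingObserver a = new RecordingObserver();
        RecordingObserver b = new RecordingObserver();
        RecordingObserver c = new RecordingObserver();
        subject.attach("salary", a);
        subject.attach("salary", b);
        subject.attach("rank", c);
        subject.requestChange(a, "salary", 100);
        System.out.println(b.log); // prints [salary=100]
        System.out.println(c.log); // prints []
    }
}
```

Observer c registered only for the hypothetical "rank" field, so it receives no notification when "salary" changes, which is the saving that explicit registration is meant to give.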

Methodology

This thesis will consist of a research study and a program implementation. The research will be directed towards understanding the Observer design pattern and its impact as a means of implementing a totally object-oriented database. The other objective of the research will be an attempt to generalize the concepts of automatic updates from a concrete example to a more general framework.

The implementation part of the project will consist of a database implementation that will be as close to totally object-oriented as possible. The database will contain a subset of the schema introduced in [CSC50197]. It will be implemented in the JAVA programming language. The Observer pattern will be applied to this database as the means of notifying other objects about a change without making any assumptions about the state the Subject objects are in. Also, the Observer concept will be used to separate the display part of the database from the data collection and data manipulation part. This will help to individualize parts of the schema, therefore allowing generalization and reuse in further research attempts.

Resources Required

The research part of this thesis will require publications that will be obtained through the following sources:

1. URI, Department of Computer Science and Statistics

2. Raytheon Systems Company, Technical Information Center

3. Inter-library loan

The implementation part of this study will require the following resources:

1. Borland JBuilder3 JAVA development environment

2. World Wide Web

3. URI, Department of Computer Science and Statistics computer system

The first will be obtained at COMPUSA, and the other two are readily available.

Literature Cited in the Proposal

[COOL] “Compilation of Object-Oriented Languages”

[CSC50197] Computer Science 501, “JAVA Assignment: Mini-Prolog Interpreter”, Department of Computer Science, URI, 1997

[GHJVB] Erich Gamma, Richard Helm, Ralph Johnson, John Vlissides, Grady Booch, “Design Patterns: Elements of Reusable Object-Oriented Software”, Addison-Wesley Professional Computing

[FW58] Funk and Wagnall, “Standard Reference Encyclopedia”, Standard Reference Works Publishing Company, Inc., New York, NY, 1958

[A96] Gary Aitken, “Moving from C++ to JAVA”, Dr. Dobb’s Journal, 1996

[CH96] Gary Cornell, Cay S. Horstmann, “Core JAVA”, SUN Microsystems Inc., 2550 Garcia Ave, Mountain View, CA 94043-1100, 1996

[JO98] Jayamani Odayappan, “MS Thesis: Semantic Framework for Architectural Design Domain”, Department of Computer Science, URI, 1998

[GM99] James Gosling, Henry McGilton, “The Java Language Environment: A White Paper”, 1999

[PMD95] Joan Peckham, Bonnie MacKellar, Michael Doherty, “Data Model for Extensible Support of Explicit Relationships in Design Databases”, VLDB Journal, Vol. 4, No. 2, April 1995, pg. 157-192

[PM88] Joan Peckham, Fred Maryanski, “Semantic Data Models”, ACM Computing Surveys, Vol. 20, No. 3, September 1988

[MD91] Michael Doherty, “MS Thesis Proposal: Semantic Relationships and Database Object”, Department of Computer Science, URI, 1991

[MD92] Michael Doherty, “MS Thesis: Implementing Relationships in an Object-Oriented Database”, Department of Computer Science, URI, 1992

[EN94] Ramez Elmasri, Shamkant B. Navathe, “Fundamentals of Database Systems, 2nd edition”, College of Computing, GA Institute of Technology, 1994

[EN00] Ramez Elmasri, Shamkant B. Navathe, “Fundamentals of Database Systems, 3rd edition”, Addison Wesley, 2000

[K95] Won Kim, “Modern Database Systems: The Object Model, Interoperability, and Beyond”, ACM Press, New York, NY, 1995
