Fuzzy Relational Database and Querying System

With Compound Query Capabilities

Authored By: Penny Crabtree

Directed By: Dr. Lorraine M. Parker

May 2005

Table of Contents

I. Introduction

II. Fuzzy Relational Database Background

III. Fuzzy Queries

IV. Previous Work

V. Project Description, Design and Implementation

VI. Future Work

VII. References

VIII. Appendix A

IX. Appendix B

I. Introduction

A relational database management system can be defined, at a minimum, as a database that the user perceives as a collection of tables (and nothing but tables), together with operators that act on those tables. Because this minimal definition applies only to “crisp” data, that is, data that is precise and free of vagueness, the resulting data model often bears little resemblance to reality. To capture more of the meaning of the data, extensions of the classical relational model based on fuzzy set theory and fuzzy logic have been proposed. Fuzzy data allows values to be imprecise or approximate, and reality can often be best represented via fuzzy databases and queries. Previous work [1] implemented a fuzzy database and querying system with community-defined membership values that supports searching on a single defined characteristic. The purpose of this paper is to present compound fuzzy queries: the capability to search on multiple defined fuzzy characteristics.

II. Fuzzy Relational Database Background

Application data are often only partially known or imprecise. For instance, John Doe’s height may be 192 cm, but one may instead describe John as tall, or as about 190 cm. Such statements about John’s height may still be useful in answering queries or making inferences based on height.

In order to capture more meaning about the data, several extensions of the classical relational model have been proposed. The fuzzy set theory and fuzzy logic proposed by Zadeh [23] provide a requisite mathematical framework for dealing with such extended data values. Some authors [25, 26] have taken the direction of viewing relational databases in the area of fuzzy set theory with an objective of accommodating a wider range of real-world requirements and providing closer man-machine interactions. These works include extensions of the classical relational algebra operations such as join, projection, etc.

As the classical relational data model is extended to deal with fuzzy information, one must also consider integrity constraints that may involve fuzzy concepts. Integrity constraints such as functional dependency, multivalued dependency, and join dependency have been identified, and sets of complete inference rules for such dependencies have been proposed [25].

III. Fuzzy Queries

The purpose of fuzzy logic and fuzzy set theory is to capture more meaning about the data. Figure 1 [26] shows a representation of the relationship between application use, the data, and the query language. Enterprises are either precise or vague [26]. The database extension that represents the enterprise is either precise or imprecise. Fuzzy functional dependencies represent ambiguities in data values as well as imprecision in the associations among them. Query languages are designed to express the user’s retrieval request in either a crisp or a non-crisp manner. The issue, however, is whether a particular data item matches a query term when it is not identical to that term. Therefore, a simple query language is needed in which a user can indicate the degree of relaxation permitted to achieve a match. For example, (Blue, 0.62) is a precise statement that represents the imprecision of how blue the eyes are.

Figure 1: The Fuzzy Database Landscape

Commercial DBMSs suffer from a lack of flexibility and do not account for the representation of imprecise data [25, 26]. The objective of fuzzy queries is to provide users with new querying capabilities based on conditions that involve their preferences. That is, users can describe more (or less) acceptable items, thus defining flexible queries that yield results ranked according to their preferences. Fuzzy queries are essentially Boolean queries that have been extended by adding these preferences.

IV. Previous Work

A theoretical foundation for fuzzy relational databases has been established. A fuzzy relational database with a natural language querying system was implemented by Dattatri and Joy [1]. The overall goal of that project was to design a database with a querying system to retrieve images whose common-language descriptions were defined by the consensus of a particular user community. For example, a person may be described as having dark blue eyes and a long nose. “Dark blue” and “long” are not objective definitions but rather reflect a person’s or a group’s definition of those terms. Further, the image descriptions would be the consensus of a user community, not of the database designers (it would be expected that each user community would describe the same images differently).

To handle the subjectivity of the data, membership values for attributes, as defined by the fuzzy database model, provided a fitting solution. To allow for simpler design and implementation, multi-valued attributes were not used. Instead, for each image, each possible value of an attribute and its membership weight were listed in a separate tuple. Each image was given a membership weight in the interval [0, 1] for every attribute value. To illustrate, an image of a person with eyes that are predominantly green with just a hint of blue and no trace of brown could be represented in the database as shown in Table 1.

IMAGE_ID / EYE_COLOR / WEIGHT (µ)
1 / GREEN / 0.8
1 / BLUE / 0.3
1 / BROWN / 0.0

Table 1: Using Membership Values (Weights) for Each Attribute
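The one-tuple-per-value layout of Table 1 can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module rather than the SQL Server database of the original project, and the table name, column types, and the membership helper are assumptions made only for this illustration.

```python
import sqlite3

# Hypothetical schema: one row per (image, attribute value) pair, as in Table 1.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE eye_color ("
    " image_id INTEGER, eye_color TEXT, weight REAL,"
    " PRIMARY KEY (image_id, eye_color))"
)
rows = [(1, "GREEN", 0.8), (1, "BLUE", 0.3), (1, "BROWN", 0.0)]
conn.executemany("INSERT INTO eye_color VALUES (?, ?, ?)", rows)


def membership(image_id, color):
    """Return the membership weight of `color` for an image, or None if no row exists."""
    cur = conn.execute(
        "SELECT weight FROM eye_color WHERE image_id = ? AND eye_color = ?",
        (image_id, color),
    )
    row = cur.fetchone()
    return row[0] if row else None
```

Because every possible value has its own row, a weight of 0.0 (as for BROWN) is stored explicitly and can be distinguished from a value that was never recorded.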

Initially the image attributes are assigned random weights. Then, through the input of the user community, the “correct” values are achieved. (The user community would perform multiple simple queries. The queries in turn display pictures, allowing the user to agree or disagree. The weights are therefore adjusted for that attribute based on the user community’s response to whether the pictures matched the query.)

The database was queried using subjective, fuzzy queries by adding a user-interface layer on top of a relational database. This was accomplished using stored procedures in SQL Server. Only simple queries (i.e., those concerning one attribute) were allowed, not compound queries (i.e., those concerning more than one attribute). The modifiers and synonyms selected are noted in Table 2. Each modifier was assigned a corresponding range on the [0, 1] membership interval, as shown in Table 3. Each modifier also has a threshold value, which lies at the midpoint of the modifier’s range and is designed to steady each attribute’s membership value within that range so that it accurately represents each community’s consensus. Thus, using the two tables, a query requesting people with “light blue eyes” would return all of the images with EYE_COLOR = BLUE and a membership value between 0.0 and 0.29.

Synonyms / Corresponding Modifier
Dark / Very
Average / Medium
Light, Pale / Slightly

Table 2: Modifier Synonyms

Modifier / Range_From / Range_To / Midpoint
“Very” / 0.75 / 1.0 / 0.87
“Medium” / 0.30 / 0.74 / 0.52
“Slightly” / 0.0 / 0.29 / 0.20

Table 3: Modifier Ranges and Midpoints
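As a rough illustration of how Tables 2 and 3 combine to answer a simple query, the following Python sketch resolves a synonym to its modifier and filters images by the modifier's range. The original system did this in SQL Server stored procedures; the function names and the in-memory row format here are assumptions for the example only.

```python
# Table 2: synonym -> modifier (lowercased for lookup).
SYNONYMS = {"dark": "very", "average": "medium", "light": "slightly", "pale": "slightly"}

# Table 3: modifier -> (range_from, range_to, midpoint).
RANGES = {
    "very": (0.75, 1.0, 0.87),
    "medium": (0.30, 0.74, 0.52),
    "slightly": (0.0, 0.29, 0.20),
}


def resolve_modifier(word):
    """Map a synonym such as 'light' to its modifier; modifiers map to themselves."""
    w = word.lower()
    return SYNONYMS.get(w, w)


def simple_query(rows, modifier_word, value):
    """Return ids of images whose weight for `value` lies in the modifier's range.

    rows: iterable of (image_id, attribute_value, weight) tuples (hypothetical format).
    """
    low, high, _midpoint = RANGES[resolve_modifier(modifier_word)]
    wanted = value.upper()
    return [img for img, val, w in rows if val == wanted and low <= w <= high]
```

For instance, simple_query(rows, "light", "blue") selects rows with value BLUE and a weight between 0.0 and 0.29. Read literally, the ranges in Table 3 leave small gaps (a weight of 0.295 falls in no range); this is harmless if weights are kept to two decimal places, which this sketch assumes.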

Random decimal values in [0.0, 1.0] were initially assigned as weights for each attribute before the database was given to a user community. The user community would then submit a query, view the results, and provide feedback on whether each result met the query or whether the image would be better described using a stronger or a weaker modifier. If a user chose the former, the attribute’s weight was moved incrementally (by 0.01) towards the threshold value of that range; if the weight was already at the threshold value, it was not changed. Thus, by adjusting the membership weight so that it sat as deeply in the range as possible (i.e., at the midpoint), the community opinion was strengthened by concurring user feedback. If the user chose the latter, the attribute’s weight was moved slightly (by 0.01) in the direction suggested by the user.
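The feedback rule described above can be sketched in Python as follows. This is not the original stored procedure; the function names, the rounding to two decimal places, and the clamping to [0, 1] are assumptions added to keep the example well behaved.

```python
STEP = 0.01  # increment stated in the text


def adjust_on_agree(weight, midpoint, step=STEP):
    """Move the weight one step toward the range midpoint without overshooting it."""
    if abs(weight - midpoint) <= step:
        return midpoint  # at (or within one step of) the threshold: settle on it
    moved = weight + step if weight < midpoint else weight - step
    return round(moved, 2)  # assumption: weights are kept to two decimals


def adjust_on_disagree(weight, direction, step=STEP):
    """Move the weight one step toward a stronger (+1) or weaker (-1) modifier.

    Clamping to [0, 1] is an assumption; the text does not describe boundary handling.
    """
    moved = weight + direction * step
    return round(min(1.0, max(0.0, moved)), 2)
```

With the “Very” midpoint of 0.87, a concurring vote on a weight of 0.80 yields 0.81, while a vote for a stronger modifier on a weight of 0.29 yields 0.30, moving the image into the “Medium” range.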

In the background, each user-entered query was parsed into the actual query and its modifier/attribute pair, and these were stored in data files. The contents of these files were displayed on the (Visual Basic .NET) form. Pressing the “submit” button called a stored procedure that received the query and the modifier/attribute information as parameters and queried the (SQL Server) database. A stored procedure determined the range values for the given modifier and then returned the images whose membership value for the attribute was within that range. Those images were then displayed to the user. Below each image were checkboxes asking the user to indicate whether the image satisfied the query or instead fit the modifier one level above or below it. If the user selected a different modifier, the image’s weight was changed accordingly by a stored procedure. Over time, feedback on single-attribute queries would bring the weights to reflect the community opinion.

V. Project Description, Design and Implementation

The fuzzy database described in this paper is only one component of the database research group project under the direction of Dr. Lorraine M. Parker. The overall goal of the project for the 2005 fall semester research group was to expand the existing fuzzy database, which includes a natural language querying system, to retrieve images whose common-language descriptions would be defined by the consensus of a particular user community. In most everyday language, items are described subjectively. That is, a person may be described as having very blue eyes and a broad face; “very blue” and “broad” are not objective definitions but rather reflect a person’s (or a community’s) definition of those terms.

The overall goal was to expand on the existing fuzzy database prototype, which allows users to query the database using everyday language; the database returns those images that meet these subjective criteria. This paper presents an expansion of the queries accepted: the prototype now allows either a query on compound attributes or the existing single-attribute query. Further, the resulting images would be described by the community, not the database designers, in the sense that each image’s associated weights for the defined attributes (eyes, face, hair, etc.) would reflect the feedback of that particular user community. Thus, it would be expected that each user community would describe the same images differently.

To allow a query on compound attributes, the prototype looks for and reads in both the query text file and the attribute text file. A sample input.txt (which contains the queries) is shown in Table 4, and a sample input1.txt (which contains the modifiers and attributes) is shown in Table 5. These files are located in the c:\ directory and are read in by the prototype via the QueryForm form in Appendix B. The program in turn displays the information from the text files on the screen via the Form1 form. This allows the viewing community to confirm or change the criteria, as shown in Figure 2.

Input.txt
Select * from person where color.color = 'blue';
select * from person where face.face = 'broad';

Table 4: Input.txt File

Input1.txt
Very
Blue
Very
Broad

Table 5: Input1.txt File
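Judging from the sample in Table 5, Input1.txt alternates modifier and attribute lines, one pair per query in Input.txt. A minimal Python sketch of reading such a file into pairs is shown below; the original prototype did this in Visual Basic .NET, and the function name and error handling here are assumptions.

```python
def read_criteria(lines):
    """Pair up alternating modifier/attribute lines, as in the sample Input1.txt.

    lines: any iterable of strings, e.g. an open file object.
    Returns a list of (modifier, attribute) tuples in file order.
    """
    cleaned = [ln.strip() for ln in lines if ln.strip()]
    if len(cleaned) % 2:
        raise ValueError("expected an even number of lines (modifier/attribute pairs)")
    return list(zip(cleaned[0::2], cleaned[1::2]))
```

Applied to the four lines of Table 5, this returns the pairs (Very, Blue) and (Very, Broad), one per attribute of the compound query.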

Figure 2: Query Box

The results of the compound-attribute query are displayed on the screen as images via the Fetch_Data stored procedure in Appendix A, which fetches the data from the database. The Fetch_Data stored procedure is called by the Form1 form, which interprets the fuzzy query by translating the modifiers into SQL queries and returns the images to be displayed in the user interface. In turn, the defined user community provides feedback on whether the results met the query by agreeing or disagreeing, as shown in Figure 3. On agreement, the weights assigned to each attribute are modified in the database according to the attribute’s modifier and the threshold value. On disagreement, the image is redisplayed, allowing the community to use the selections given to indicate which attributes vary from the query, as shown in Figure 4. Selecting the OK checkbox for an attribute indicates that the attribute met the original query. The weights assigned to each attribute are updated accordingly and stored in the database. The new weights are then used as other queries are made, defining the images based on the particular user community’s natural language. The Message form in Appendix B informs the user that the update operation was successful and asks whether the user would like to run another query.
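One plausible reading of a compound query is a conjunction: an image is returned only if it satisfies every modifier/attribute pair. The Python sketch below illustrates that reading; the conjunction semantics, the function names, and the dictionary layout are assumptions, and the actual Fetch_Data procedure in Appendix A may differ.

```python
# Modifier ranges from Table 3: modifier -> (range_from, range_to, midpoint).
RANGES = {
    "very": (0.75, 1.0, 0.87),
    "medium": (0.30, 0.74, 0.52),
    "slightly": (0.0, 0.29, 0.20),
}


def matching_images(weights, modifier, value):
    """Set of image ids whose weight for `value` lies in the modifier's range.

    weights: dict mapping (image_id, attribute_value) -> weight (hypothetical layout).
    """
    low, high, _midpoint = RANGES[modifier.lower()]
    wanted = value.lower()
    return {img for (img, val), w in weights.items() if val == wanted and low <= w <= high}


def compound_query(weights, criteria):
    """Return sorted ids of images that satisfy every (modifier, value) pair."""
    result = None
    for modifier, value in criteria:
        hits = matching_images(weights, modifier, value)
        result = hits if result is None else result & hits
    return sorted(result or [])
```

With the criteria of Table 5, compound_query(weights, [("Very", "Blue"), ("Very", "Broad")]) returns only those images that are both very blue-eyed and very broad-faced; a single pair reduces to the existing single-attribute query.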

Figure 3: Agree or Disagree with Query

Figure 4: Attribute Changes

The weights originally assigned to each attribute are random, generated by the Initialize_Weights stored procedure in Appendix A. This procedure is not invoked by the application; it is executed directly on the database to initialize the weights. The user community views the images returned by attribute queries, and over time the weights converge on that community’s natural-language definition of each attribute. Based on user feedback, weights are adjusted in steps of 0.01 via the Update_Data stored procedure in Appendix A.
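The random initialization step can be sketched in a few lines of Python. This is only an illustration of the idea, not the Initialize_Weights procedure itself; the function name, the optional seed, and the rounding to two decimals (to match the 0.01 adjustment step) are assumptions.

```python
import random


def initialize_weights(image_ids, values, seed=None):
    """Assign a random weight in [0, 1] to every (image, attribute value) pair."""
    rng = random.Random(seed)  # a seed makes a run reproducible; optional
    return {
        (img, val): round(rng.random(), 2)  # assumption: two-decimal weights
        for img in image_ids
        for val in values
    }
```

Starting from such arbitrary weights, only repeated community feedback gives the stored values any meaning, which is why the initialization is run once, outside the application.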

VI. Future Work