An Algorithm of Visualizing Relational Database for Improving the Usability

By

Chaowen Huang

Supervisor

Dr. Jixue(Jerry) Liu

A thesis submitted for the degree of

Master of Computer and Information Science

School of Computer and Information Science

Division of Information Technology, Engineering and the Environment

University of South Australia

Mawson Lakes South Australia

June 2011

Table of Contents

Table of Figuresiv

Table of Tablesv

Declarationvi

Acknowledgementsvii

Abstractviii

1. Introduction1

1.1 Thesis Background1

1.2 Motivation2

1.3 Research Question4

1.3.1 Sub Research Question4

1.3.2 Sub Research Question4

1.3.3 Sub Research Question4

1.4 Explication of Research Questions5

1.5 Scope5

1.6 Contribution6

2. Literature Review7

2.1Database Representation7

2.2Graph Layout Types10

2.2.1Original Layout Graph Types11

2.2.1.1Tree Layout12

2.2.1.2Sugiyama Layout (General Directed Graph)13

2.2.1.3Spring Layout (Force-Directed Methods)14

2.2.2Modern Graph Layout Types14

2.3Graph Drawing Aesthetic15

3. The Aesthetic Features18

4. Preliminaries22

4.1Definition of Notation in Graph22

4.2Definition of Notation in this Dissertation23

4.3Specific Meaning of the Drawing25

5. Methods for Achieving Aesthetic Features28

5.1Distance Requirement28

5.2Maximum Number of Positions in Each Circle30

5.3Number of Circles in Each Ring32

5.4Overlap (Straight-arc)34

5.5Direction37

6. Positioning Vertices38

6.1Positioning Centre Vertex38

6.2Positioning the Vertices Connecting to the Centre38

6.3Positioning the Vertices Connecting to 42

6.4Positioning the Rest Vertices of the Drawing 47

7. Algorithms48

7.1 Main Proposed Algorithm (Each Cluster)48

7.2Get Root Table49

7.3Positioning Centre50

7.4Positioning Vertices Connecting to the Centre50

7.5Positioning Rest Vertices52

7.6Get Clusters53

8. Implementation and Drawings55

8.1 Drawing of dbQUIZ55

8.2Drawing of dbSYSTEM56

8.3Drawing of dbEX60

8.4Drawing of dbPPDM61

9. Discussion62

9.1 Correctness and Effectiveness62

9.2Overlap Problem65

9.3Distance66

9.4Complexity66

9.5Future Work68

10. Conclusion69

11. Reference71

Table of Figures

Figure 1 Example of reducing the number of crossings

Figure 2 Example of straight arcs drawing

Figure 3 Example of the same general direction

Figure 4 Example of direction

Figure 5 Illustration of circles and Rings

Figure 6 Illustration of polar coordinate

Figure 7 Illustration of designed drawing in the dissertation

Figure 8 Illustration of calculating the maximum number of positions in each circle

Figure 9 Illustration of Labelling Positions

Figure 10 Illustration of Overlap

Figure 11 Illustration of Positioning the Vertices Connecting to Centre

Figure 12 Illustration of Refinement for Positioning the Vertices Connecting to Centre in

Figure 13 Illustration of positioning the vertices connecting to

Figure 14 Illustration of positioning the vertices connecting to

Figure 15 Illustration of positioning the vertices connecting to

Figure 16 Example Drawing of Cluster 1 in dbQUIZ

Figure 17 Example Drawing of Cluster 2 and 3 in dbQUIZ

Figure 18 Example Drawing of Cluster 1 in dbSYSTEM

Figure 19 Example Drawing of Cluster 2 in dbSYSTEM

Figure 20 Example Drawing of Cluster 3 in dbSYSTEM

Figure 21 Example Drawing of Cluster 4 in dbSYSTEM

Figure 22 Example Drawings of Cluster 5 to 9 in dbSYSTEM

Figure 23 Example Drawings of dbEX

Figure 24 Example Drawing of dbPPDM

Figure 25 Results of and of Each Drawing

Table of Tables

Table 1Definition of notation in this dissertation

Table 2 Definition of polar coordinate

1

Declaration

I declare that this thesis does not incorporate without acknowledgment any material previously submitted for a degree or diploma in any university; and that to the best of my knowledge it does not contain any materials previously published or written by another person except where due reference is made in the text.

Chaowen Huang

June 2011

Acknowledgements

I would like to thank my supervisor, Dr. Jixue Liu, for inspiration, support and encouragement during this year. I also would like to thank my friends and schoolmates that help me to complete my thesis. Finally I would like to thank my family in China.

Abstract

Information visualization (IV) which is a combination of several aspects such as scientific visualization, data mining,graphics and human-computer interface has been becoming a more and more significant field to attract attention in the recent years. Graph visualization is one of the subfields of the IV and centres on representing the structured data sets which has inherent relation among the data members.

Database representation has been proved that it has great influence on users understanding and using database system in the previous surveys. There are several different means of representing database, which may result in affecting human performance in different level.There is a proverb that“One picture is worth a thousand words”. Graph drawing is used for representing database in this study.

Graph drawing aesthetics have been evaluated that it is significant for improving human performance in understanding. There are several graph drawing aesthetics which have been evaluated in the previous researches.

An algorithm, DBviewer, is designed from the user point of view by achieving some drawing aesthetics selected from the previous studies for improving the usability in this dissertation. The proposed algorithm is implemented in C# and tested by using four examples exported from oracle database.The performance of DBviewer algorithm is on the whole good. No matter in theory and in practice, DBviewer algorithm is proved that it can correctly visualize the database. The results are also proved the effectiveness of achieving the drawing aesthetics features.

1

1Introduction

1.1 Thesis Background

In the last decade, information visualization (IV) which is a combination of several aspects such as scientific visualization,data mining,graphics and human-computer interface has been becoming a more and more significant field to attract attention[14, 31]. IV can not only represent existent information of data sets but also help users discover hidden associations in the data sets because the visual form is designed from another point of view, human’s natural visual capabilities(such capabilities can be viewed as graph drawing aesthetic in this study) [15]. Thus, IV makes human process large complicated data sets effectively, especially for the unstructured data sets which has no inherent relation among the data members[16].

Graph visualization which is one of the subfields of the IV has different application goal with IV [16].By contrasting to IV, graph visualization centres on representing the structured data sets which has inherent relation among the data members.Graph visualization is different from the general graph drawing since graph visualization not only draws graphs but also focuses on dealing with the graph[16].Normally, while graph visualization is applied to represent the structured data sets, data members are represented as vertices in a graph and the relations among them are represented as edges [15, 16].

Graph visualization has been widely applied to various areasin recent years[16]. The most common and familiar application for human is a file hierarchy in a computer that is represented as a tree.Graph visualization is also applied in many other areas such as biology, chemistry, data structures, entity-relationship diagrams (E-R diagram), data flow diagrams, semantic networks and knowledge-representation diagrams, logic programming, document management systems, and virtual reality [16].

Database representation which is a human-computer interaction that human can use to interact with database structure has been proved that it has great influence on users understanding and using database system in the previous surveys[15, 20].Furthermore, different representations have different influence on the user understanding and using database[20]. More exactly, different representations may use different semantics, symbols or methods of representing relationship between data (text, graph or text combine with graph)to represent the database, which results in affecting human performance in different level [20].

However, the same type of representation may use different graph drawing aesthetics to draw a graph (if the means of representing include graph method).Graph drawing aesthetics have been evaluated that it is significant for improving human performance in understanding [24-26].

1.2 Motivation

It is universally acknowledged thatit is easier for users to read and use the data which is represented in graphical form than in textual form [15, 26].There is always an exception to any assumption. Graphical form has some advantages such as ease in overview but it also has distinct disadvantages such as not ease in representing information clearly [15, 17].Thus, in certain condition, textual form is better than graphical form.Furthermore, the survey of Petre, Scaife and Rogers in 1995 has also proved that graphical form is not always better than textual form[23, 32].Nevertheless, the most useful advantage of graphical form whichdiscovers hidden relationships in the data setscan improve the effect of users using and learning the data sets [4, 16].

Until now, many graph layout algorithms have been designed so that relational database can be represented in graphical form [26]. However, the motivation of these graph layout algorithms is to improve the computational efficiency instead of improving human performance of understanding and using information [5].In other words, these graph layout algorithms are not designed from users understanding and using point of view[26].

Normally, in the relational database, relations are represented by relational tables (textual form) which contain their own attributes. Although several table representation means (textual forms) have been designed, including table representation, table with arrows representation and table with text only representation, these table representation means still have the limitation of representing the entirety information of the relational database.By contrast, this limitationis one of the advantages for using graphic form discussed above. For this reason, some other representation methods have been designed, such as logical data structure (LDS) representation, and these methods also have their own weaknesses. For example, by using LDS representation (graphical form), understanding the exactly relationship between attributes which belong to two relation tables are difficult for the users[20]. Because the representation of links is showed between tables, and do not be showed precisely between the attributes in the tables in LDS representation. More exactly, LDS representation cannot represent the information of relations among the attributes clearly and this is a significant drawback of graphic form discussed above.

Regular relational databases have, at least, hundreds of relation tables and foreign keys (FKs) which indicate the relations between the relation tables. For this reason, it is difficult for users to study the relational database and get the useful information from the relational database effectively.

The motivation of this dissertation is to design an algorithm from users understanding and using point of view to visualize relational database as a ‘quality’ graph that can improve performance of users in understanding and using relational database.This means the algorithm designed in this dissertation considers the graph drawing aesthetics in priority.

1.3 Research Question

Can we design an algorithm to draw a quality graph which can aid relational database users understanding and using relational database more effectively?

1.3.1 Sub research question

What means of database representation should be used for relational database representation?

1.3.2 Sub research question

What drawing aesthetics should be considered in designing the algorithm?

1.3.3 Sub research question

What means of navigation and interaction can be used for aiding users usingthe drawing of relational database?

1.4Explication of Research Questions

There are many means of database representation, including E-R diagram representation, logical data structure representation, table representation, table with arrows representation, and table with text only representation[20]. They all have their own strengths and weaknesses. For this reason, it is obvious when designing a representation method should not only have the most numerous strengths among previous representation methods but also avoid the most numerous weaknesses in this study.

The five primary common drawing aesthetics including crossings, bends, angle, symmetry and orthogonality have been validated by Helen C. Purchase in 1997[24]. Orthogonality has been classified into node orthogonality and edge orthogonality and one more drawing aesthetic, consistent flow direction, has been validated by Helen C. Purchase later, in 2002 [27]. These common drawing aesthetics have different influence on user understanding in different drawings. Since, not all of these common drawing aesthetics are used in this study.

Several different navigation and interaction methods have been developed for aiding users viewing the graph, for example, zoom and pan technique. One of these navigation and interaction methods would be chosen to use in this study.

1.5 Scope

This dissertation is focused on designing an algorithm forvisualizing relational database as a graph from human usability point of view.Furthermore, this study is based on basic drawing skills and the existing layout algorithm (original graph layout algorithm) to design the layout algorithm for visualizing the relational database.Additionally, graph drawing aesthetics are the only consideration in designing the layout algorithm in this research. More exactly, other considerations such as computational speed of the algorithm are not considered in the algorithm designed in this study.Finally, the visualization in this study is based on two dimensions.

1.6 Contribution

There are several contributions of this dissertation to the area of database representation using graph drawing aesthetics to design.Contributions of this research are as follows:

A survey of existing graph layout techniques which are designed from improving computational efficiency point of view,

A survey of existing graph drawing aesthetics that have been evaluated to affect human understanding and using graph,

An algorithm is designed to centre on improving usability of users rather than computational efficiency,

An attempt to use graph drawing aesthetics as main consideration to design layout algorithm.

2Literature Review

Given the research questions presented at previous section, there are three main relevant streams being conducted in this dissertation. Thus, this chapter is divided into three parts:the first section is an overview of database representation– its definition, role and influence, and a review of the researches related to database representation; the second section presentsa review of existing graph layout types; the final section shows the research of existing graph drawing aesthetics.

2.1 Database Representation

Database representation which has usability of communicating users to database structureis part of the research on database management system (DBMS) user performance [20].Although DBMS user performance has been researched for three decades, the impact of database representation on DBMS user performance was considered and proposed indecade ago by Leitheiser and March[20].Before this formal research appeared, most of the researches on DBMS user performance focused on the difference of data model and query language. Database representation is a not confirmed factor of DBMS user performance or not even considered in these researches.Some of these related researches are showed in the following part.

In the research[21], Lochovsky and Tsichritzis designed a method with a set of measures to measure DBMS user performance in different data model and different data (query) language (DMDL). In their study, they used Educational Data Base System (EDBS) as management system and compare the effectiveness of three EDBSs user performance by using different DMDL, including a hierarchicalsystem, a network system, and a relationalsystem.More exactly, DMDL provided by hierarchical systemis similar to information management system proposed by IBM; they build the DMDL for the network system based on the data-base task group report (DBTG)[12, 37]; the construction of DMDL ofrelational system follows the relational model proposed by Codd in 1970 and the ALPHA data (query) language also proposed by Codd in 1971[7, 8].Three representation, tree representation, network representation and table representation, were compared accordinglyalong with the comparison of three EDBSs.Due to the experiment method measuring three EDBSs in three different DMDL, there is a limitation in the study that Lochovsky and Tsichritzis cannot confirm whether the results (user performance) are caused by the representation method and how much the representation method occupies.

One year later, in 1978, Brosey & Shneiderman investigated DBMS user performance on two different data model, relational and hierarchical data models[6].The first experiment of study separated subjects into two groups by considering their computer programming experience.It should be noted that separating subjects into different groups is beneficial to investigating DBMS user performance because users who have different levels in computer knowledge might have different performance in completing database management tasks.Furthermore, the results of the research proved this assumption.The group of ‘advanced’had a better performancethan the group of ‘beginner’when doing comprehension task by using relational model. However, the ‘bignner’ group had a better performance when doing comprehension task by using hierarchical model.In additionally, there was no significant difference between two groups when doing memorization task by using hierarchical model.

Later in 1989, the research which was published by Kenny et al.[19]did the comparison of two representation method, entity-relationship representation (on ER model) and table representation (on relational model), in the same language environment, structured query language (SQL).The result of this study is that there is no significant difference of the two representation methods applying to the same language environment, SQL.Comparing with research conducted by Lochovsky and Tsichritzis[21], this research only focus on one data language, SQL, but authors still cannot confirm whether the results (user performance) are caused by the representation method and how much the representation method occupies. This study also has two obvious limitations: (1) the SQL language is designed for user interacting with relational model, which might make a bias in this study; (2) the subjects in the study are all students who only have limited computer experience, which might make the result not being suitable with users who have professional computer experience.

The research conducted by Juhn and Naumann[18] is different from the researches [21], [6] and [19].More exactly, Juhn and Naumann directly conducted the comparison effectiveness of four different database representations, two semantic representations and two table representations, on user studying and using database by avoiding using a query language. The results of the study show that user understanding of relationship by using semantic representation is better than by using table representation. But user understanding of identifiers by using table representation is better than by using semantic representation. One valuable discovery of this study is that the table representation by adding arrows to represent the relationship between primary keys (PKs) and foreign keys (FKs) make user better understanding of relationship of table representation.