DBMiner [1]:
A data mining tool for large relational databases [1]
DBMiner, a data mining system for interactive mining of multiple-level knowledge in large relational databases, has been developed based on our years of research. The system implements a wide spectrum of data mining functions, including generalization, characterization, discrimination, association, classification, and prediction. By incorporating several interesting data mining techniques, including attribute-oriented induction, progressive deepening for mining multiple-level rules, and meta-rule guided knowledge mining, the system provides a user-friendly, interactive data mining environment with good performance.
Project Overview [1]
A data mining system, DBMiner, has been developed for interactive mining of multiple-level knowledge in large relational databases. It is based on studies of data mining techniques and experience in the development of an early system prototype, DBLearn. The system implements a wide spectrum of data mining functions, including generalization, characterization, association, classification, and prediction. By incorporating several interesting data mining techniques, including attribute-oriented induction, statistical analysis, progressive deepening for mining multiple-level knowledge, and meta-rule guided mining, the system provides a user-friendly, interactive data mining environment with good performance.
Project Description [1]
Figure: General architecture of DBMiner
The system has the following distinct features:
- It incorporates several interesting data mining techniques, including attribute-oriented induction, progressive deepening for mining multiple-level rules, and meta-rule guided knowledge mining, and implements a wide spectrum of data mining functions including generalization, characterization, association, classification, and prediction.
- It performs interactive data mining at multiple concept levels on any user-specified set of data in a database using an SQL-like Data Mining Query Language, DMQL, or a graphical user interface. Users may interactively set and adjust various thresholds, control a data mining process, perform roll-up or drill-down at multiple concept levels, and generate different forms of outputs, including generalized relations, generalized feature tables, multiple forms of generalized rules, visual presentation of rules, charts, curves, etc.
- Efficient implementation techniques have been explored using different data structures, including generalized relations and multiple-dimensional data cubes, and have been integrated with relational database techniques. The data mining process may utilize user- or expert-defined set-grouping or schema-level concept hierarchies, which can be specified flexibly, adjusted dynamically based on data distribution, and generated automatically for numerical attributes.
- Both UNIX and PC (Windows/NT) versions of the system adopt a client/server architecture. The latter communicates with various commercial database systems for data mining using the ODBC technology.
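As a rough illustration of the attribute-oriented induction and roll-up mentioned in the features above, the following Python sketch replaces attribute values with their parent concepts in a concept hierarchy and merges identical generalized tuples while accumulating counts. The hierarchies, attribute names, and data are hypothetical, and the code is only a sketch of the general idea, not DBMiner's actual implementation.

```python
from collections import Counter

# Hypothetical concept hierarchies: each low-level value maps to its parent concept.
CITY_TO_PROVINCE = {
    "Vancouver": "British Columbia",
    "Victoria": "British Columbia",
    "Toronto": "Ontario",
}
MAJOR_TO_FIELD = {
    "physics": "science",
    "biology": "science",
    "history": "arts",
}


def generalize(rows, hierarchies):
    """Climb one level in each attribute's hierarchy and merge identical
    generalized tuples, counting how many raw tuples each one covers."""
    counts = Counter()
    for row in rows:
        generalized = tuple(
            hierarchies[i].get(value, value)  # values without a parent stay as-is
            for i, value in enumerate(row)
        )
        counts[generalized] += 1
    return counts


# Hypothetical task-relevant data: (birth city, major)
students = [
    ("Vancouver", "physics"),
    ("Victoria", "biology"),
    ("Toronto", "history"),
    ("Vancouver", "biology"),
]

generalized_relation = generalize(students, [CITY_TO_PROVINCE, MAJOR_TO_FIELD])
for generalized_tuple, count in sorted(generalized_relation.items()):
    print(generalized_tuple, count)
```

Applying `generalize` again with higher-level hierarchies would correspond to a further roll-up; keeping the raw tuples around allows the reverse drill-down.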
Major functional modules [1]:
Figure: Knowledge discovery modules of DBMiner
DBMiner characterizer
The characterizer generalizes a set of task-relevant data into a generalized relation which can then be viewed at multiple concept levels from different angles. In particular, it derives a set of characteristic rules which summarize the general characteristics of a set of user-specified data (called the target class). For example, the symptoms of a specific disease can be summarized by a characteristic rule.
DBMiner discriminator
A discriminator discovers a set of discriminant rules which summarize the features that distinguish the class being examined (the target class) from other classes (called contrasting classes). For example, to distinguish one disease from others, a discriminant rule summarizes the symptoms that discriminate this disease from others.
DBMiner association rule finder
An association rule finder discovers a set of association rules (in the form of "") at multiple concept levels from the relevant set(s) of data in a database. For example, one may discover a set of symptoms frequently occurring together with certain kinds of diseases and further study the reasons behind them.
DBMiner data classifier
A classifier analyzes a set of training data (i.e., a set of objects whose class label is known) and constructs a model for each class based on the features in the data. A set of classification rules is generated by such a classification process, which can be used to classify future data and develop a better understanding of each class in the database. For example, one may classify diseases and provide the symptoms which describe each class or subclass of diseases.
DBMiner predictor
A predictor predicts the possible values of some missing data or the value distribution of certain attributes in a set of objects. This involves finding the set of attributes relevant to the attribute of interest (by some statistical analysis) and predicting the value distribution based on the set of data similar to the selected object(s). For example, an employee's potential salary can be predicted based on the salary distribution of similar employees in the company.
DBMiner meta-rule guided miner
A meta-rule guided miner is a data mining mechanism which takes a user-specified meta-rule form, such as "", as a pattern to confine the search for desired rules. For example, one may specify the discovered rules to be in the form of "" in order to find the relationships between a student's major and his/her GPA in a university database.
DBMiner evolution evaluator
A data evolution evaluator evaluates the data evolution regularities for certain objects whose behavior changes over time. This may include characterization, classification, association, or clustering of time-related data. For example, one may find the general characteristics of the companies whose stock price has gone up over 20% last year or evaluate the trend or particular growth patterns of certain stocks.
DBMiner deviation evaluator
A deviation evaluator evaluates the deviation patterns for a set of task-relevant data in the database. For example, one may discover and evaluate a set of stocks whose behavior deviates from the trend of the majority of stocks during a certain period of time. The module performs the following three functions:
- recognizes or identifies the general trend and/or behavior for data in the database,
- detects the set of data which deviates from such a trend or behavior, and
- summarizes the general characteristics of deviation data.
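The three functions listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the stock names and percentage changes are made up, the "trend" is taken to be the median, and the cutoff `k` times the median absolute deviation is an arbitrary choice, not the method DBMiner uses.

```python
from statistics import median

# Hypothetical percentage price changes of six stocks over one period.
returns = {"AAA": 2.1, "BBB": 1.8, "CCC": 2.4, "DDD": -9.5, "EEE": 2.0, "FFF": 14.0}


def general_trend(values):
    """Function 1: describe the majority behavior by the median and the
    median absolute deviation (MAD), which a few extremes barely affect."""
    center = median(values)
    mad = median(abs(v - center) for v in values)
    return center, mad


def find_deviations(data, k=5.0):
    """Function 2: keep the items lying more than k * MAD from the trend.
    The factor k is an assumed, user-adjustable threshold."""
    center, mad = general_trend(list(data.values()))
    return {name: v for name, v in data.items() if abs(v - center) > k * mad}


def summarize(deviating, center):
    """Function 3: summarize the deviating items by direction."""
    return {
        "count": len(deviating),
        "above_trend": sorted(n for n, v in deviating.items() if v > center),
        "below_trend": sorted(n for n, v in deviating.items() if v < center),
    }


center, mad = general_trend(list(returns.values()))
deviations = find_deviations(returns)
summary = summarize(deviations, center)
print(center, mad)
print(summary)
```

A real evaluator would presumably summarize the deviating items with the characterization machinery described earlier rather than by a simple above/below split.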
DBMiner user interfaces
Three user interfaces (UNIX-based, Windows/NT-based, and WWW/Netscape-based GUIs) have been developed to allow users to interactively discover multiple-level knowledge in large relational databases. The system integrates well with existing commercial database systems, performs well, and is robust at handling noise and exceptional data.
Further Development of DBMiner [1]
The DBMiner system is currently being extended in several directions, as illustrated below.
- Further enhancement of the power and efficiency of data mining in relational database systems, including the improvement of system performance and rule discovery quality for the existing functional modules, and the development of techniques for mining new kinds of rules, especially on time-related data.
- Integration, maintenance and application of discovered knowledge, including incremental update of discovered rules, removal of redundant or less interesting rules, merging of discovered rules into a knowledge-base, intelligent query answering using discovered knowledge, and the construction of multiple layered databases.
- Extension of data mining techniques towards advanced and/or special purpose database systems, including extended-relational, object-oriented, text, spatial, temporal, and heterogeneous databases. Currently, two such data mining systems, GeoMiner and WebMiner, for mining knowledge in spatial databases and the Internet information-base respectively, are under design and construction.
Methodology [2,3]
We have developed a list of 14 criteria for evaluating DBMiner. These criteria can be put into four categories: Capability, Learnability/Usability, Interoperability, and Flexibility. Capability measures what a desktop tool can do, and how well it does it; Learnability/Usability means how easy a tool is to learn and use; Interoperability means a tool’s ability to interface with other computer applications; and Flexibility is the ease with which one can alter critical guiding parameters, or create a customized environment.
Results [2,3]
We used the FoodMart database, which ships with MS SQL Server, for testing. FoodMart is made up of two cubes, Sales and Warehouse. The Sales cube consists of 13 dimensions, such as "Customers" and "Educational Level", and 7 measurements, such as “Profit”, “Sales Average”, and “Sales Count”. The Warehouse cube consists of 7 dimensions and 7 measurements. The database contains enough data for our evaluation.
We ran DBMiner on a 166 MHz Pentium with 64 MB of RAM, running Windows 2000.
Table 1 summarizes the results.
Capability
The capability criteria we selected are whether the tool scales to larger databases, whether it has a programming language for automation, whether it provides useful output reports, and whether it has visualization capabilities.
With our training data set, we found that the software scaled efficiently. The software does not offer a programming language for automation; however, it has many wizards that guide the end user through the tasks. The software uses DMQL (Data Mining Query Language) internally, but the user cannot manipulate the DMQL.
The visualization part of the software uses many graphics, including ball graph, ball chart, grid, and frequent item sets, for visualizing Association, Classification, and Clustering; however, pie charts and correlation plots are missing. In addition, tree browsing is presented in a graph view, which we found confusing. A second visualization component, the OLAP browser, relies on the visualization capabilities of MS Excel 2000 and cannot function without it.
DBMiner shows a statistics report, which it calls “mining results statistics.” The report gives the number of items identified; however, it neither describes the characteristics of the results nor analyzes the statistics. In addition, we were unable to print any results from Association, Classification, or Clustering, or the statistics report: the printed page was blank.
Learnability/Usability
There are six criteria for this category, namely tutorials, wizards, ease of learning, user’s manual, online help, and interface.
DBMiner is not a complex program for people familiar with data mining. However, it does not include a tutorial that walks a newcomer to data mining through an example.
Wizards are built in for automating the tasks of data mining. The wizards let the user select appropriate options for the tasks.
The user interface is simple and standard. The menus and tool bars are appropriate. However, we found that some tools did not work well when enabled, such as the tools in the visualization pane and the magnifier. In addition, some menu commands have no function associated with them, such as the “Export” command under the File menu.
The user’s manual is well organized, so a user can find the appropriate way to explore the software; however, its style is dated and not web-oriented, and it does not contain links to other relevant topics. DBMiner's online help is average. Overall, we think the software is easy to learn and interact with, provided the user knows data mining techniques.
Interoperability
We use three criteria for this category: importing data, exporting data, and whether it has links to other applications.
DBMiner does not support importing and exporting of data. However, it communicates with MS OLAP Server and has MS Excel 2000 embedded as a visualization tool for OLAP browsing.
Flexibility
We use two criteria for the flexibility of the application: whether the work environment is customizable and whether it is possible to write or change code.
DBMiner uses DMQL for its internal functionality; however, it is not possible for the user to write or change DMQL.
DBMiner has the flexibility to let the user change the values of settings after each task is done. For example, it is possible to increase/decrease the support threshold or the confidence threshold if the user is not happy with the current level.
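To make the effect of these two thresholds concrete, here is a minimal Python sketch using the standard definitions of support and confidence for an association rule. The transactions, candidate rules, and function names are hypothetical and are not taken from DBMiner or the FoodMart database.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    items = set(itemset)
    return sum(items <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Support of the whole rule divided by support of its antecedent."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    return support(transactions, set(antecedent) | set(consequent)) / base


def rules_above(transactions, candidates, min_support, min_confidence):
    """Keep the candidate rules that meet both user-set thresholds."""
    return [
        (a, c)
        for a, c in candidates
        if support(transactions, set(a) | set(c)) >= min_support
        and confidence(transactions, a, c) >= min_confidence
    ]


# Hypothetical market-basket data.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
# Hypothetical candidate rules, written as (antecedent, consequent).
candidates = [
    (("bread",), ("milk",)),
    (("milk",), ("bread",)),
    (("butter",), ("bread",)),
]

loose = rules_above(transactions, candidates, min_support=0.5, min_confidence=0.6)
strict = rules_above(transactions, candidates, min_support=0.5, min_confidence=0.9)
print(len(loose), len(strict))
```

Raising the confidence threshold from 0.6 to 0.9 shrinks the result set, which is the kind of adjustment a user would make after inspecting a first batch of rules.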
Other Limitations
DBMiner works only with MS SQL Server as its back end and uses MS Excel 2000 as its visualization tool for OLAP browsing. In addition, the data dispersion module, the time-series analysis module, and the prediction module are unavailable.
Conclusions
DBMiner is a good data mining tool, as it offers a user-friendly environment for users of all categories. The discussion above substantiates our evaluation, though there is considerable room for improvement in the commercial version.
References
[1] Copied and pasted from “DBMiner: A data mining tool for large relational databases,”
[2] Bhavani Thuraisingham, Data Mining: Technologies, Techniques, Tools, and Trends, CRC Press, 1999
[3] John F. Elder IV and Dean W. Abbott, Elder Research, A Comparison of Leading Data Mining Tools, 1998
Appendix A
Table 1: Capability, Learnability/Usability, Interoperability, and Flexibility
Criterion | Excellent | Good | Average | Needs Improvement | Poor | Does Not Exist
Scalability | | | | | |
Has programming language | | | | | |
Provides useful output reports | | | | | |
Visualization | | | | | |
Wizards | | | | | |
Easy to learn | | | | | |
User’s manual | | | | | |
Online help | | | | | |
Interface | | | | | |
Importing data | | | | | |
Exporting data | | | | | |
Links to other applications | | | | | |
Customizable work environment | | | | | |
Ability to write or change codes | | | | | |
Overall | | | | | |