EVALUATION OF IBM INTELLIGENT MINER FOR DATA:

By Anand Kadur & Wangdong Zhang

Introduction:

IBM DB2 Intelligent Miner for Data embodies IBM’s latest data mining technology with capabilities to support the full range of mining processes, from data analysis and preparation tasks through mining and assimilation of results. Version 6.1, available September 24, 1999, includes a powerful new Associations Visualizer, extended data exploration capabilities, support for DB2 UDB V6.1, and a number of productivity enhancements in the areas of interoperability and data management. Performance enhancements include parallel processing for value prediction and the introduction of parallel data mining for SMP and cluster configurations on AIX, Sun Solaris, and Windows NT.

Highlights of the Software:

· Single framework for data mining - a suite of tools to support the iterative process, offering data processing, statistical analysis, and results visualization to complement a variety of mining methods.

· Proven mining algorithms that can be used individually or in combination to address a wide range of business problems and deliver measurable business results.

· Scalable solution focused on the technical issues of large-scale mining, such as large volumes of data, parallel data mining on AIX, Windows NT, Sun Solaris, and OS/390, directly mining DB2 data, long-running mining operations, and optimization of mining algorithms.

· Core technology for IBM data mining solutions, supported by industry-recognized mining consultants deployed worldwide, with customer engagements in finance, telecommunication, insurance, and health care.

· Application programming interface, enabling development of customized, industry-specific mining applications by customers, IBM, and IBM Business Partners.

Discovering information that leads to knowledge

Mining is the process of extracting valid, previously unknown, and ultimately comprehensible information from large databases and using it to make crucial business decisions. It is quickly being recognized as an essential business intelligence tool, a key ingredient in discovering the information needed to improve a company's market presence and differentiate its products and services in today's global marketplace.

Intelligent Miner for Data helps knowledge workers identify and extract high value business intelligence from their data assets. It provides the fundamental technology and tools to support the mining process, as well as application services to support development of customized mining applications.

Intelligent Miner for Data is applicable to a wide range of business problems. Its mining results can facilitate decision making in business areas such as:

· Campaign planning

· Customer relationship management

· Process reengineering

· Product planning and fulfillment

· Fraud or abuse detection

Version 2

In its initial release, Intelligent Miner was established as a scalable, integrated framework capable of handling the mining of large quantities of data. Its functions cover the full range of mining processes, from data analysis and preparation tasks through mining and assimilation of results.

Version 2 features statistical functions, optimized mining techniques, usability and productivity enhancements, DB2 and DB2 Universal Database exploitation, more parallel mining, and additional deployment options.

Innovative technology

Based on IBM research and validated through joint customer studies, nine innovative data mining algorithms have emerged as the critical suite to address a wide range of business problems. In customer engagements worldwide, these algorithms, often used in combination, have proved their versatility. They have been used by retailers to determine customer purchasing patterns, by bankers to perform risk assessment, by healthcare providers to detect potential fraud, and by telephone companies to analyze customer attrition, to name a few.

Data mining algorithms

The algorithms are categorized as follows:

1. Association discovery

2. Sequential pattern discovery

3. Clustering

4. Classification

5. Value prediction

6. Similar time sequences

The alternatives afforded by this breadth of coverage are further enhanced by the fact that three of these categories are supported by more than one mining algorithm.
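
To make the first category concrete, the following is a minimal sketch of association discovery in plain Python. It is an illustration only and is not Intelligent Miner's algorithm or API: the function name association_rules, the thresholds, and the sample baskets are all hypothetical. It counts how often pairs of items occur together and keeps the rules whose support and confidence reach the chosen minimums.

```python
from collections import Counter
from itertools import combinations


def association_rules(transactions, min_support=0.5, min_confidence=0.6):
    """Return (antecedent, consequent, support, confidence) for item pairs.

    support    = share of transactions that contain both items
    confidence = share of transactions with the antecedent that also
                 contain the consequent
    """
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in transactions:
        items = sorted(set(basket))          # ignore duplicates in a basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # Each frequent pair can yield a rule in either direction.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


# Hypothetical market baskets, invented for this example.
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["milk"],
]
rules = association_rules(baskets)
for lhs, rhs, support, confidence in rules:
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

A retailer studying purchasing patterns would read a rule such as "butter -> bread" as "customers who buy butter usually also buy bread". A production tool would also handle item sets larger than pairs and far larger data volumes.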

Data analysis and preparation

Automation of some of the most typical data preparation tasks is aimed at improving the analyst's productivity by eliminating the need for programming specialized routines. Depending on the data mining technique, analysts may select, sample, aggregate, filter, cleanse, and/or transform data in preparation for mining.
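
The kind of preparation steps listed above can be sketched in a few lines of Python. This is only an illustration of what cleansing, filtering, aggregating, transforming, and sampling mean in practice. It does not use or imitate the Intelligent Miner interface, and the records, field names, and chosen steps are hypothetical.

```python
import random
from collections import defaultdict

# Hypothetical input records, invented for this example.
records = [
    {"customer": "A", "region": "east", "amount": 120.0},
    {"customer": "B", "region": "west", "amount": None},
    {"customer": "A", "region": "east", "amount": 80.0},
    {"customer": "C", "region": "east", "amount": 40.0},
    {"customer": "C", "region": "west", "amount": 60.0},
]

# Cleanse: drop records whose amount is missing.
clean = [r for r in records if r["amount"] is not None]

# Filter (select): keep only one region.
east = [r for r in clean if r["region"] == "east"]

# Aggregate: total amount per customer.
totals = defaultdict(float)
for r in east:
    totals[r["customer"]] += r["amount"]

# Transform: scale each total relative to the largest total.
highest = max(totals.values())
scaled = {customer: total / highest for customer, total in totals.items()}

# Sample: draw a reproducible random subset of the cleansed records.
rng = random.Random(0)
sample = rng.sample(clean, k=2)

print(dict(totals))
print(scaled)
print(len(sample))
```

Automating such steps in the tool saves the analyst from writing routines like these by hand for every mining run.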

Statistical functions facilitate the analysis and preparation of data, as well as provide forecasting capabilities. Statistical functions include factor analysis, linear regression, principal component analysis, univariate curve fitting, logistic regression, and univariate and bivariate statistics.
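
As a small worked example of one of the statistical functions named above, the sketch below fits a linear regression by ordinary least squares and uses it for a one-step forecast. It is written in plain Python under our own assumptions: the function name fit_line and the monthly sales figures are hypothetical, and it says nothing about how Intelligent Miner implements regression internally.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical monthly sales figures, invented for this example.
months = [1, 2, 3, 4, 5]
sales = [12.0, 14.0, 16.0, 18.0, 20.0]

slope, intercept = fit_line(months, sales)
forecast = slope * 6 + intercept   # predicted sales for month 6

print(slope, intercept, forecast)
```

With these figures the fitted line is sales = 2 * month + 10, so the forecast for month 6 is 22.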

Visualization

Visualization functions bring out unusual features that might otherwise be "drowned out." A range of visualizers is provided, each specialized to a type of data mining or statistical analysis result.

Administrative user interface

An administrative user interface based on Java™ provides interactive access to mining tasks with ease of use and consistency across all operations. Implementation features the use of state-of-the-art GUI facilities, including online help, task guides, and a graphical representation of the mining base and its objects.

Repeatable sequences allow an Intelligent Miner for Data user to construct a sequence of mining operations which can be saved and subsequently modified and repeated. Analysts can develop an end-to-end mining sequence on one system and deliver (or port) it to a client system for execution.

Application program interface

As mining enters the mainstream of business processes, increasing numbers of customized applications to present and deploy mining results are being developed by customers, IBM, and IBM partners. IBM Discovery Series is one such application implemented using Intelligent Miner for Data. It provides a suite of customer relationship management applications tailored to particular industries, such as telecommunications, utilities, and finance.

Industry-specific mining application offerings leverage the benefits of mining by raising the access and use of mining results to particular users in the enterprise. Results can thus be easily understood in the context of a business task.

Tool interface

A registration facility is provided to facilitate export of mining results to familiar analysis tools.

Pulling it all together

IBM provides complete customer data mining solutions. In addition to mining tools and applications, an IBM engagement can involve hardware and/or software offerings to support, build, and manage the necessary infrastructure, including database, data warehouse, and other business intelligence offerings.

IBM consulting practices are organized to bring industry and data mining experience as well as overall warehouse and DB2 skills to a mining project, complementing customer resources. They are available to support all phases of data mining solution development, from inception through deployment. Mining engagements can encompass planning and installation of complete mining solutions, as well as mining education and consulting.

Plus points noticed:

Single framework for data mining--a suite of tools to support the iterative process, offering data processing, statistical analysis, and results visualization to complement a variety of mining methods.

Proven mining algorithms that can be used individually or in combination to address a wide range of business problems and deliver measurable business results.

Scalable solution focused on the technical issues of large-scale mining, such as large volumes of data, AIX/SP parallel processing, mining directly against DB2 Universal Database data, long-running mining operations, and optimization of mining algorithms.

Core technology for IBM Data Mining Solutions, supported by industry-recognized mining consultants deployed worldwide, with customer engagements in finance, telecommunication, insurance, healthcare, and other industries.

Application Programming Interface enabling development of customized, industry-specific mining applications by customers, IBM, and IBM business partners.

Version 2 highlights

New and Improved Analytics

· Statistics

· Neural Net Value Prediction

· Optimization of algorithms

· Model quality graphics

Usability Enhancements

· Java™ User Interface

· Repeatable Sequence

· Task Guides, Expert Use Mode

· Progress Indicator

DB2 Universal Database Exploitation

· Parallel data mining

· Performance

More Parallel Mining

· Increased parallelism of algorithms

· Full exploitation of DB2 Universal Database Enterprise-Extended Edition

Additional Deployment Options

· More servers--AIX, AIX/SP, OS/390, Solaris, OS/400, Windows NT

· More clients--AIX, Windows NT/Windows 95, OS/2

· More languages--Brazilian, French, Hungarian, Italian, Japanese, Korean, Portuguese, Russian, Spanish, Traditional Chinese

· More program access--Server API, Client API

· More portable mining bases

Additional evaluation on different criteria:

We extend the evaluation above by assessing the software against the following criteria:

· Support for platforms.

· Functionality and variety of techniques for mining.

· Efficiency of visual interfaces for the results.

· User friendliness.

· Plus points and criticisms.

To start with, we would like software to be compatible with the variety of platforms available to us today. IBM Intelligent Miner for Data supports most of them: Windows NT, AIX, Solaris, OS/390, and OS/400 for the server, and AIX, OS/2, Windows NT, and Windows 95 for the client.

It requires about 450 MB of hard disk space, including the DB2 database, which is a prerequisite for running it.

The server can be configured to start when the computer boots, which avoids having to start it manually each time.

A disadvantage on the server side is that the user must belong to the administrator group to change the server settings.

In discussing the functionality of the software, we focus on the client side, because this is where most of the development work takes place. The server runs in the background, supporting the client.

The latest release of the software, version 6.1, offers a wide range of mining algorithms, including association discovery, sequential pattern discovery, clustering, classification, and value prediction. These are the techniques needed in data mining, and there is a large variety to choose from.

Options are provided to use either an existing mining base or a data source, which can be a flat file or a relational database. We can import data from external sources too. Mining can be performed on these data sources using one of the algorithms, and the results can be viewed using one of the many interfaces that the software provides.

The software provides over a dozen interfaces for viewing the results, and this is one of its strongest points.

We can then interpret the results using whichever method best suits the application.

Several snapshots will be shown in the presentation.

Lastly, in discussing the user friendliness of the software, we evaluate it from the point of view of a novice user.

Good demos are available that show how to use the software.

It has good GUI screens, but it makes certain assumptions about a new user: the user has to be familiar with DB2, and the user must be an administrator on NT machines.

Furthermore, it does not provide a client version for Solaris systems.

In conclusion, we found the software to be good overall, a judgment supported by a number of case studies showing how effectively the product has been used in commercial applications.