CHAPTER 1 (HAN and KAMBER)

This chapter is a good introduction to the field of knowledge discovery in databases/data mining (KDD/DM). Many terms are introduced, key data mining functionalities are briefly described, and many application-oriented concepts are discussed. The emphasis in this book is on the database perspective of data mining.

On a first reading, it is sufficient to focus on Sections 1.1, 1.2, 1.4, 1.7, and 1.9.

Section 1.1 Motivation and Importance

Large volumes of data are available, and these data are expected to contain useful information.
DM is a natural evolution of database technology, which has progressed through data collection and management, advanced data analysis, on-line transaction processing, the World Wide Web, on-line analytical processing, and now data mining.

Section 1.2 What is Data Mining

Definition: extracting or "mining" knowledge from large databases.
Main steps: data cleaning; data integration; data selection; data mining; pattern evaluation; knowledge presentation.
Data mining system components: database, data warehouse, World Wide Web, or other data repository; database or data warehouse server; knowledge base; data mining engine; pattern evaluation module; user interface.
Data mining integrates techniques from database and data warehouse technology, statistics, machine learning, high-performance computing, pattern recognition, and neural networks.
Emphasis is on efficient (in terms of time and storage) and scalable (running time linear with data size) data mining techniques.
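
The main steps listed in this section can be pictured as a chain of small functions. The following is only a minimal sketch on toy data, not the book's method: the records, function names, and the 0.6 threshold are all assumptions made for illustration, and the "mining" step is reduced to counting single items.

```python
from collections import Counter

# Hypothetical toy records from two sources; None marks a missing value.
store_a = [{"id": 1, "items": ["milk", "bread"]}, {"id": 2, "items": None}]
store_b = [
    {"id": 3, "items": ["milk", "eggs"]},
    {"id": 4, "items": ["milk", "bread", "eggs"]},
    {"id": 5, "items": ["bread", "jam"]},
]

def clean(records):
    """Data cleaning: drop records whose item list is missing or empty."""
    return [r for r in records if r["items"]]

def integrate(*sources):
    """Data integration: combine several sources into one collection."""
    return [r for source in sources for r in source]

def select(records):
    """Data selection: keep only the task-relevant attribute (the item sets)."""
    return [frozenset(r["items"]) for r in records]

def mine(transactions):
    """Data mining: count how many transactions contain each item."""
    return Counter(item for t in transactions for item in t)

def evaluate(counts, n, min_support):
    """Pattern evaluation: keep items whose relative frequency meets the threshold."""
    return {item: c / n for item, c in counts.items() if c / n >= min_support}

def present(patterns):
    """Knowledge presentation: readable lines, highest support first."""
    ordered = sorted(patterns.items(), key=lambda kv: (-kv[1], kv[0]))
    return [f"{item}: {support:.2f}" for item, support in ordered]

transactions = select(integrate(clean(store_a), clean(store_b)))
patterns = evaluate(mine(transactions), len(transactions), min_support=0.6)
report = present(patterns)
print("\n".join(report))
```

Each stage makes a single pass over its input, so the running time of this sketch grows linearly with the number of records, which is the sense of "scalable" used above.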

Section 1.3 Different Types of Databases

Relational databases
Data warehouses
Transactional databases
Advanced data and information systems: object-oriented and object-relational databases; temporal, sequence, and time-series databases; spatial and spatiotemporal databases; text and multimedia databases; heterogeneous and legacy databases; data stream databases; World Wide Web.

Section 1.4 Data Mining Functionalities

Concept description: characterization and discrimination
Mining frequent patterns, associations, and correlations
Classification
Prediction
Cluster analysis
Outlier analysis
Evolution analysis
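
Two of the functionalities listed above, frequent-pattern mining and outlier analysis, are easy to illustrate on toy data. The sketch below is an assumption-laden illustration rather than an algorithm from the chapter: the transactions, the purchase amounts, the function names, and the two-standard-deviation rule for outliers are all hypothetical choices.

```python
from collections import Counter
from itertools import combinations
from statistics import mean, pstdev

# Hypothetical market-basket transactions.
transactions = [
    {"milk", "bread"},
    {"milk", "bread", "eggs"},
    {"milk", "eggs"},
    {"bread", "jam"},
]

def frequent_pairs(transactions, min_count):
    """Frequent-pattern mining: item pairs that co-occur at least min_count times."""
    counts = Counter()
    for t in transactions:
        # Sorting gives each pair one canonical (alphabetical) representation.
        counts.update(combinations(sorted(t), 2))
    return {pair: c for pair, c in counts.items() if c >= min_count}

def outliers(values, threshold=2.0):
    """Outlier analysis: values more than `threshold` standard deviations from the mean."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []
    return [v for v in values if abs(v - mu) / sigma > threshold]

pairs = frequent_pairs(transactions, min_count=2)
purchase_amounts = [20, 22, 19, 21, 23, 20, 250]  # hypothetical data
unusual = outliers(purchase_amounts)
print(pairs)
print(unusual)
```

The brute-force pair counting is fine for a handful of transactions; it is not meant to stand in for the efficient algorithms the book covers in later chapters.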

Section 1.5 Interestingness of Discovered Patterns

Techniques for evaluating the interestingness of discovered patterns or knowledge.
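
For association rules, two widely used objective interestingness measures are support and confidence. The sketch below shows one way to compute them; the transactions, function names, and threshold values are assumptions chosen for illustration only.

```python
# Hypothetical market-basket transactions.
transactions = [
    {"milk", "bread"},
    {"milk", "bread", "eggs"},
    {"milk", "eggs"},
    {"bread", "jam"},
]

def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimate of P(consequent | antecedent): support(A and C) / support(A)."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    return support(transactions, set(antecedent) | set(consequent)) / base

def is_interesting(transactions, antecedent, consequent, min_support, min_confidence):
    """A rule passes only if it meets both user-supplied thresholds."""
    both = set(antecedent) | set(consequent)
    return (support(transactions, both) >= min_support
            and confidence(transactions, antecedent, consequent) >= min_confidence)

# Rule {eggs} => {milk}: every basket with eggs also has milk.
print(is_interesting(transactions, {"eggs"}, {"milk"}, 0.5, 0.8))
# Rule {bread} => {milk}: only two of the three bread baskets have milk.
print(is_interesting(transactions, {"bread"}, {"milk"}, 0.5, 0.8))
```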

Section 1.6 Classification of Data Mining Systems

Kinds of databases
Kinds of knowledge discovered/mined
Kinds of techniques employed
According to application domain

Section 1.7 Data Mining Task Primitives

Task-relevant data
Kind of knowledge to be mined
Background knowledge
Interestingness measures
Expected visualization representation
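
One way to read the five primitives above is as the fields of a single task specification that a user hands to a data mining system. The sketch below models that idea as a plain record; the class name, field names, and example values are all hypothetical and do not correspond to any real query language or API.

```python
from dataclasses import dataclass, field

@dataclass
class MiningTask:
    """Hypothetical container for the five task primitives (names are illustrative)."""
    task_relevant_data: str                                    # which data to mine
    knowledge_kind: str                                        # e.g. "association rules"
    background_knowledge: dict = field(default_factory=dict)   # e.g. concept hierarchies
    interestingness: dict = field(default_factory=dict)        # measure name -> threshold
    presentation: str = "table"                                # how to display results

    def describe(self):
        """Return a one-line, human-readable summary of the task."""
        thresholds = ", ".join(
            f"{name}>={value}" for name, value in sorted(self.interestingness.items())
        )
        return (f"mine {self.knowledge_kind} from [{self.task_relevant_data}] "
                f"where {thresholds}; show as {self.presentation}")

task = MiningTask(
    task_relevant_data="customer purchases",
    knowledge_kind="association rules",
    background_knowledge={"city": "country"},   # a one-level concept hierarchy
    interestingness={"support": 0.05, "confidence": 0.7},
    presentation="rules",
)
print(task.describe())
```

Keeping the primitives together in one object makes it explicit that a mining request is more than a choice of algorithm: the data, the background knowledge, and the thresholds all shape what is returned.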

Section 1.8 Integration of Data Mining and Database Systems

No coupling, loose coupling, semi-tight coupling, tight coupling.

Section 1.9 Major Issues in Data Mining

Mining methodology and user interaction issues: mining different kinds of knowledge in databases, interactive mining at multiple levels of abstraction, incorporation of background knowledge, data mining query languages and ad hoc data mining, presentation and visualization of results, handling noisy and incomplete data, and the interestingness problem.
Performance issues: efficiency and scalability of data mining algorithms; parallel, distributed, and incremental algorithms.
Database diversity issues: handling of relational and complex data, heterogeneous databases and global information systems.
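
The idea behind the incremental algorithms mentioned under performance issues is that new data should update existing results without a rescan of everything seen so far. The following is a minimal sketch of that idea for item supports only; the class and method names are hypothetical, and the data is a toy example.

```python
from collections import Counter

class IncrementalItemCounter:
    """Maintains item counts across batches without rescanning earlier data."""

    def __init__(self):
        self.counts = Counter()
        self.n_transactions = 0

    def update(self, batch):
        """Fold a new batch of transactions into the running counts."""
        for transaction in batch:
            self.counts.update(set(transaction))  # count each item once per transaction
            self.n_transactions += 1

    def support(self, item):
        """Fraction of all transactions seen so far that contain `item`."""
        if self.n_transactions == 0:
            return 0.0
        return self.counts[item] / self.n_transactions

counter = IncrementalItemCounter()
counter.update([["milk", "bread"], ["milk"]])
first = counter.support("milk")    # support after the first batch
counter.update([["bread", "jam"], ["eggs"]])
second = counter.support("milk")   # updated by touching only the new batch
print(first, second)
```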