CHAPTER 1 (HAN and KAMBER)
This chapter is a good introduction to the field of knowledge discovery in databases/data mining (KDD/DM). Many terms are introduced, the key data mining functionalities are briefly described, and many application-oriented concepts are discussed. The emphasis (in this book) is on the database perspective of data mining.
On a first reading, it is enough to concentrate on Sections 1.1, 1.2, 1.4, 1.7, and 1.9.
Section 1.1 Motivation and Importance
- Large volumes of data are available, and this data is expected to contain useful information that can be extracted and put to use.
- DM is a natural evolution of database technology. The technology has evolved through data collection and management, advanced data analysis, on-line transaction processing, the World Wide Web, on-line analytical processing, and now data mining.
Section 1.2 What is Data Mining
- Extracting, or "mining", knowledge from large databases.
- Main steps of the KDD process: data cleaning; data integration; data selection; data transformation; data mining; pattern evaluation; knowledge presentation.
- Data mining system components: database, data warehouse, World Wide Web, or other data repository; database or data warehouse server; knowledge base; data mining engine; pattern evaluation module; user interface.
- Data mining involves an integration of techniques from database/data warehouse technology, statistics, machine learning, high-performance computing, pattern recognition, and neural networks.
- The emphasis is on efficient (in terms of time and storage) and scalable (running time growing roughly linearly with data size) data mining techniques.
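The KDD steps listed above can be sketched as a chain of small functions, one per step. This is a minimal illustration, not the book's method: the record fields (customer, items, region), the in-memory lists standing in for data sources, and every function name are hypothetical, and the "mining" step is just counting how often each item is bought.

```python
# Hypothetical sketch of the KDD steps on toy sales records.
# Field names ("customer", "items", "region") are invented for illustration.

def clean(records):
    # Data cleaning: drop records with a missing or empty item list.
    return [r for r in records if r.get("items")]

def integrate(*sources):
    # Data integration: combine several (already cleaned) sources into one list.
    combined = []
    for src in sources:
        combined.extend(src)
    return combined

def select(records, region):
    # Data selection: keep only the task-relevant records.
    return [r for r in records if r["region"] == region]

def transform(records):
    # Data transformation: normalize item names and turn each basket into a set.
    return [frozenset(item.strip().lower() for item in r["items"]) for r in records]

def mine(baskets):
    # Data mining: count how many baskets contain each item.
    counts = {}
    for basket in baskets:
        for item in basket:
            counts[item] = counts.get(item, 0) + 1
    return counts

def evaluate(counts, n, min_support):
    # Pattern evaluation: keep items whose support (fraction of baskets) meets a threshold.
    return {item: c / n for item, c in counts.items() if c / n >= min_support}

def present(patterns):
    # Knowledge presentation: a sorted, human-readable report.
    return [f"{item}: {support:.2f}" for item, support in sorted(patterns.items())]

def run_kdd(sources, region, min_support):
    cleaned = [clean(s) for s in sources]
    data = integrate(*cleaned)
    baskets = transform(select(data, region))
    return present(evaluate(mine(baskets), len(baskets), min_support))

store_a = [{"customer": 1, "items": ["Milk", "Bread"], "region": "east"},
           {"customer": 2, "items": [], "region": "east"}]
store_b = [{"customer": 3, "items": ["milk ", "Eggs"], "region": "east"},
           {"customer": 4, "items": ["Bread"], "region": "west"}]

print(run_kdd([store_a, store_b], "east", 0.6))  # ['milk: 1.00']
```

Keeping each step a separate function mirrors the idea that the process is iterative: any step can be revised (for example, a different selection or threshold) and the later steps rerun.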
Section 1.3 Different Types of Databases
- Relational
- Data warehouses
- Transactional databases
- Advanced data and information systems:
  - Object-oriented and object-relational databases
  - Temporal, sequence, and time-series databases
  - Spatial and spatiotemporal databases
  - Text and multimedia databases
  - Heterogeneous and legacy databases
  - Data stream databases
  - World Wide Web
Section 1.4 Data Mining Functionalities
- Concept description: characterization and discrimination
- Mining frequent patterns, associations, and correlations
- Classification
- Prediction
- Cluster analysis
- Outlier analysis
- Evolution analysis
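As a concrete instance of mining frequent patterns, the sketch below counts every itemset (up to a chosen size) that appears in a set of transactions and keeps those occurring in at least min_count transactions. It is a naive brute-force enumeration chosen for clarity; the function name, the max_size cut-off, and the sample transactions are assumptions, and practical algorithms prune the search instead of enumerating everything.

```python
from collections import Counter
from itertools import combinations

def frequent_itemsets(transactions, min_count, max_size=3):
    """Brute-force frequent itemset mining (hypothetical helper): enumerate
    every itemset of up to max_size items in each transaction and keep those
    that occur in at least min_count transactions."""
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))  # dedupe and sort so itemsets have one canonical form
        for k in range(1, max_size + 1):
            for itemset in combinations(items, k):
                counts[itemset] += 1
    return {s: c for s, c in counts.items() if c >= min_count}

# Sample data (invented for illustration).
transactions = [
    ["computer", "software", "printer"],
    ["computer", "software"],
    ["computer", "printer"],
    ["software", "scanner"],
]

for itemset, count in frequent_itemsets(transactions, min_count=2).items():
    print(itemset, count)
```

The number of candidate itemsets grows exponentially with transaction length, which is why this approach only suits toy data.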
Section 1.5 Interestingness of Discovered Patterns
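Techniques for evaluating the interestingness of discovered patterns or knowledge: objective measures (e.g., support and confidence of association rules) and subjective measures (e.g., whether a pattern is unexpected or actionable for the user).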
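For an association rule A => B, two widely used objective measures are support (the fraction of transactions containing both A and B) and confidence (the fraction of transactions containing A that also contain B). A rule is typically considered interesting only if both meet user-specified minimum thresholds. The sketch below assumes transactions are plain lists of item names; the function names and sample data are hypothetical.

```python
def support(transactions, itemset):
    # Fraction of transactions that contain every item in itemset.
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)

def confidence(transactions, antecedent, consequent):
    # Estimated P(consequent | antecedent) = support(A and B together) / support(A).
    both = support(transactions, set(antecedent) | set(consequent))
    base = support(transactions, antecedent)
    return both / base if base else 0.0

def is_interesting(transactions, antecedent, consequent, min_sup, min_conf):
    # Objective interestingness: the rule must meet both user-set thresholds.
    return (support(transactions, set(antecedent) | set(consequent)) >= min_sup
            and confidence(transactions, antecedent, consequent) >= min_conf)

# Sample data (invented for illustration).
transactions = [
    ["computer", "software", "printer"],
    ["computer", "software"],
    ["computer", "printer"],
    ["software", "scanner"],
]

# Rule: computer => software (support 0.5, confidence about 0.67).
print(is_interesting(transactions, ["computer"], ["software"], min_sup=0.5, min_conf=0.6))
```

Raising min_conf to 0.7 would reject the same rule, which shows how the thresholds, not the data alone, decide what counts as interesting.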
Section 1.6 Classification of Data Mining Systems
- By kinds of databases mined
- By kinds of knowledge mined
- By kinds of techniques employed
- By application domain
Section 1.7 Data Mining Task Primitives
- Task-relevant data
- Kind of knowledge to be mined
- Background knowledge
- Interestingness measures
- Representation for visualizing the discovered patterns
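One way to picture the five task primitives is as the fields of a single task specification that a user hands to the mining system. The class below is purely hypothetical (it is not a real API or query language); the field names, the SQL string, the table name sales, and the threshold names min_sup/min_conf are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class MiningTask:
    """Hypothetical container for the five data mining task primitives."""
    relevant_data: str   # task-relevant data, e.g. a query over a table
    knowledge_kind: str  # kind of knowledge, e.g. "association", "classification"
    # Background knowledge, e.g. a concept hierarchy mapping a level to its parent level.
    background_knowledge: dict = field(default_factory=dict)
    # Interestingness measures, e.g. minimum support/confidence thresholds.
    interestingness: dict = field(default_factory=dict)
    # Representation for visualizing results, e.g. "rules", "tables", "charts".
    presentation: str = "rules"

task = MiningTask(
    relevant_data="SELECT items FROM sales WHERE region = 'east'",  # hypothetical table
    knowledge_kind="association",
    background_knowledge={"city": "province", "province": "country"},
    interestingness={"min_sup": 0.05, "min_conf": 0.7},
    presentation="rules",
)
print(task.knowledge_kind, task.interestingness)
```

Grouping the primitives in one object makes explicit that a mining request is only fully specified once all five are fixed, even if some take default values.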
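Section 1.8 Integration of Data Mining and Database Systems
No coupling, loose coupling, semi-tight coupling, and tight coupling. These range from not using a database or data warehouse (DB/DW) system at all, to using it only to fetch data (loose), to implementing some essential data mining primitives such as aggregation inside it (semi-tight), to integrating the data mining system smoothly into it (tight).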
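The difference between loose and semi-tight coupling can be illustrated with Python's built-in sqlite3 module. Under loose coupling the database only supplies raw rows and the mining code does the work; under semi-tight coupling an essential primitive (here, aggregation) runs inside the database. The table layout and function names are hypothetical; both functions return the same counts, and the difference lies in where the computation happens.

```python
import sqlite3
from collections import Counter

def make_db():
    # In-memory database with a hypothetical sales(tid, item) table.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (tid INTEGER, item TEXT)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [(1, "computer"), (1, "software"), (2, "computer"), (3, "printer")],
    )
    return conn

def item_counts_loose(conn):
    # Loose coupling: the DB only returns raw rows; counting happens in Python.
    rows = conn.execute("SELECT item FROM sales").fetchall()
    return dict(Counter(item for (item,) in rows))

def item_counts_semi_tight(conn):
    # Semi-tight coupling: the aggregation primitive is executed by the DB itself.
    rows = conn.execute("SELECT item, COUNT(*) FROM sales GROUP BY item").fetchall()
    return dict(rows)

conn = make_db()
print(item_counts_loose(conn))
print(item_counts_semi_tight(conn))
```

On large tables the semi-tight version moves far less data out of the database, which is the practical motivation for pushing primitives into the DB/DW system.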
Section 1.9 Major Issues in Data Mining
- Mining methodology and user interaction issues: mining different kinds of knowledge in databases, interactive mining at multiple levels of abstraction, incorporation of background knowledge, data mining query languages and ad hoc data mining, presentation and visualization of results, handling noisy or incomplete data, and pattern evaluation (the interestingness problem).
- Performance issues: efficiency and scalability of data mining algorithms; parallel, distributed, and incremental mining algorithms.
- Database diversity issues: handling of relational and complex data, heterogeneous databases and global information systems.
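Incremental algorithms update previously mined results as new data arrives instead of re-mining the whole database. Below is a minimal sketch for single-item support counts, assuming transactions are lists of item names; the class and method names are hypothetical.

```python
from collections import Counter

class IncrementalItemCounter:
    """Hypothetical sketch: maintains item support counts as new batches of
    transactions arrive, so earlier data never has to be rescanned."""

    def __init__(self):
        self.counts = Counter()
        self.n = 0  # total number of transactions seen so far

    def update(self, batch):
        # Only the new batch is scanned; existing counts are reused as-is.
        for t in batch:
            self.counts.update(set(t))  # set() so an item counts once per transaction
        self.n += len(batch)

    def frequent(self, min_support):
        # Items whose support over all data seen so far meets the threshold.
        if self.n == 0:
            return set()
        return {i for i, c in self.counts.items() if c / self.n >= min_support}

counter = IncrementalItemCounter()
counter.update([["milk", "bread"], ["milk"]])
print(counter.frequent(0.6))  # {'milk'}
counter.update([["bread"], ["bread", "eggs"]])
print(counter.frequent(0.6))  # {'bread'}
```

Single items are easy because every count is kept. For larger itemsets the problem is harder: an itemset that was infrequent (and therefore not counted) in the old data may become frequent after an update, which makes incremental mining of general patterns non-trivial.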