CIS 600/CSE 690: Applied Data Mining

CLASS SCHEDULE AND COVERAGE

(1)  Material covered on August 31 and September 2 (two meetings)

August 31 (M): Introduction to KDD and DM; DM phases; CRISP model; four functionalities.

September 2 (W): Two meetings. Examples of the four functionalities (classification, prediction, association rules, clustering); brief description of SKICAT, cancer classification, and software engineering; the classification problem; development and use of classification models; decision trees; Quinlan’s algorithm using information gain; Example 6.1 (Han’s book) to discuss decision tree construction.

September 9 (W): Two meetings. Review of classification; stopping rules for tree construction; classification error; tree pruning; training, validation, and generalization errors; model selection and assessment.

Case studies: SKICAT; cancer classification; diabetes; software module criticality.

September 10 (Thursday, optional): Review session 4:30-5:30pm

September 14 (M): Association rules; itemsets and frequent itemsets; the Apriori property; support and confidence of rules.

September 23 (W): Quiz No. 1; RapidMiner presentation and demonstration.

September 30 (W): Project: classification and association rules using (i) diabetes data and (ii) heart disease data. RapidMiner project report due in class on October 14.

October 5 (M): Introduction to clustering.

October 7 (W): Clustering for diabetes and heart disease data using RapidMiner (optional class).

October 12, 14: RapidMiner project report due in class on October 14; introduction to prediction modeling using regression analysis; course review.

October 15: Course review (optional); time TBA.

October 16 (F): Examination No. 1, 5-7:30pm; room TBA.

October 19, 21: Neural networks for classification and prediction; case studies.

October 26, 28: Radial basis functions; case studies from “Lessons Learned”.

October 29: Optional review class.

The above represents about 9-10 weeks of class meetings; coverage for the remaining classes will be finalized later.

NOTE (tentative): Quiz No. 2, November 3 (M); Exam No. 2, November 13, 2009 (F).

CIS 600/CSE 690: Analytical Data Mining

Quiz No. 1: September 23, 2009 (Wednesday), about 3:50 – 4:20pm

Coverage:

Part A: KDD/DM (similar to Assignment No. 1) (15%)

1.  KDD/DM description, goals of DM.

2.  CRISP model; description of each phase.

3.  Description of the four functionalities in DM; classification; prediction; association rules; clustering.

Part B: Classification (similar to Assignment No. 2) (50%)

1.  Classification; the two steps of developing and using a classification model.

2.  Decision tree induction from training data.

3.  Quinlan’s algorithm using information gain criterion.

4.  Developing a tree for a given data set.

5.  Interpretation of a decision tree.

6.  Classification error.

7.  Tree pruning; pros and cons.
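
For review of item 3, the information gain criterion can be sketched in a few lines of Python. This is only an illustrative sketch, not the course's or Quinlan's reference implementation: the function names (entropy, information_gain), the attribute names, and the four-row toy data set are all hypothetical and are not taken from Han's Example 6.1. The gain of an attribute is the entropy of the class labels minus the weighted entropy of the labels after splitting on that attribute; the attribute with the largest gain is chosen for the split.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    counts = Counter(labels).values()
    return -sum((n / total) * math.log2(n / total) for n in counts)


def information_gain(rows, attribute, target):
    """Entropy of the target minus the weighted entropy after splitting on attribute."""
    base = entropy([row[target] for row in rows])
    # Group the target labels by the value of the candidate attribute.
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


# Hypothetical toy data (assumption: invented for illustration only).
rows = [
    {"student": "yes", "income": "high", "buys": "yes"},
    {"student": "yes", "income": "low", "buys": "yes"},
    {"student": "no", "income": "high", "buys": "no"},
    {"student": "no", "income": "low", "buys": "no"},
]

# Compute the gain of each candidate attribute and pick the largest.
gains = {a: information_gain(rows, a, "buys") for a in ("student", "income")}
best = max(gains, key=gains.get)
print(gains, best)
```

In this toy data "student" separates the two classes perfectly, so it has the larger gain and would be placed at the root; "income" leaves each branch evenly mixed and contributes no gain.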

Part C: Some basic concepts (15%)

1.  Supervised and unsupervised learning.

2.  Training, validation, and test data.

3.  Theoretical behavior of training, validation and test errors versus model complexity.

4.  Model selection and assessment.

Part D: Case studies (20%)

1.  Importance of SKICAT application; significance of data mining results; contribution to science.

2.  Diabetes classification problem; problem description, approach for classification (no details of radial basis functions); interpretation of classification results.

3.  Microarray data classification for cancer type; problem description; goal of study; interpretation and significance of results.