CIS600/CSE 690: Applied Data Mining
CLASS SCHEDULE AND COVERAGE
(1) Material covered on August 31 and September 2 (two meetings)
August 31 (M): Introduction to KDD and DM; DM phases; CRISP model; four functionalities.
September 2 (W): (two meetings). Examples of the four functionalities (classification; prediction; association rules; clustering); brief description of SKICAT, cancer classification, software engineering; classification problem; development and use of classification models; decision tree; Quinlan’s algorithm using information gain; Example 6.1 (Han’s book) to discuss decision tree construction.
September 9 (W): (two meetings). Review of classification; stopping rules for tree construction; classification error; tree pruning; training, validation and generalization errors; model selection and assessment.
Case studies: SKICAT; cancer classification; diabetes; software module criticality.
September 10 (Thursday, optional): Review session 4:30-5:30pm
September 14 (M): Association rules; itemsets and frequent itemsets; Apriori property; support and confidence of rules.
September 23 (W): Quiz No. 1; RapidMiner presentation and demonstration.
September 30 (W): Project: classification and association rules using (i) diabetes data, (ii) heart disease data. RapidMiner project report due in class on Oct 14.
October 5 (M): Introduction to clustering.
October 7 (W): Clustering for diabetes and heart disease data using RapidMiner (optional class).
October 12, 14: RapidMiner project report due in class on Oct 14; introduction to prediction modeling using regression analysis; course review.
October 15: Course review. Optional, time: TBA.
October 16 (F): Examination No. 1, 5-7:30pm, room: TBA
October 19, 21: Neural networks for classification and prediction; case studies.
October 26, 28: Radial basis functions; case studies from “Lessons Learned”.
October 29: Optional review class
The above represents about 9-10 weeks of class meetings; coverage for other classes to be finalized later.
NOTE: Tentative: Quiz No. 2, November 3 (M); Exam No. 2, November 13, 2009 (F)
CIS600/CSE 690: Analytical Data Mining
Quiz No. 1: September 23, 2009 (Wednesday), about 3:50 – 4:20pm
Coverage:
Part A: KDD/DM (similar to assignment No. 1) (15%)
1. KDD/DM description, goals of DM.
2. CRISP model; description of each phase.
3. Description of the four functionalities in DM: classification; prediction; association rules; clustering.
Part B: Classification (similar to assignment No. 2) (50%)
1. Classification; the two steps of development and use of a classification model.
2. Decision tree induction from training data.
3. Quinlan’s algorithm using information gain criterion.
4. Develop a tree for a given data set.
5. Interpretation of decision tree.
6. Classification error.
7. Tree pruning; pros and cons.
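As a study aid for items 3 and 4 above, the information gain criterion can be sketched in a few lines of Python. This is only an illustrative sketch, not the course's reference solution: the function names (`entropy`, `information_gain`), the attribute names and the four-row toy data set are all hypothetical and chosen for illustration. The attribute with the largest gain would be placed at the root of the tree.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy of the target minus the weighted entropy after splitting on attribute."""
    base = entropy([r[target] for r in rows])
    # Group the target labels by the value of the candidate attribute.
    groups = {}
    for r in rows:
        groups.setdefault(r[attribute], []).append(r[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


# Hypothetical toy training data (not taken from the course material).
rows = [
    {"outlook": "sunny", "windy": "no", "play": "no"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rain", "windy": "no", "play": "yes"},
    {"outlook": "rain", "windy": "yes", "play": "yes"},
]

# Pick the attribute with the highest information gain as the root split.
best = max(["outlook", "windy"], key=lambda a: information_gain(rows, a, "play"))
print(best)
```

In this toy data "outlook" separates the two classes completely while "windy" carries no information about the class, so "outlook" is selected as the root. Repeating the same selection on each resulting subset grows the rest of the tree.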
Part C: Some basic concepts (15%)
1. Supervised and unsupervised learning.
2. Training, validation and test data.
3. Theoretical behavior of training, validation and test errors versus model complexity.
4. Model selection and assessment.
Part D: Case studies (20%)
1. Importance of SKICAT application; significance of data mining results; contribution to science.
2. Diabetes classification problem; problem description, approach for classification (no details of radial basis functions); interpretation of classification results.
3. Microarray data classification for cancer type; problem description; goal of study; interpretation and significance of results.