COSC 6342 2014 Midterm Exam Review List

Dr. Eick

The midterm exam has been scheduled for Monday, March 17, 2:30-3:50pm, in our classroom. The exam is open book and open notes; you are not allowed to use any computers!

Relevant Material: All transparencies covered in the lectures (except the transparencies associated with Topic 4 and Topic 9, which were skipped), the Shlens PCA article, the Kaelbling article, and the following pages of the textbook (second edition) are relevant for the 2014 midterm exam:

1-14, 21-28, 30-42, 47-55, 61-73, 76-84, 87-93, 108-120, 125-128, 143-154, 163-172, 174-181 (except the running line smoother), 186-188, 192-196, and some Wikipedia pages referenced by the transparencies.

Checklist:

What is machine learning?

Reinforcement Learning (assumptions and goals, relationship to supervised learning, Bellman equations and updates, role of policies, exploration vs. exploitation, temporal difference (TD) learning and Q-learning)
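As a study aid, the Q-learning update and epsilon-greedy exploration can be sketched in a few lines of Python. This is a minimal tabular sketch, assuming a deterministic toy environment; the chain task and the names `epsilon_greedy`, `q_update` and `train_chain` are hypothetical and not taken from the course transparencies. The update follows the standard rule Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_b Q(s',b) - Q(s,a)).

```python
import random
from collections import defaultdict


def epsilon_greedy(Q, state, actions, epsilon, rng):
    """Exploration vs. exploitation: a random action with probability epsilon,
    otherwise the action with the highest current Q-value (first one on ties)."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])


def q_update(Q, s, a, r, s_next, actions, alpha=0.5, gamma=0.9, terminal=False):
    """One Q-learning (TD) update:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_b Q(s_next, b) - Q(s,a))."""
    target = r if terminal else r + gamma * max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])
    return Q[(s, a)]


def train_chain(n_states=4, episodes=200, epsilon=0.2, seed=0):
    """Hypothetical toy task: states 0..n_states-1 on a line, actions -1/+1,
    reward 1 for reaching the last (terminal) state, 0 otherwise."""
    rng = random.Random(seed)
    actions = [-1, +1]
    Q = defaultdict(float)  # unseen (state, action) pairs start at 0
    goal = n_states - 1
    for _ in range(episodes):
        s = 0
        while s != goal:
            a = epsilon_greedy(Q, s, actions, epsilon, rng)
            s_next = min(max(s + a, 0), goal)  # walls at both ends of the chain
            r = 1.0 if s_next == goal else 0.0
            q_update(Q, s, a, r, s_next, actions, terminal=(s_next == goal))
            s = s_next
    return Q


Q = train_chain()
greedy = [max([-1, +1], key=lambda a: Q[(s, a)]) for s in range(3)]
print(greedy)  # expected: [1, 1, 1] (move right in every non-terminal state)
```

The `terminal` flag stops the update from bootstrapping off the goal state, so the reward of 1 is propagated backwards through the discounted max term, one state per visit.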

Hypothesis class, VC-dimension, basic regression, overfitting, underfitting, training set, test set, cross-validation, model complexity, triple trade-off (Dietterich), performance measure/loss function.
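A small sketch of k-fold cross-validation used for model selection, comparing a constant model (too simple, so it underfits) with a least-squares line. It assumes synthetic, noise-free data generated from y = 2x + 1 and squared-error loss; the function names are illustrative only.

```python
def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds whose sizes differ by at most 1.
    (In practice the data would usually be shuffled first.)"""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def fit_constant(xs, ys):
    """Low-complexity hypothesis class: predict the training mean everywhere."""
    m = sum(ys) / len(ys)
    return lambda x: m


def fit_line(xs, ys):
    """Least-squares line g(x) = w0 + w1 * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    w1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    w0 = my - w1 * mx
    return lambda x: w0 + w1 * x


def cv_mse(fit, xs, ys, k=5):
    """k-fold cross-validation: train on k-1 folds, measure squared-error loss
    on the held-out fold, and average over all held-out points."""
    errors = []
    for fold in k_fold_indices(len(xs), k):
        held_out = set(fold)
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        g = fit(train_x, train_y)
        errors.extend((g(xs[i]) - ys[i]) ** 2 for i in fold)
    return sum(errors) / len(errors)


# Synthetic, noise-free data from y = 2x + 1 (illustration only).
xs = [float(x) for x in range(10)]
ys = [2 * x + 1 for x in xs]
print(cv_mse(fit_constant, xs, ys), cv_mse(fit_line, xs, ys))
```

The model with the lower cross-validated error is selected; the error is always measured on points the model was not trained on, which is what separates this estimate from the training error.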

Bayes’ Theorem, Naïve Bayesian approach, losses and risks, deriving optimal decision rules for a given cost/risk function.
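Deriving a minimum-risk decision rule amounts to computing posteriors with Bayes' theorem and then choosing the action with the smallest conditional risk R(a_i | x) = sum_k lambda_ik P(C_k | x). The sketch assumes a two-class problem with hand-picked posteriors and loss values (illustrative numbers only), including a reject action with cost 0.3.

```python
def posteriors(priors, likelihoods):
    """Bayes' theorem: P(C_k | x) = p(x | C_k) P(C_k) / sum_j p(x | C_j) P(C_j)."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    evidence = sum(joint)
    return [j / evidence for j in joint]


def min_risk_action(post, loss):
    """loss[i][k] = lambda_ik, the cost of taking action i when the true class is k.
    Conditional risk R(a_i | x) = sum_k lambda_ik P(C_k | x); pick the minimum."""
    risks = [sum(l * p for l, p in zip(row, post)) for row in loss]
    best = min(range(len(risks)), key=lambda i: risks[i])
    return best, risks


post = [0.8, 0.2]                          # P(C_0 | x), P(C_1 | x)
zero_one = [[0, 1], [1, 0]]                # 0/1 loss: choose the most probable class
asymmetric = [[0, 10], [1, 0]]             # choosing C_0 when the truth is C_1 costs 10
with_reject = asymmetric + [[0.3, 0.3]]    # action 2 = reject (doubt), cost 0.3
print(min_risk_action(post, zero_one)[0])     # 0
print(min_risk_action(post, asymmetric)[0])   # 1
print(min_risk_action(post, with_reject)[0])  # 2
```

Under 0/1 loss the rule reduces to picking the class with the largest posterior; the asymmetric loss flips the decision even though C_0 is four times as probable, and the reject option wins whenever its cost is below every other conditional risk.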

Maximum likelihood estimation, variance and bias, noise, Bayes’ estimator and MAP, parametric classification, model selection procedures, multivariate Gaussian, covariance matrix, Mahalanobis distance, PCA (goals and objectives, what does it actually do, what is it used for, how many principal components do we choose?), multidimensional scaling (only what it is used for and how it differs from PCA).
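For two-dimensional data, the maximum likelihood estimates, the Mahalanobis distance, and PCA can all be computed with closed-form 2x2 algebra. This is a pure-Python sketch restricted to 2-D data (higher dimensions would normally use a linear algebra library); the function names and example points are hypothetical.

```python
import math


def mle_gaussian_2d(points):
    """Maximum likelihood estimates of the mean vector and covariance matrix.
    The MLE covariance divides by N, so it is a (slightly) biased estimator."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    return (mx, my), ((sxx, sxy), (sxy, syy))


def mahalanobis_2d(point, mean, cov):
    """sqrt((x - mu)^T S^{-1} (x - mu)) for a 2x2 positive-definite covariance S."""
    (a, b), (_, d) = cov
    det = a * d - b * b
    if det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    # S^{-1} = [[d, -b], [-b, a]] / det
    return math.sqrt((d * dx * dx - 2 * b * dx * dy + a * dy * dy) / det)


def pca_2d(cov):
    """Eigenvalues of a symmetric 2x2 covariance matrix (largest first), the
    first principal direction, and the proportion of variance explained."""
    (a, b), (_, d) = cov
    half_trace, det = (a + d) / 2, a * d - b * b
    disc = math.sqrt(max(half_trace ** 2 - det, 0.0))
    l1, l2 = half_trace + disc, half_trace - disc
    if b != 0:
        vx, vy = l1 - d, b  # solves (S - l1 * I) v = 0
    else:
        vx, vy = (1.0, 0.0) if a >= d else (0.0, 1.0)
    norm = math.hypot(vx, vy)
    return (l1, l2), (vx / norm, vy / norm), (l1 / (l1 + l2), l2 / (l1 + l2))


mean, cov = mle_gaussian_2d([(0, 0), (2, 0), (0, 2), (2, 2)])
print(mean, cov)  # (1.0, 1.0) ((1.0, 0.0), (0.0, 1.0))

corr = ((1.0, 0.5), (0.5, 1.0))  # hypothetical correlated covariance
print(mahalanobis_2d((1, 1), (0, 0), corr), mahalanobis_2d((1, -1), (0, 0), corr))
print(pca_2d(corr))
```

With the correlated covariance, (1, 1) and (1, -1) are equally far from the mean in Euclidean terms, but (1, -1) lies against the direction of correlation and gets the larger Mahalanobis distance. The first principal direction is along (1, 1) and explains 75% of the variance, which is the quantity one inspects when deciding how many principal components to keep.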

K-means (prototype-based/representative-based clustering, how does the algorithm work, optimization procedure, algorithm properties), EM (assumptions of the algorithm, mixture of Gaussians, how does it work, how is cluster membership estimated, how is the model updated from cluster membership, relationship to K-means).
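The K-means iteration (assignment step, then centroid update) can be sketched as follows. It assumes squared Euclidean distance and user-supplied initial centroids; the result depends on that initialization, since the procedure only reaches a local minimum of the sum of squared errors (SSE). EM for a mixture of Gaussians replaces the hard nearest-centroid assignment with soft, probability-weighted memberships.

```python
def sq_dist(p, q):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, init_centroids, max_iter=100):
    """Lloyd-style K-means: alternate (1) assigning each point to its nearest
    centroid and (2) moving each centroid to the mean of its points, until the
    assignment stops changing. Neither step can increase the SSE objective."""
    centroids = [tuple(c) for c in init_centroids]
    assign = None
    for _ in range(max_iter):
        new_assign = [min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
                      for p in points]
        if new_assign == assign:
            break  # converged to a local optimum of the SSE
        assign = new_assign
        for j in range(len(centroids)):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:  # an empty cluster keeps its old centroid
                centroids[j] = tuple(sum(coord) / len(members) for coord in zip(*members))
    sse = sum(sq_dist(p, centroids[a]) for p, a in zip(points, assign))
    return centroids, assign, sse


points = [(0, 0), (0, 1), (10, 10), (10, 11)]
centroids, assign, sse = kmeans(points, init_centroids=[(0, 0), (0, 1)])
print(centroids, assign, sse)  # [(0.0, 0.5), (10.0, 10.5)] [0, 0, 1, 1] 1.0
```

Here the deliberately poor initialization (both centroids in the lower-left group) is corrected after two update steps; with less well-separated data, a poor start can leave K-means in a worse local optimum.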

Non-parametric density estimation (histogram, naïve estimator, Gaussian kernel estimator), non-parametric regression (running mean smoother and kernel smoother, how they differ from regular regression), k-nearest neighbor classification (transparencies only). Regression trees (what they are used for, how they differ from other prediction approaches, and how they are generated from a dataset).
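The naive estimator, the Gaussian kernel estimator, and the kernel smoother follow directly from their textbook formulas. Here is a pure-Python sketch assuming one-dimensional data, a Gaussian kernel, and a user-chosen bandwidth h; the example values are illustrative only.

```python
import math


def gaussian_kernel(u):
    """Standard normal density K(u) = exp(-u^2 / 2) / sqrt(2 pi)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)


def naive_estimate(x, data, h):
    """Naive estimator: p(x) = #{x - h < x_t <= x + h} / (2 N h)."""
    count = sum(1 for xt in data if x - h < xt <= x + h)
    return count / (2 * len(data) * h)


def kernel_density(x, data, h):
    """Gaussian kernel estimator: p(x) = (1 / (N h)) * sum_t K((x - x_t) / h)."""
    return sum(gaussian_kernel((x - xt) / h) for xt in data) / (len(data) * h)


def kernel_smoother(x, xs, rs, h):
    """Kernel smoother for regression: a kernel-weighted average of the targets r_t.
    Replacing K by a 0/1 window |x - x_t| < h gives the running mean smoother."""
    weights = [gaussian_kernel((x - xt) / h) for xt in xs]
    return sum(w * r for w, r in zip(weights, rs)) / sum(weights)


data = [-0.5, 0.5, 3.0]
print(naive_estimate(0.0, data, h=1.0))   # 2 / (2 * 3 * 1) = 0.333...
print(kernel_density(0.0, data, h=1.0))
print(kernel_smoother(0.0, [-1.0, 1.0], [0.0, 2.0], h=0.5))  # equal weights -> 1.0
```

Unlike regular (parametric) regression, no model parameters are fitted: every prediction is computed directly from the stored training points, and h controls the smoothness of the result.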

Transparencies and Other Teaching Material

Course Organization ML Spring 2014
Topic 1: Introduction to Machine Learning (Eick/Alpaydin Introduction, Tom Mitchell's Introduction to ML: only slides 1-8 and 15-16 will be used)
Topic 2: Supervised Learning (examples of classification techniques: Decision Trees, k-NN)
Topic 3: Bayesian Decision Theory (excluding Belief Networks) and Naive Bayes (Eick on Naive Bayes)
Topic 4: Using Curve Fitting as an Example to Discuss Major Issues in ML (read Bishop Chapter 1 in conjunction with this material; not covered in 2011)
Topic 5: Parametric Model Estimation
Topic 6: Dimensionality Reduction Centering on PCA (PCA Tutorial, Arindam Banerjee's More Formal Discussion of the Objectives of Dimensionality Reduction)
Topic 7: Clustering 1: Mixture Models, K-Means and EM (Introduction to Clustering, Modified Alpaydin transparencies, Top 10 Data Mining Algorithms paper)
Topic 8: Non-Parametric Methods Centering on kNN and Density Estimation (kNN, Non-Parametric Density Estimation, Summary Non-Parametric Density Estimation, Editing and Condensing Techniques to Enhance kNN, Toussaint's survey paper on editing, condensing and proximity graphs)
Topic 9: Clustering 2: Density-based Clustering (DBSCAN paper, DENCLUE2 paper)
Topic 10: Decision Trees (only material related to regression trees)

Topic 18: Reinforcement Learning (Alpaydin on RL (not used); Eick on RL: try to understand those transparencies; Using Reinforcement Learning for Robot Soccer; Sutton "Ideas of RL" Video (to be shown and discussed in part in 2013); Kaelbling's RL Survey Article: read sections 1, 2, 3, 4.1, 4.2, 8.1 and 9, centering on what was discussed in the lecture)