Applied Machine Learning

Machine Learning in Practice / Applied Machine Learning

11-344, 05-834/05-434

Instructor: Dr. Carolyn P. Rosé

Course Cross-listed in: HCII, LTI

Units: 12

Books:

Witten, I. H. & Frank, E. (2005). Data Mining: Practical Machine Learning Tools and Techniques, third edition, Elsevier: San Francisco, ISBN 978-0-12-374856-0

Prerequisites: Some Java programming experience is desirable, but not necessary.

Course Description:

Machine Learning is concerned with computer programs that enable the behavior of a computer to be learned from examples or experience rather than dictated through rules written by hand. It has practical value in many application areas of computer science, such as on-line communities and digital libraries. This class is meant to teach the practical side of machine learning for applications such as mining newsgroup data or building adaptive user interfaces. The emphasis will be on learning the process of applying machine learning effectively to a variety of problems rather than on the theory behind why machine learning works. This course does not assume any prior exposure to machine learning theory or practice.

We will cover a wide range of learning algorithms that can be applied to a variety of problems, including decision trees, rule-based classification, support vector machines, Bayesian networks, and clustering. In addition to readings from the course textbook, there will be readings from research articles, which will be announced ahead of time and distributed on the course Drupal site.

Grades (For CMU students only):

Grades will be based on weekly assignments and quizzes, two mid-terms, and a course project.

Course Procedures:

Once the semester formally starts, we will set up a couple of regular times for weekly on-line discussion sections where you can bring any questions or issues you would like to discuss. In addition, feel free to send email with questions or to request an in-person or on-line meeting. All correspondence for the course should be sent to the course email account: . In order to participate in on-line chats for the course, please add to your contacts list in your Google Chat account. If you prefer not to use your personal Google Chat account for this purpose, you can create one just for this course.

All course materials, as well as the course discussion board, can be found on the course Drupal site at http://kanagawa.lti.cs.cmu.edu/amls09/. You will receive an email with your login instructions.

Below, the material for the course is divided into units of one to three “weeks” in duration. In the in-person version of this course, those “weeks” corresponded to actual weeks. However, since this is a self-paced on-line course, you can work at your own pace, and the “week” designations are merely meant to help you gauge your progress.

You can find the Assignments and Quizzes linked to the Syllabus page on the course Drupal site, along with links to the video lectures and to the PowerPoint slides for all of the video lectures. A supplementary page offers a slightly updated version of those slides, used in the Fall 2008 in-person offering of the course at Carnegie Mellon University, which fixes a few typos and adds a little supplementary material.

In the schedule below, where the lectures are listed, you will see which reading assignments, quizzes, and assignments are associated with each unit. For each unit, I suggest that you do the readings, then take the quiz, then listen to the lectures, and then do the assignment. The quizzes are meant to help you gauge your reading comprehension and to make salient certain ideas that will be key topics in the lectures. The lectures will clarify the material from the readings and emphasize the topics that are most important from the standpoint of the course objectives. The assignments are meant to give you valuable practice at applying the principles covered in the readings and lectures.

Assignments will involve experiments and activities using the Weka toolkit and the TagHelper tools toolkit (http://www.cs.cmu.edu/~cprose/TagHelper.html). You will receive feedback on your assignments, which is designed to help you learn from your mistakes, but you will not be graded down for your mistakes; instead, you will simply get credit for having done the assignments. Similarly, you will receive feedback on the quizzes, but you won’t be graded down for your mistakes. Both the assignments and the quizzes are meant more as learning activities than as assessments. You will turn in all assignments to . You should label your submission with your name and the name of the assignment or quiz (e.g., week1-quiz.doc or assignment1.doc). After you turn in an assignment, you will be emailed feedback as well as the answer key.

Mid-terms will serve as formal assessments that measure your level of competence with respect to the course objectives. In order to take a mid-term, you must first complete all of the quizzes and assignments leading up to it in the syllabus. Then you should request that the mid-term be sent to you at a particular time. You will then have 24 hours from that time to turn in the completed mid-term. In addition to a grade, you will receive detailed feedback and an answer key, as with the assignments.

The term project will involve applying machine learning to a substantial problem of the student’s choice. Several options can be found in the Projects subfolder on the course Drupal site. Students may select one of these projects or propose one of their own design. Students who wish to design their own project should check in with the instructor about their plans as early as possible in the semester.

Grading Criteria

Quizzes (10%)

Assignments (20% total)

Mid-terms (10% each)

Course project (50%)

Course Schedule (Note: All of this is linked to the Syllabus page on Drupal)

Week 1 Course Intro / Weka Intro (Witten & Frank, Ch. 1, 10-11)

Week 1 Quiz

Week 1 Lecture 1 (http://coursecast.cs.cmu.edu/view.pl?id=1198082070)
Week 1 Lecture 2 (http://coursecast.cs.cmu.edu/view.pl?id=1189781557)

Assignment 1

Week 2 Input and Output (Witten & Frank, Ch. 2, 3.1, and 3.3; Kwiatkowska et al., 2005)

Week 2 Lecture 1 (http://coursecast.cs.cmu.edu/view.pl?id=1189166828)
Week 2 Lecture 2 (http://coursecast.cs.cmu.edu/view.pl?id=1189172805)

Assignment 2

Weeks 3-4 Basic Statistical Models and Linear Models (Witten & Frank, Ch. 4.2, 4.6)

Week 3 Quiz

Week 3 Lecture 1 (http://coursecast.cs.cmu.edu/view.pl?id=1203463712)
Week 3 Lecture 2 (http://coursecast.cs.cmu.edu/view.pl?id=1203467603)

Assignment 3

Week 4 Lecture 1 (http://coursecast.cs.cmu.edu/view.pl?id=1203470392)
Week 4 Lecture 2 (http://coursecast.cs.cmu.edu/view.pl?id=1190386065)

Course Project Proposals due after Week 3 Lecture 2

Week 5 Applied Machine Learning Process and Evaluation (Witten & Frank, Ch. 5, 13)

Week 5 Quiz

Week 5 Lecture 1 (http://coursecast.cs.cmu.edu/view.pl?id=1190984504)
Week 5 Lecture 2 (http://coursecast.cs.cmu.edu/view.pl?id=1190990698)

Assignment 4

Weeks 6-7 Working with Text/TagHelper (Jackson & Moulinier, Ch. 1, 3)

Week 6 Quiz

Week 6 Lecture 1 (http://coursecast.cs.cmu.edu/view.pl?id=1191589878)
Week 6 Lecture 2 (http://coursecast.cs.cmu.edu/view.pl?id=1191594436)

Assignment 5

Week 7 Quiz

Week 7 Lecture 1 (http://coursecast.cs.cmu.edu/view.pl?id=1192193679)
Week 7 Lecture 2 (http://coursecast.cs.cmu.edu/view.pl?id=1192197907)

Assignment 6

Week 8 Rule Representations and Basic Algorithms (Witten & Frank, Ch. 3.4-4)

Week 8 Quiz

Week 8 Lecture 1 (http://coursecast.cs.cmu.edu/view.pl?id=1192798567)
Week 8 Lecture 2 (http://coursecast.cs.cmu.edu/view.pl?id=1192803437)

Take Mid-term 1 after Week 8 Lecture 2; there is no assignment this week.

Week 9 Advanced Tree and Rule-Based Learning (Witten & Frank, Ch. 6.1, 6.2, 6.6)

Week 9 Quiz

Week 9 Lecture 1 (http://coursecast.cs.cmu.edu/view.pl?id=1193403997)
Week 9 Lecture 2 (http://coursecast.cs.cmu.edu/view.pl?id=1193408276)

Assignment 7

Week 10 Linear Models, Statistical Models, and Clustering (Witten & Frank, Ch. 6.4, 6.5, 6.7, 6.8)

Week 10 Quiz

Week 10 Lecture 1 (http://coursecast.cs.cmu.edu/view.pl?id=1194008827)
Week 10 Lecture 2 (http://coursecast.cs.cmu.edu/view.pl?id=1194014812)

Assignment 8

Week 11 Feature Selection and Optimization (Witten & Frank, Sections 7.1, 7.2, 7.3, 7.5, and Ch. 8)

Week 11 Lecture 1 (http://coursecast.cs.cmu.edu/view.pl?id=1194617855)
Week 11 Lecture 2 (http://coursecast.cs.cmu.edu/view.pl?id=1194622118)

Assignment 9

Week 12 Semi-Supervised Learning, Machine Learning Extensions (Witten & Frank, Section 6.9 and Ch. 9)

Week 12 Quiz

Week 12 Lecture 1 (http://coursecast.cs.cmu.edu/view.pl?id=1195222545)
Week 12 Lecture 2 (http://coursecast.cs.cmu.edu/view.pl?id=1195225490)

Take Mid-term 2 after Week 12 Lecture 1; there is no assignment this week.

Weeks 13-15 More Machine Learning Applications

Week 13 Quiz

Week 13 Lecture 1 (http://coursecast.cs.cmu.edu/view.pl?id=1196431133)
Week 14 Lecture 1 (http://coursecast.cs.cmu.edu/view.pl?id=1196435819)
Week 14 Lecture 2 (http://coursecast.cs.cmu.edu/view.pl?id=1196438034)
Week 15 Lecture 1 (http://coursecast.cs.cmu.edu/view.pl?id=1197036386)
Week 15 Lecture 2 (http://coursecast.cs.cmu.edu/view.pl?id=1197036416)