Project 2: Classification

Start date: Oct 23. Due: Nov 6, at the beginning of class.

The goal of this project is to choose and evaluate classification mechanisms. I would suggest using the mechanisms available in Weka or RapidMiner, although you may implement your own if you wish.

Use the Adult, Iris, and Zoo datasets from the UCI Machine Learning Repository. If you wish to use other datasets in place of these, please give me a pointer to or a description of them, and I'll let you know whether they are acceptable.

What you need to do for this project is:

  1. Compare the performance of four classification algorithms: decision tree, Naïve Bayes, KNN, and SVM.
  2. Explain which algorithm you would use for what types of data and why.
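If you choose to implement your own classifiers rather than use Weka or RapidMiner, the following Python sketch shows one way to structure the comparison: k-fold cross-validation that records accuracy, training time, and test time separately for each method. The names `KNN` and `cross_validate` are hypothetical, the k-nearest-neighbour classifier is only a minimal illustration (Euclidean distance, majority vote), and the toy data are not one of the project datasets. Any classifier exposing `fit` and `predict` methods could be plugged in the same way.

```python
import math
import random
import time
from collections import Counter


class KNN:
    """Hypothetical minimal k-nearest-neighbour classifier (illustration only)."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        # KNN is a lazy learner: "training" just stores the instances.
        self.X, self.y = list(X), list(y)
        return self

    def predict(self, X):
        preds = []
        for x in X:
            # Indices of the k stored instances closest to x (Euclidean distance).
            nearest = sorted(range(len(self.X)),
                             key=lambda i: math.dist(x, self.X[i]))[: self.k]
            # Majority vote among the k neighbours' class labels.
            preds.append(Counter(self.y[i] for i in nearest).most_common(1)[0][0])
        return preds


def cross_validate(model_factory, X, y, folds=10, seed=0):
    """Hypothetical k-fold CV helper.

    Returns (pooled accuracy, total training time, total test time) in seconds.
    """
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    correct = 0
    train_time = test_time = 0.0
    for f in range(folds):
        # Every instance lands in exactly one test fold.
        test_idx = idx[f::folds]
        test_set = set(test_idx)
        train_idx = [i for i in idx if i not in test_set]
        model = model_factory()  # fresh model for each fold
        t0 = time.perf_counter()
        model.fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        t1 = time.perf_counter()
        preds = model.predict([X[i] for i in test_idx])
        t2 = time.perf_counter()
        train_time += t1 - t0
        test_time += t2 - t1
        correct += sum(p == y[i] for p, i in zip(preds, test_idx))
    return correct / len(X), train_time, test_time


if __name__ == "__main__":
    # Toy data (not a project dataset): two well-separated clusters.
    X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.1, 0.0), (0.0, 0.1),
         (5.0, 5.0), (5.1, 5.2), (4.9, 5.1), (5.2, 4.9), (5.0, 5.1)]
    y = ["a"] * 5 + ["b"] * 5
    acc, t_train, t_test = cross_validate(lambda: KNN(k=3), X, y, folds=5)
    print(f"accuracy={acc:.2f} train={t_train:.6f}s test={t_test:.6f}s")
```

Accuracy here is pooled over all folds (total correct divided by total instances). Timings from `time.perf_counter` will vary from machine to machine, which is one reason the report asks for platform information.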

Project Report

The project report should contain the following:

  1. Discussion of each of the three experimental comparisons
  2. Description of the data sets
  3. Parameters chosen for each algorithm (e.g., the value of k in KNN)
  4. How you prepared the data
  5. Platform information: a description of the machine and system used for the experiments
  6. Experimental result summary
  7. The method you used to evaluate accuracy
  8. Training time and test time for each method
  9. A comparison of the methods' accuracy (as a table or a graph)
  10. Conclusions: a general discussion of the conditions under which each algorithm is appropriate. You may instead want to frame this as a discussion of which type of algorithm suits a general category of data (probably a more difficult task, but also a more interesting one).
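KNN (which relies on distances) and SVMs are sensitive to attribute scale and generally expect numeric input, so data preparation matters for datasets with categorical attributes such as Adult. The Python sketch below is one possible approach, assuming each instance is a list of strings with the class label already removed: it one-hot encodes the columns you mark as categorical and min-max scales the remaining columns to [0, 1]. The function name `prepare` is hypothetical, a `?` in a categorical column is simply treated as its own category (one choice among several), and numeric columns are assumed to have no missing values.

```python
def prepare(rows, categorical):
    """Hypothetical helper: one-hot encode categorical columns, min-max scale the rest.

    rows: list of instances, each a list of strings (class label removed).
    categorical: set of column indices to treat as categorical.
    """
    n_cols = len(rows[0])
    # Sorted distinct values of each categorical column define its indicator order.
    categories = {c: sorted({r[c] for r in rows}) for c in categorical}
    # Min and max of each numeric column, used for scaling to [0, 1].
    numeric = [c for c in range(n_cols) if c not in categorical]
    lo = {c: min(float(r[c]) for r in rows) for c in numeric}
    hi = {c: max(float(r[c]) for r in rows) for c in numeric}
    out = []
    for r in rows:
        vec = []
        for c in range(n_cols):
            if c in categorical:
                # One 0/1 indicator per category; '?' becomes its own category.
                vec.extend(1.0 if r[c] == v else 0.0 for v in categories[c])
            else:
                span = hi[c] - lo[c]
                # A constant column maps to 0.0 to avoid division by zero.
                vec.append((float(r[c]) - lo[c]) / span if span else 0.0)
        out.append(vec)
    return out


if __name__ == "__main__":
    # Made-up rows: one numeric column and one categorical column.
    rows = [["39", "y"], ["50", "x"], ["28", "?"]]
    print(prepare(rows, {1}))
    # prints [[0.5, 0.0, 0.0, 1.0], [1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
```

In a cross-validation experiment, the categories and min/max values should ideally be computed from the training folds only, so that no information from the test instances leaks into the preprocessing.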

You should also include the output from your sample runs.

Scoring

Scoring will be based on:

·  Experiment and discussion

·  Knowledge displayed in conclusions

Particularly good discussions of any of the experiments may earn extra points.

Turning in the project

Both an electronic submission and a hard copy are required. Tar or zip your files and email them to . PDF is the safest format for capturing non-text content such as tables and graphs. Hand in the hard copy at the beginning of class.