CS 539 Spring 2007

Computer Science Department

CS539 Machine Learning - Spring 2007
Project 9 - Final Project

PROF. CAROLINA RUIZ

Due Date: Thursday, April 19th 2007. Slides are due at 3:00 pm and the written report is due at 4:00 pm.

This final assignment consists of two parts:

1. Work further on your weakest (or more precisely, your least-strong) project so that you turn it into your best one. Take advantage of the feedback that you received from me and/or from your classmates during that project's presentation and on your written report. (Each of you has agreed with me on which of your projects is your weakest.)

2. Complete the following table summarizing each and every one of your projects. Pick one of the datasets you used throughout the semester and re-run experiments as necessary so that you can report results using the same evaluation approach (10-fold cross-validation if at all possible; if not, 4-fold cross-validation), the same training and testing datasets, etc. The Experimenter in the Weka system would be very helpful for this (see the Experimenter tutorial included in the Weka package). Also, use the Experimenter to determine whether or not the accuracy differences between pairs of these methods are statistically significant at a p-value of 0.05 or less. Please include this table in your report and in your slides.

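| Technique | Decision Trees ID3/J4.8 | Neural Networks | Naive Bayes/Bayes Nets | Instance-Based IB1/IBk/LBR/LWR | Genetic Algorithms | Rule Learning Prism/Foil | Technique re-done for final project: ________ |
| Code (mine/other/adapted) | | | | | | | |
| Programming Language | | | | | | | |
| Dataset (name) | | | | | | | |
| Accuracy | | | | | | | |
| Stat. significantly better than: (list methods) | | | | | | | |
| Size of the model | | | | | | | |
| How readable is the model? | | | | | | | |
| Number of attributes used | | | | | | | |
| Num. of training instances | | | | | | | |
| Num. of test instances | | | | | | | |
| Missing values included? (y/n) | | | | | | | |
| Pre-processing done | | | | | | | |
| Evaluation method used (n-fold cross val, n=?) | | | | | | | |
| Training Time | | | | | | | |
| Testing Time | | | | | | | |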
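To make the "Stat. significantly better than" row concrete, here is a minimal sketch of the comparison it asks for, assuming you have one accuracy per fold for each of two methods and that both methods were evaluated on the same folds. The function names, the seed, and the sample accuracies are all hypothetical and are not part of the assignment. This is a plain paired t-test compared against a tabulated critical value; Weka's Experimenter uses its own paired tester, which may apply a variance correction that this sketch does not reproduce, so treat the Experimenter's output as the result to report.

```python
import math
import random
from statistics import mean, stdev

# Two-sided critical values of Student's t at alpha = 0.05, keyed by degrees
# of freedom (k - 1 for k folds): 4-fold -> df 3, 10-fold -> df 9.
T_CRIT_05 = {3: 3.182, 9: 2.262}


def fold_assignment(n_instances, k=10, seed=539):
    """Split instance indices 0..n-1 into k folds, reproducibly.

    Reusing the same seed (an arbitrary, hypothetical value here) for every
    method gives all methods identical train/test partitions, which a
    paired comparison requires.
    """
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)
    folds = [[] for _ in range(k)]
    for position, index in enumerate(indices):
        folds[position % k].append(index)
    return folds


def paired_t(acc_a, acc_b):
    """Return (t statistic, degrees of freedom) for per-fold accuracies."""
    if len(acc_a) != len(acc_b):
        raise ValueError("both methods need one accuracy per fold")
    diffs = [a - b for a, b in zip(acc_a, acc_b)]
    k = len(diffs)
    spread = stdev(diffs)
    if spread == 0:
        raise ValueError("all per-fold differences are identical")
    return mean(diffs) / (spread / math.sqrt(k)), k - 1


def significantly_different(acc_a, acc_b):
    """True if the accuracy difference is significant at p <= 0.05."""
    t, df = paired_t(acc_a, acc_b)
    return abs(t) > T_CRIT_05[df]


# Made-up per-fold accuracies for two methods, for illustration only.
method_a = [0.81, 0.79, 0.83, 0.80, 0.82, 0.78, 0.84, 0.80, 0.81, 0.82]
method_b = [0.78, 0.77, 0.80, 0.79, 0.79, 0.76, 0.80, 0.78, 0.79, 0.79]
print(significantly_different(method_a, method_b))
```

If the test reports a significant difference, the method with the higher mean accuracy is the one to list as "significantly better than" the other in the table.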

REPORT AND DUE DATE

Written Report.
Your report should contain the following sections with the corresponding discussions:
1. The required sections of the report that you're re-doing in this project, following the specifications of that project (e.g. decision trees, neural networks, etc.)
2. Table summarizing the results of all the course projects.
3. Detailed description and analysis of your table.
  - For those methods (e.g. decision trees) for which two or more particular algorithms are listed on the table (e.g. ID3 and J4.8), provide the required information for each of the algorithms listed, in the order they are listed, separated by "/"s (e.g. accuracy 78% / 81%, if the accuracy of your best ID3 decision tree was 78% and the accuracy of your best J4.8 decision tree was 81% on the dataset analyzed).
  - Discuss the strengths and the weaknesses of each of those machine learning methods over the datasets used for the projects.
Oral Report. We will discuss the results from the individual projects in class on April 19th and 24th. Your oral report should summarize the different sections of your written report as described above. Each of you will have 10 minutes to explain your results and to discuss your project in class. Be prepared!
Submission and Due Date.
1. Please submit the following file by email by Thursday, April 19th at 3:00 pm.
  [your-lastname]_proj9_slides.[ext] containing your slides for your oral report. This file should be either a PDF file (ext=pdf) or a PowerPoint file (ext=ppt). Please use only lower case letters in the file name. For instance, my file would be named ruiz_proj9_slides.ppt.
2. Please bring a hardcopy of your report to class on April 19th, 2007. This written report is due at 4:00 pm that day. Also, submit your old report with my comments on the project you re-did.
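
The slide file naming rule above can be checked with a small helper. This is only a sketch: the function name is hypothetical, and it assumes that "only lower case letters" means no upper case characters anywhere in the file name.

```python
ALLOWED_EXTENSIONS = {"pdf", "ppt"}


def slides_filename(lastname, ext):
    """Build [your-lastname]_proj9_slides.[ext] in lower case."""
    ext = ext.lower().lstrip(".")
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError("slides must be a PDF (pdf) or PowerPoint (ppt) file")
    name = lastname.strip().lower()
    if not name:
        raise ValueError("a last name is required")
    return f"{name}_proj9_slides.{ext}"


print(slides_filename("Ruiz", "ppt"))  # ruiz_proj9_slides.ppt
```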
