WPI Worcester Polytechnic Institute

Computer Science Department
------------------------------------------

CS539 Machine Learning - Spring 2005 
Project 6 - Instance-Based Learning and Regression Methods

PROF. CAROLINA RUIZ 

Due Date: Thursday, March 24, 2005. Slides are due at 3:00 pm and the written report is due at 4:00 pm. 
------------------------------------------


PROJECT DESCRIPTION

Use Instance-based Learning and Regression techniques to construct classifiers for each of the following problems:

  1. Predicting the class attribute (or when required by the learning method, a numeric attribute of your choice) in the Covertype data available at the UCI Machine Learning Repository.

  2. Predicting a numeric attribute of your choice in the census-income dataset.

PROJECT ASSIGNMENT

  1. Read Chapter 8 of the textbook, on Instance-based Learning, in great detail.

  2. Read the code of the Instance-based Learning and Regression techniques implemented in the Weka system. Some of those techniques are enumerated below:

    • Instance-based Learning:
      • IB1: nearest neighbor classification
      • IBk: k-nearest neighbors classification

    • Other Lazy Learning:
      • LBR: Lazy Bayesian Rules Classifier

    • Regression:
      • Linear Regression
      • LWR: Locally Weighted Regression [To run locally weighted linear regression using Weka, use LWL (locally weighted learning) from Weka's lazy classifiers and select "Linear Regression" as LWL's classifier option]
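
    The same configuration can also be set up programmatically through the Weka Java API instead of the Explorer GUI. The following is only a minimal sketch: the ARFF file name, the choice of class attribute, and the number of neighbors are placeholders that you should replace with your own settings.

      import java.io.BufferedReader;
      import java.io.FileReader;

      import weka.classifiers.functions.LinearRegression;
      import weka.classifiers.lazy.LWL;
      import weka.core.Instances;

      public class LocallyWeightedRegressionSketch {
          public static void main(String[] args) throws Exception {
              // Load an ARFF file with a numeric class attribute (file name is a placeholder).
              Instances data = new Instances(new BufferedReader(new FileReader("census-income.arff")));
              data.setClassIndex(data.numAttributes() - 1);   // numeric attribute to predict

              // Locally weighted learning with linear regression as the base classifier,
              // i.e., locally weighted linear regression.
              LWL lwl = new LWL();
              lwl.setClassifier(new LinearRegression());
              lwl.setKNN(50);   // weight only the 50 nearest neighbors; -1 uses all training instances

              lwl.buildClassifier(data);
              // Predict the class value of the first instance as a quick sanity check.
              System.out.println("Prediction for instance 0: " + lwl.classifyInstance(data.instance(0)));
          }
      }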

  3. The following are guidelines for the construction of your models:

    • Code: Use the techniques listed above as implemented in the Weka system, or, if you prefer, implement your own code.

    • Training and Testing Instances: You may restrict your experiments to a subset of the instances IF Weka cannot handle your whole dataset. But remember that usually THE MORE TRAINING DATA THAT YOU CAN USE, THE BETTER. FOR TESTING, YOU MUST USE AT LEAST 100 DATA INSTANCES THAT WERE NOT USED FOR TRAINING.

    • Preprocessing of the Data: You should apply relevant filters to your dataset as needed before doing the mining and/or use the results of previous mining tasks. For instance, you may decide to remove apparently irrelevant attributes, replace missing values if any, discretize attributes in a different way, etc. Your report should contain a detailed description of the preprocessing of your dataset and justifications for the steps you followed. If Weka does not provide the functionality you need to preprocess your data to obtain useful patterns, preprocess the data yourself, for instance by writing the necessary filters (you can incorporate them into Weka if you wish). A sketch of applying standard Weka filters programmatically appears at the end of this list.

    • Evaluation and Testing: As usual, experiment with different testing methods, including n-fold cross-validation and percentage split.
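
    As an illustration, the sketch below uses the Weka Java API to evaluate IBk with 10-fold cross-validation and with a 66/34 percentage split that holds out test instances never used for training (the assignment requires at least 100 such instances). The file name, value of k, split ratio, and random seeds are placeholders, not prescribed settings.

      import java.io.BufferedReader;
      import java.io.FileReader;
      import java.util.Random;

      import weka.classifiers.Evaluation;
      import weka.classifiers.lazy.IBk;
      import weka.core.Instances;

      public class EvaluateIBk {
          public static void main(String[] args) throws Exception {
              // Load an ARFF file (path is a placeholder -- substitute your own dataset).
              Instances data = new Instances(new BufferedReader(new FileReader("covertype.arff")));
              data.setClassIndex(data.numAttributes() - 1);   // class is the last attribute

              IBk knn = new IBk();
              knn.setKNN(5);                                  // k = 5 neighbors (placeholder)

              // 10-fold cross-validation.
              Evaluation cv = new Evaluation(data);
              cv.crossValidateModel(knn, data, 10, new Random(1));
              System.out.println(cv.toSummaryString("\n10-fold cross-validation\n", false));

              // Percentage split: train on ~66%, test on the held-out remainder,
              // which must contain at least 100 instances never seen during training.
              data.randomize(new Random(1));
              int trainSize = (int) Math.round(data.numInstances() * 0.66);
              int testSize  = data.numInstances() - trainSize;
              Instances train = new Instances(data, 0, trainSize);
              Instances test  = new Instances(data, trainSize, testSize);

              knn.buildClassifier(train);
              Evaluation split = new Evaluation(train);
              split.evaluateModel(knn, test);
              System.out.println(split.toSummaryString("\nPercentage split (66/34)\n", false));
          }
      }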
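
    The preprocessing filters mentioned above can likewise be applied programmatically. The sketch below chains three common unsupervised Weka filters (attribute removal, missing-value replacement, and discretization); the file name, attribute indices, and bin count are hypothetical and should be adapted to your dataset.

      import java.io.BufferedReader;
      import java.io.FileReader;

      import weka.core.Instances;
      import weka.filters.Filter;
      import weka.filters.unsupervised.attribute.Discretize;
      import weka.filters.unsupervised.attribute.Remove;
      import weka.filters.unsupervised.attribute.ReplaceMissingValues;

      public class PreprocessSketch {
          public static void main(String[] args) throws Exception {
              // Load the raw ARFF file (file name is a placeholder).
              Instances data = new Instances(new BufferedReader(new FileReader("census-income.arff")));

              // 1. Drop apparently irrelevant attributes (indices are hypothetical).
              Remove remove = new Remove();
              remove.setAttributeIndices("3,7");
              remove.setInputFormat(data);
              data = Filter.useFilter(data, remove);

              // 2. Replace missing values with means/modes.
              ReplaceMissingValues rmv = new ReplaceMissingValues();
              rmv.setInputFormat(data);
              data = Filter.useFilter(data, rmv);

              // 3. Discretize numeric attributes into 10 equal-width bins.
              Discretize disc = new Discretize();
              disc.setBins(10);
              disc.setInputFormat(data);
              data = Filter.useFilter(data, disc);

              System.out.println(data.numInstances() + " instances, "
                      + data.numAttributes() + " attributes after preprocessing");
          }
      }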

REPORT AND DUE DATE