WPI Worcester Polytechnic Institute

Computer Science Department
------------------------------------------

CS539 Machine Learning - Spring 2007 
Project 5 - Bayesian Learning

PROF. CAROLINA RUIZ 

Due Date: Tuesday March 13, 2007. Slides are due at 3:00 pm and the written report is due at 4:00 pm. 
------------------------------------------


PROJECT DESCRIPTION

Experiment with Naive Bayes and Bayesian Net classifiers for each of the following problems:

  1. Predicting the class attribute in the Covertype data available at the UCI Machine Learning Repository.

  2. Predicting whether the income of a given person is >50K or <=50K, using the census-income dataset from the US Census Bureau, which is available at the Univ. of California Irvine (UCI) Machine Learning Repository.
    The census-income dataset contains census information for 48,842 people. It has 14 attributes for each person (age, workclass, fnlwgt, education, education-num, marital-status, occupation, relationship, race, sex, capital-gain, capital-loss, hours-per-week, and native-country) and a binary class attribute that classifies the income of the person as belonging to one of two categories: >50K or <=50K.

  3. A dataset of your choice. This dataset can consist of data that you use for your own research or work, a dataset taken from a public data repository (e.g., the UCI Machine Learning Repository or the UCI KDD Archive), or data that you collect from public data sources. THIS DATASET SHOULD BE LARGE IN TERMS OF THE NUMBER OF INSTANCES AND ATTRIBUTES, SO IT CANNOT BE ONE OF THOSE INCLUDED IN THE WEKA SYSTEM.
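
As a reminder of what a Naive Bayes classifier computes on a problem such as the income prediction in item 2, here is a minimal sketch in Python. It is not the Weka implementation. It assumes nominal attributes only, and it uses add-one (Laplace) smoothing, which is one common choice. The function names (train_naive_bayes, predict) and the six toy records are hypothetical, made up for illustration, and are not taken from the census-income data.

```python
from collections import Counter, defaultdict
import math


def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(labels)
    # value_counts[(attribute index, class)][value] = frequency
    value_counts = defaultdict(Counter)
    # distinct values seen per attribute, needed for add-one smoothing
    domains = defaultdict(set)
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            domains[i].add(value)
    return class_counts, value_counts, domains


def predict(model, row):
    """Return the class maximizing log P(c) + sum_i log P(a_i | c)."""
    class_counts, value_counts, domains = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_c in class_counts.items():
        score = math.log(n_c / total)
        for i, value in enumerate(row):
            count = value_counts[(i, label)][value]
            # add-one (Laplace) smoothing: an assumption of this sketch
            score += math.log((count + 1) / (n_c + len(domains[i])))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy records: (education, marital-status) -> income class
rows = [
    ("Bachelors", "Married"),
    ("Bachelors", "Never-married"),
    ("HS-grad", "Married"),
    ("HS-grad", "Never-married"),
    ("Masters", "Married"),
    ("HS-grad", "Never-married"),
]
labels = [">50K", "<=50K", "<=50K", "<=50K", ">50K", "<=50K"]

model = train_naive_bayes(rows, labels)
print(predict(model, ("Masters", "Married")))
print(predict(model, ("HS-grad", "Never-married")))
```

Working in log space avoids numeric underflow when many attribute probabilities are multiplied. Numeric attributes such as age or hours-per-week would need either discretization or a density estimate, which this sketch does not cover.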

PROJECT ASSIGNMENT

  1. Read Sections 6.1, 6.2, 6.7, 6.8, 6.9, 6.10, 6.11, 6.12, 6.13 of your textbook in great detail.

  2. Solve Exercises 6.1 and 6.6 of your textbook (pages 198-199). Include your solutions in your written report, not in your oral report.

  3. Read the NaiveBayes and the BayesNet code in the Weka system.

  4. The following are guidelines for the construction of your Naive Bayes and Bayesian Net Classifiers:


REPORT AND DUE DATE