WPI Worcester Polytechnic Institute

Computer Science Department
------------------------------------------

CS539 Machine Learning 
Assignment Chapter 6 - Fall 2000

PROF. CAROLINA RUIZ 

Due:
First Part: Thursday, October 19, 2000 at 6:00 pm. 
Second Part: Thursday, October 26, 2000 at 6:00 pm. 

------------------------------------------


PROJECT DESCRIPTION

Construct the most accurate naive Bayes classifier you can for predicting whether the income of a given person is >50K or <=50K, using the census-income dataset from the US Census Bureau, which is available at the University of California, Irvine (UCI) Machine Learning Repository.

I have downloaded the dataset into the following directory: /cs/courses/cs539/f00/Projects/Census_Income_Data
You can access the dataset from there.

The census-income dataset contains census information for 48,842 people. It has 14 attributes for each person (age, workclass, fnlwgt, education, education-num, marital-status, occupation, relationship, race, sex, capital-gain, capital-loss, hours-per-week, and native-country) and a boolean class attribute that classifies the income of the person as belonging to one of two categories: >50K or <=50K.
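
To make the record layout concrete, here is a minimal Python sketch that parses one record into its 14 attributes plus the class label. It assumes each record is a single comma-separated line with the attributes in the order listed above, and that missing values are written as "?"; check both assumptions against the files in the course directory. The function name parse_record and the choice of which attributes to treat as numeric are illustrative, not part of the dataset.

```python
# Attribute names, in the order given in the dataset description.
ATTRIBUTES = [
    "age", "workclass", "fnlwgt", "education", "education-num",
    "marital-status", "occupation", "relationship", "race", "sex",
    "capital-gain", "capital-loss", "hours-per-week", "native-country",
]

# Assumption: these attributes hold integer values; the rest are symbolic.
NUMERIC = {"age", "fnlwgt", "education-num",
           "capital-gain", "capital-loss", "hours-per-week"}


def parse_record(line):
    """Split one comma-separated line into (attribute dict, class label)."""
    fields = [f.strip() for f in line.strip().split(",")]
    if len(fields) != len(ATTRIBUTES) + 1:
        raise ValueError("expected 15 fields, got %d" % len(fields))
    *values, label = fields
    record = {}
    for name, value in zip(ATTRIBUTES, values):
        if value == "?":            # assumed marker for a missing value
            record[name] = None
        elif name in NUMERIC:
            record[name] = int(value)
        else:
            record[name] = value
    return record, label
```

A record parsed this way can be passed directly to whatever training code you write; missing values show up as None so that you can decide explicitly how to handle them.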


PROJECT ASSIGNMENT

This project consists of two parts:
Part 1: Due October 19 at 6:00 pm.
Study the C code for the naive Bayes classifier (Rainbow) provided with Chapter 6 of the textbook. Adapt the code to the census-income data as needed. Run preliminary experiments with this code over the dataset. Be ready to discuss the code, as well as the results of your experiments, with your classmates.
Part 2: Due October 26 at 6:00 pm.
Construct, train, and test the most accurate naive Bayes classifier you can to predict the Salary attribute of the Census-Income data. The following are guidelines to construct and train your naive Bayes classifier:
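
As a point of reference for Part 2, the following Python sketch shows one way a naive Bayes classifier over symbolic attributes can be trained and applied. It is not the Rainbow code and not a required design. It assumes that every attribute value is treated as a discrete symbol (numeric attributes such as age would have to be binned first) and that add-one (Laplace) smoothing is used for the conditional probabilities; the function names train and predict are illustrative.

```python
import math
from collections import Counter, defaultdict


def train(examples):
    """Collect counts from a list of (attribute dict, class label) pairs."""
    class_counts = Counter()
    value_counts = defaultdict(Counter)  # (attribute, label) -> value counts
    values_seen = defaultdict(set)       # attribute -> distinct values seen
    for record, label in examples:
        class_counts[label] += 1
        for attr, value in record.items():
            value_counts[(attr, label)][value] += 1
            values_seen[attr].add(value)
    return class_counts, value_counts, values_seen


def predict(model, record):
    """Return the class label with the highest posterior log-probability."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in class_counts.items():
        # log P(class) + sum over attributes of log P(value | class)
        score = math.log(n / total)
        for attr, value in record.items():
            count = value_counts[(attr, label)][value]
            # Add-one smoothing keeps unseen values from zeroing the product.
            score += math.log((count + 1) / (n + len(values_seen[attr])))
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

Working with sums of logarithms rather than products of probabilities avoids numeric underflow when many attributes are combined. To estimate accuracy, hold out part of the data for testing and compare the predicted labels with the true ones.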

REPORT AND DUE DATE