Project Assignment:
THOROUGHLY READ AND FOLLOW THE
PROJECT GUIDELINES.
These guidelines contain detailed information about how to structure your
project, and how to prepare your written and oral reports.
*** You must use the
Project 5 Template provided for your written report. ***
(If you prefer not to use Word, you can copy and paste this format into a
different editor, as long as you respect the stated page structure and
page limit.)
The font size should be no smaller than 11 pt.
Do not exceed the page limit.
- Machine Learning Technique(s):
Use the Naive Bayes and Bayesian Net classification
methods implemented in Weka and in R.
- Dataset(s):
In this project, we will use two datasets:
- The Flags dataset, available at the UCI Machine Learning Repository.
Use religion as the target attribute.
- The ReutersCorn dataset that comes with the Weka system.
Combine the ReutersCorn-train.arff and ReutersCorn-test.arff files
into a single ReutersCorn.arff dataset.
This dataset is a collection of text documents.
To transform this dataset from a text (unstructured) format
to a tabular (structured) format, you
can write your own code; use Weka (see the StringToWordVector
filter); use R; or use a good existing software
package available to you. Describe in your report what code you used,
and cite any resources used. Check the resulting list of words to make
sure it is a sensible selection.
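For teams that choose to write their own code, the text-to-table step can be sketched as a simple bag-of-words conversion. This is only an illustration under stated assumptions, not Weka's StringToWordVector filter: the stop-word list, the function names, the `max_words` cutoff, and the sample sentences are all hypothetical, and ARFF parsing is left out.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real run would use a fuller one.
STOP_WORDS = {"the", "a", "of", "and", "to", "in", "is", "for"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens, dropping stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def build_vocabulary(documents, max_words=1000):
    """Keep the max_words most frequent tokens across all documents."""
    counts = Counter()
    for doc in documents:
        counts.update(tokenize(doc))
    # Sort by descending count, then alphabetically, so the result is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:max_words]]


def to_table(documents, vocabulary):
    """Return one row per document with a 0/1 presence flag per vocabulary word."""
    rows = []
    for doc in documents:
        present = set(tokenize(doc))
        rows.append([1 if word in present else 0 for word in vocabulary])
    return rows


if __name__ == "__main__":
    # Made-up sentences standing in for the real documents.
    docs = [
        "Corn prices rose in the midwest",
        "The bank raised interest rates",
        "Corn and wheat exports fell",
    ]
    vocab = build_vocabulary(docs, max_words=5)
    print(vocab)
    for row in to_table(docs, vocab):
        print(row)
```

Printing the vocabulary before building the table makes it easy to inspect the selected words, which is the check requested above.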
- Performance Metric(s):
- Use classification accuracy, time to construct the model,
dependency connections in the Bayesian graph, conditional probability
tables (CPTs), readability of the net, and any other
related information or metrics when
you evaluate the "goodness" of your models (note that some of these
evaluation criteria are quantitative and some are qualitative).
- Compare the classification accuracies/errors you obtained against those of
benchmarking techniques or previously studied techniques, such as
ZeroR, OneR, J4.8, and ANNs, over the same (sub-)set
of data instances you used in each experiment.
Use the Experimenter in Weka to compare the performance
of these different techniques, with a statistical significance
threshold of p = 0.05.
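To make the significance threshold concrete, the sketch below computes a plain paired t statistic over per-fold accuracies of two classifiers. It is an illustration only: the tester used by the Weka Experimenter may differ (for example, by applying a variance correction), the function names are hypothetical, the accuracy values are made up, and the hard-coded critical value applies only to 10 paired folds (9 degrees of freedom).

```python
import math

# Two-tailed critical t value at p = 0.05 for 9 degrees of freedom (10 paired folds).
T_CRITICAL_DF9 = 2.262


def paired_t_statistic(scores_a, scores_b):
    """Return the paired t statistic for two equally long lists of per-fold scores."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two score lists of equal length, at least 2 folds each")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    mean = sum(diffs) / n
    # Sample variance of the differences (n - 1 in the denominator).
    variance = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    if variance == 0:
        # All differences are identical: no spread to divide by.
        if mean == 0:
            return 0.0
        return math.inf if mean > 0 else -math.inf
    return mean / math.sqrt(variance / n)


def is_significant(t_value, critical=T_CRITICAL_DF9):
    """Two-tailed decision: significant when |t| exceeds the critical value."""
    return abs(t_value) > critical


if __name__ == "__main__":
    # Made-up per-fold accuracies for illustration; not results from any real run.
    naive_bayes = [0.71, 0.68, 0.74, 0.70, 0.69, 0.73, 0.72, 0.67, 0.75, 0.70]
    zero_r = [0.52, 0.50, 0.55, 0.51, 0.49, 0.53, 0.54, 0.50, 0.52, 0.51]
    t = paired_t_statistic(naive_bayes, zero_r)
    print("t =", round(t, 3), "significant:", is_significant(t))
```

The test is paired because both classifiers are evaluated on the same folds, which matches the requirement to compare techniques over the same set of data instances.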
- Algorithm Options:
- Advanced Topic(s) (30 points):
Investigate in more depth (experimentally, theoretically, or both) a topic,
chosen by you and your teammate, that is related to Bayesian learning
and that is not already covered in this project.
This topic might be something that was described or
mentioned in the textbook or in class, that comes from your own research,
or that is related to your and your teammate's interests.
One advanced topic per team.