Project Assignment:
THOROUGHLY READ AND FOLLOW THE
PROJECT GUIDELINES.
These guidelines contain detailed information about how to structure your
project, and how to prepare your written and oral reports.
Your written report should be at most 5 pages long
(including all graphs, figures, and appendices).
The font size should be no smaller than 11 pt.
- Data Mining Technique(s):
Use Instance-based Learning and Regression techniques to construct
classifiers for the dataset described below. You may use the following
Instance-based Learning and Regression methods as implemented in the
Weka system, or implement your own code.
- Instance-based Learning:
  - IB1: nearest neighbor classification
  - IBk: k-nearest neighbors classification. Experiment with several values of k.
- Regression:
  - Linear Regression
  - LWR: Locally Weighted Regression
[To run locally weighted linear regression in Weka, use LWL
(locally weighted learning) from Weka's lazy classifiers and
select "Linear Regression" as LWL's classifier option.]
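If you choose to implement your own code, the two lazy methods listed above can be sketched roughly as follows. This is a minimal illustration and not Weka's implementation: the function names, the Euclidean distance, the Gaussian weighting kernel, and the bandwidth parameter tau are all assumptions made for the example, and the regression sketch handles only a single numeric feature.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_classify(train_X, train_y, query, k=1):
    """Return the majority label among the k nearest training instances.

    With k=1 this reduces to plain nearest-neighbor classification.
    """
    ranked = sorted(zip(train_X, train_y), key=lambda p: euclidean(p[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def lwr_predict(xs, ys, query, tau=1.0):
    """Locally weighted linear regression on one numeric feature.

    Each training point gets the (assumed) Gaussian weight
    exp(-(x - query)^2 / (2 * tau^2)); a line is then fitted by
    weighted least squares and evaluated at the query point.
    """
    w = [math.exp(-((x - query) ** 2) / (2 * tau ** 2)) for x in xs]
    sw = sum(w)
    mx = sum(wi * x for wi, x in zip(w, xs)) / sw  # weighted mean of x
    my = sum(wi * y for wi, y in zip(w, ys)) / sw  # weighted mean of y
    sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return my + slope * (query - mx)
```

Both functions defer all work to prediction time, which is why these methods are called lazy: every query scans the full training set.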
- Dataset(s):
In this project, we will use one dataset:
- The Heart Disease Data Set, available at the UCI Machine Learning Repository.
Use the processed.cleveland.data file for your experiments.
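For those writing their own code, a loader for this file might look like the sketch below. It assumes the file is comma-separated with no header row, has 14 columns, and marks missing values with "?"; the column names are taken from the repository's documentation as best understood here, so verify them against the dataset description before relying on them. The helper names are hypothetical.

```python
import csv
import io

# Assumed column order; check against the repository's documentation.
COLUMNS = [
    "age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
    "thalach", "exang", "oldpeak", "slope", "ca", "thal", "num",
]


def parse_rows(text):
    """Parse comma-separated records into dicts; '?' becomes None."""
    rows = []
    for record in csv.reader(io.StringIO(text)):
        if not record:
            continue  # skip blank lines
        values = [None if f.strip() == "?" else float(f) for f in record]
        rows.append(dict(zip(COLUMNS, values)))
    return rows


def drop_incomplete(rows):
    """Keep only the rows that have no missing values."""
    return [r for r in rows if all(v is not None for v in r.values())]


# Possible usage, assuming the file sits in the working directory:
# with open("processed.cleveland.data") as f:
#     rows = drop_incomplete(parse_rows(f.read()))
```

Dropping incomplete rows is only one way to handle missing values; replacing them with a column mean or mode is another option worth stating in your report.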
- Target Attribute:
Run a set of experiments using the target attribute as numeric, and
then run a separate set of experiments using the target attribute as
nominal (for those methods in this project that can handle nominal targets).
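The two sets of experiments differ in how the target is represented and in how predictions are scored. The sketch below shows one way to derive nominal labels from a numeric target and a matching evaluation measure for each case. The "class_" label format and the function names are assumptions for illustration; in Weka, a filter such as NumericToNominal can perform the equivalent conversion.

```python
def to_nominal(values):
    """Map numeric target values to string class labels, e.g. 2.0 -> 'class_2'."""
    return [f"class_{int(v)}" for v in values]


def accuracy(predicted, actual):
    """Fraction of exact label matches; suited to a nominal target."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def mean_absolute_error(predicted, actual):
    """Average absolute difference; suited to a numeric target."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)
```

With a nominal target, predicting class 3 for a class 4 instance is simply wrong; with a numeric target, it is an error of size 1. Keep this difference in mind when comparing results across the two sets of experiments.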
- Training and Testing Instances:
You may restrict your test set to 100 or fewer data instances
(not used for training) to reduce the time taken by experiments
with these lazy methods.
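One possible way to build such a split in your own code is sketched below. The function name, the default test fraction, and the fixed random seed are assumptions chosen for the example; the only requirement taken from the text above is that the test set holds at most 100 instances, none of which are used for training.

```python
import random


def holdout_split(instances, max_test=100, test_fraction=0.3, seed=0):
    """Shuffle the data and split it into disjoint (train, test) lists.

    The test set size is test_fraction of the data, capped at max_test.
    """
    shuffled = list(instances)
    random.Random(seed).shuffle(shuffled)  # seeded for repeatable experiments
    n_test = min(max_test, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]
```

Fixing the seed keeps the same test instances across all methods, so differences in results reflect the methods rather than the split.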