In particular, you should know what an association rule is;
metrics to quantify association rules (e.g., support, confidence, lift, leverage, conviction, interest factor, correlation analysis, IS measure, ...);
the Apriori principle;
the Apriori algorithm for constructing association rules,
including frequent itemset generation with candidate generation and pruning
(join/merge condition and subset pruning), and
rule generation and confidence-based pruning.
You should be able to use these algorithms to construct association rules from data
by hand during the test.
See examples provided in the Lecture Notes linked above.
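As a study aid for the steps listed above, here is a minimal Python sketch of Apriori. It is not the Weka implementation or the Lecture Notes' pseudocode. The function names (apriori, generate_rules), the representation of transactions as sets of string items, and the toy data are assumptions made for illustration. Candidate generation joins two frequent (k-1)-itemsets that share their first k-2 items and then applies subset pruning; rule generation grows consequents only from rules that already met the confidence threshold.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items.
    current = {}
    for item in {i for t in transactions for i in t}:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s
    frequent = dict(current)

    k = 2
    while current:
        prev = sorted(tuple(sorted(s)) for s in current)
        candidates = set()
        for a, b in combinations(prev, 2):
            # Join/merge condition: same first k-2 items.
            if a[:-1] == b[:-1]:
                cand = frozenset(a) | frozenset(b)
                # Subset pruning (Apriori principle): every (k-1)-subset
                # of a frequent k-itemset must itself be frequent.
                if all(frozenset(sub) in current
                       for sub in combinations(cand, k - 1)):
                    candidates.add(cand)
        current = {}
        for cand in candidates:
            s = support(cand)
            if s >= min_support:
                current[cand] = s
        frequent.update(current)
        k += 1
    return frequent


def generate_rules(frequent, min_confidence):
    """Return (antecedent, consequent, support, confidence) tuples."""
    rules = []
    for itemset, supp in frequent.items():
        consequents = [frozenset([i]) for i in itemset]
        while consequents and len(consequents[0]) < len(itemset):
            kept = []
            for cons in consequents:
                ante = itemset - cons
                conf = supp / frequent[ante]
                if conf >= min_confidence:
                    rules.append((ante, cons, supp, conf))
                    kept.append(cons)
            # Confidence-based pruning: larger consequents are built only
            # from consequents whose rules met the threshold.
            size = len(consequents[0]) + 1
            consequents = list({a | b for a, b in combinations(kept, 2)
                                if len(a | b) == size})
    return rules


# Toy market-basket data (made up for illustration).
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
frequent = apriori(transactions, min_support=0.6)
rules = generate_rules(frequent, min_confidence=0.8)
```

Tracing this toy example by hand, level by level, is a reasonable way to practice for the test. The sketch assumes all items are strings so that they can be sorted.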
THOROUGHLY READ AND FOLLOW THE
PROJECT GUIDELINES.
These guidelines contain detailed information about how to structure your
project, how to prepare your written summary, and how to study for the test.
*** You must use the
Project 3 Template provided for your written report.
(If you prefer not to use Word, you can copy and paste this format in a
different editor as long as you respect the stated page structure and
page limit.)
Data Mining Technique(s):
We will run experiments in Weka and in Python using the following techniques:
Association rule mining: all the association rule mining techniques available in
Weka and in Python.
Dataset(s):
In this project, we will use two datasets (about 25% of the experiments should be
done with the 1st dataset and about 75% with the 2nd dataset):
The
Mushroom Data Set
available at the
UCI Machine Learning Repository.
Use the complete attribute value names
(e.g., use "bell" instead of "b","conical" instead of "c", and so on)
to make the association rules easier to read.
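One possible way to expand the single-letter codes in Python is sketched below. The names VALUE_NAMES, expand_record, and to_items are hypothetical, and only two attributes are filled in. The code-to-name pairs are assumed to match the attribute description distributed with the dataset, so check them against that file and complete the table for the remaining attributes.

```python
# Hypothetical lookup table: attribute -> {code: full value name}.
# Only two attributes are shown; verify every entry against the
# dataset's own attribute description before using it.
VALUE_NAMES = {
    "class": {"e": "edible", "p": "poisonous"},
    "cap-shape": {
        "b": "bell", "c": "conical", "x": "convex",
        "f": "flat", "k": "knobbed", "s": "sunken",
    },
}


def expand_record(record, value_names=VALUE_NAMES):
    """Replace codes with full names; unknown codes are left unchanged."""
    return {attr: value_names.get(attr, {}).get(code, code)
            for attr, code in record.items()}


def to_items(record):
    """Turn a record into a set of 'attribute=value' items (a transaction)."""
    return {f"{attr}={value}" for attr, value in record.items()}


example = expand_record({"class": "p", "cap-shape": "b"})
items = to_items(example)
```

Writing each item as attribute=value keeps rules readable when the same value name (for example "brown") occurs under several attributes.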
For classification association rules (CARs):
Pick either income (<$50K or >$50K) OR sex as the target attribute.
Decide which of these two attributes would be a better target and use it for all your classification experiments.
Evaluation:
Quantitative evaluation:
Use support, confidence, lift, leverage, and conviction. Include
in your report a definition (using a precise formula) and a description
of the meaning of each of these metrics.
Also, for extra credit you are
encouraged (but not required) to implement in Weka other association rule
metrics defined in Section 6.7 of the textbook (e.g., interest factor,
correlation analysis, IS measure, ...), and experiment with them.
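To check your hand calculations of the five required metrics, a short Python sketch follows. The function name rule_metrics and its argument names are assumptions for illustration; the formulas are the commonly used definitions for a rule A -> B, written in terms of relative supports, and you should confirm that they agree with the textbook and with the values Weka reports.

```python
import math


def rule_metrics(supp_a, supp_b, supp_ab):
    """Metrics for a rule A -> B from relative supports in [0, 1].

    supp_a  = fraction of transactions containing A
    supp_b  = fraction of transactions containing B
    supp_ab = fraction of transactions containing both A and B
    """
    confidence = supp_ab / supp_a
    lift = confidence / supp_b               # equals supp_ab / (supp_a * supp_b)
    leverage = supp_ab - supp_a * supp_b     # 0 when A and B are independent
    if confidence < 1.0:
        conviction = (1.0 - supp_b) / (1.0 - confidence)
    else:
        conviction = math.inf                # the rule is never violated
    return {
        "support": supp_ab,
        "confidence": confidence,
        "lift": lift,
        "leverage": leverage,
        "conviction": conviction,
    }


# A and B statistically independent: lift 1, leverage 0, conviction 1.
independent = rule_metrics(supp_a=0.5, supp_b=0.5, supp_ab=0.25)
```

The independent case is a handy sanity check because each metric takes its neutral value there.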
Qualitative evaluation:
Use visualizations of the sets of association rules obtained and
analyze those visualizations.
Read the association rules obtained and pick a handful of interesting
ones to describe in your report.
General Comments:
In contrast with our previous classification and regression projects,
we won't use any evaluation protocol (e.g., 10-fold cross validation)
for the association analysis of this project, as we're not using the
rules for prediction.
Focus instead on experimenting with different ways of preprocessing
the data, varying the parameters of the Apriori algorithm, and
providing your own method to evaluate the resulting collections of
association rules. Remember to experiment with CARs (that is, classification
association rules) and to compare their classification performance to that of
decision trees; and remember to experiment with non-CAR rules also.
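If you mine general rules in Python and want to separate CARs from non-CAR rules afterwards, one simple filter is sketched here. It assumes, hypothetically, that each rule is a pair of item sets and that items are written as attribute=value strings; the function name is_car is also just an illustration, and Weka's own CAR option may behave differently.

```python
def is_car(antecedent, consequent, class_attribute):
    """True if the rule predicts exactly one value of the class attribute
    and the class attribute does not appear in the antecedent."""
    prefix = class_attribute + "="
    return (
        len(consequent) == 1
        and all(item.startswith(prefix) for item in consequent)
        and not any(item.startswith(prefix) for item in antecedent)
    )


# Made-up rules for illustration only.
rules = [
    ({"odor=foul"}, {"class=poisonous"}),
    ({"class=edible"}, {"odor=none"}),
    ({"odor=foul"}, {"class=poisonous", "bruises=no"}),
]
cars = [r for r in rules if is_car(r[0], r[1], "class")]
non_cars = [r for r in rules if not is_car(r[0], r[1], "class")]
```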
Advanced Topic(s):
Investigate in more depth (experimentally, theoretically, or both) a topic of your
choice that is related to association rule mining
and that is not covered already in this project.
This topic might be something that was described or mentioned
in the textbook or in class, or that comes from your own research, or that is related
to your interests, or that appears in a research paper that you find intriguing.