Project Assignment:
THOROUGHLY READ AND FOLLOW THE
PROJECT GUIDELINES.
These guidelines contain detailed information about how to structure your
project, and how to prepare your written and oral reports.
*** You must use the
Project 7 Template provided for your written report ***
(if you prefer not to use Word, you can copy and paste this format in a
different editor as long as you respect the stated page structure and
page limit.)
The font size should be no smaller than 11pts.
Do not exceed the page limit.
- Machine Learning Technique(s):
Genetic Algorithms. For the genetic algorithm code,
you can use one or more of the following alternatives:
- R packages. (This is perhaps the best option. Remember to add them to our Wikia page.)
- Use GA code in WEKA.
- You can use the GA code in
weka.classifiers.bayes.net.search.local.GeneticSearch
(accessible from the Experimenter), which evolves a population of
Bayesian Nets. If you wish, you can modify this code to evolve a
population of a different type of models of your choice.
- Previous versions of Weka provided a GA option for attribute
selection. This option is not part of the GUI in the current version,
but the code may still be available.
See GeneticSearch.java under the attributeSelection package.
If so, you may need to extend the code as necessary to
include crossover, mutation, and selection operators as they are described
in Chapter 9 of the textbook.
- Use the publicly available code at
"http://lancet.mit.edu/ga/".
- Write your own genetic algorithm code implementing selection, crossover, and mutation as they are described in Chapter 9 of the textbook.
- Dataset(s):
In this project, we will use one dataset:
- Objective of this Project:
Choose one among the following possibilities:
- Use GAs for Feature Selection:
Pick a classification method (e.g., decision trees, ANNs, ...).
The goal is to find a "highly fit" subset of attributes from the dataset that
produces a good classifier (i.e., classification model).
Here, each individual in the population should represent a subset of
data attributes.
- Use GAs for Feature Selection and Extraction:
Pick a classification method (e.g., decision trees, ANNs, ...).
The goal is to find a "highly fit" set of attributes that
produces a good classifier.
The attributes in the set are either original attributes or combinations of
original attributes (e.g., linear combinations of them, as in PCA, or more
complex combinations such as "Body Mass Index (BMI)" computed from "height" and "weight", or any other interesting type of combination you can think of).
Here, each individual in the population should represent a set of
attributes.
- Use GAs for Model Construction:
Pick a classification method (e.g., decision trees, ANNs, ...).
The goal is to find a "highly fit" classifier.
Here, each individual in the population should represent a classifier.
Section 9.3 provides a description of an approach
to using genetic algorithms to construct a rule-based classifier.
You can follow this approach or design your own.
- Use GAs for your own designed objective:
Your objective may be a variation of any of the objectives above, or
a completely different objective related to machine learning.
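For the feature selection objective, one common way to let an individual represent a subset of attributes is a bit string with one bit per attribute. The sketch below is only an illustration under that assumption: the names `decode` and `subset_fitness`, the `evaluate` callback (standing in for training and scoring whatever classifier you pick), and the `size_penalty` parameter are all hypothetical and are not prescribed by this assignment or by the textbook.

```python
from typing import Callable, List, Sequence


def decode(chromosome: Sequence[int], attributes: Sequence[str]) -> List[str]:
    """Map a bit string to the attribute names whose bit is 1."""
    if len(chromosome) != len(attributes):
        raise ValueError("chromosome needs exactly one bit per attribute")
    return [name for bit, name in zip(chromosome, attributes) if bit]


def subset_fitness(
    chromosome: Sequence[int],
    attributes: Sequence[str],
    evaluate: Callable[[List[str]], float],
    size_penalty: float = 0.0,
) -> float:
    """Score a subset: classifier quality minus an optional size penalty.

    `evaluate` is a placeholder (an assumption of this sketch) for code that
    trains the chosen classifier on the selected attributes and returns a
    quality score such as validation accuracy.
    """
    selected = decode(chromosome, attributes)
    if not selected:
        return 0.0  # an empty subset cannot produce a classifier
    return evaluate(selected) - size_penalty * len(selected)


if __name__ == "__main__":
    names = ["height", "weight", "age"]
    # Toy evaluator for demonstration only: pretends "height" and "weight" help.
    toy = lambda subset: 0.5 + 0.2 * ("height" in subset) + 0.2 * ("weight" in subset)
    print(decode([1, 0, 1], names))
    print(subset_fitness([1, 1, 0], names, toy, size_penalty=0.01))
```

Other encodings are equally valid; for feature extraction or model construction the chromosome would need to carry more than one bit per attribute.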
- Design Choices:
You need to decide how to implement/represent the following
notions/parameters in your system.
Make sure you explain your design
choices clearly and concisely in both your written and oral reports
(if possible, use graphical depictions or examples to illustrate your
choices):
- Individuals: elaborate on what an individual represents
and how its chromosome is encoded.
- Fitness function used.
- Selection method used.
- Crossover method used.
- Mutation method used.
- Evolution:
- Size and initialization of the population.
- Termination condition(s) (e.g., fitness threshold, number of generations evolved, ...)
- Construction of a next generation:
fractions of the population constructed using
selection and crossover; mutation rate; and how a generation is
constructed based on the previous generation.
- Final answer (i.e., individual) output by your GA.
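To make the design choices listed above concrete, here is a minimal sketch of one possible GA loop over bit-string individuals. Every choice in it is an assumption made for illustration, not a requirement of this project: tournament selection, one-point crossover, bit-flip mutation, keeping the single best individual unchanged (elitism), and the parameter names `crossover_fraction`, `mutation_rate`, and `target`. The operators described in Chapter 9 of the textbook may differ, so adapt the code to whatever you decide to report.

```python
import random
from typing import Callable, List, Optional, Tuple

Chromosome = List[int]


def tournament(pop: List[Chromosome], fits: List[float],
               rng: random.Random, k: int = 3) -> Chromosome:
    """Selection: return the fittest of k randomly drawn individuals."""
    contenders = rng.sample(range(len(pop)), k)
    return pop[max(contenders, key=lambda i: fits[i])]


def one_point_crossover(a: Chromosome, b: Chromosome,
                        rng: random.Random) -> Tuple[Chromosome, Chromosome]:
    """Crossover: swap the tails of two parents at a random cut point."""
    point = rng.randint(1, len(a) - 1)
    return a[:point] + b[point:], b[:point] + a[point:]


def mutate(c: Chromosome, rate: float, rng: random.Random) -> Chromosome:
    """Mutation: flip each bit independently with probability `rate`."""
    return [1 - bit if rng.random() < rate else bit for bit in c]


def evolve(fitness: Callable[[Chromosome], float], n_bits: int,
           pop_size: int = 30, generations: int = 50,
           crossover_fraction: float = 0.6, mutation_rate: float = 0.02,
           target: Optional[float] = None,
           seed: int = 0) -> Tuple[Chromosome, float]:
    """Run the GA and return the best individual and its fitness."""
    rng = random.Random(seed)
    # Initialization: uniformly random bit strings.
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        fits = [fitness(c) for c in pop]
        # Termination: stop early once the fitness threshold is reached.
        if target is not None and max(fits) >= target:
            break
        elite = pop[max(range(pop_size), key=lambda i: fits[i])][:]
        next_pop = [elite]
        n_cross = int(crossover_fraction * pop_size)
        # Part of the next generation is copied over by selection ...
        while len(next_pop) < pop_size - n_cross:
            next_pop.append(tournament(pop, fits, rng)[:])
        # ... and the rest is produced by crossover of selected parents.
        while len(next_pop) < pop_size:
            next_pop.extend(one_point_crossover(
                tournament(pop, fits, rng), tournament(pop, fits, rng), rng))
        next_pop = next_pop[:pop_size]
        # Mutate everyone except the elite individual.
        pop = [next_pop[0]] + [mutate(c, mutation_rate, rng) for c in next_pop[1:]]
    fits = [fitness(c) for c in pop]
    best = max(range(pop_size), key=lambda i: fits[i])
    return pop[best], fits[best]


if __name__ == "__main__":
    # Toy fitness (count of 1 bits), used only to exercise the loop.
    individual, score = evolve(sum, n_bits=20, target=20)
    print(individual, score)
```

In a real experiment the toy fitness would be replaced by your own fitness function, and the final answer reported would be the individual returned by `evolve`. The sketch assumes `n_bits` is at least 2 and `pop_size` is at least the tournament size.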
- Other:
- Don't forget to look at the resulting individuals,
and elaborate in your report on what these individuals are and what they "say".
- Report the training time taken by each of the experiments.
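One simple way to collect the training time of each experiment is to wrap the run in a wall-clock timer. The helper below is a minimal sketch; the name `timed` and the idea of passing the experiment as a zero-argument callable are assumptions made for illustration, and any equivalent timing facility in R or Weka works just as well.

```python
import time
from typing import Any, Callable, Tuple


def timed(run: Callable[[], Any]) -> Tuple[Any, float]:
    """Call `run` once and return its result with the elapsed seconds."""
    start = time.perf_counter()
    result = run()
    return result, time.perf_counter() - start


if __name__ == "__main__":
    # Stand-in workload; replace with a call that runs one GA experiment.
    result, seconds = timed(lambda: sum(range(1_000_000)))
    print(f"result={result}, training time={seconds:.3f} s")
```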