WPI Worcester Polytechnic Institute

Computer Science Department
------------------------------------------

Genetic Algorithms

Background/Description

Genetic Algorithms are a search technique that utilizes the concept of natural selection. The concept being learned is represented as a "gene" - usually a bit string. Populations of these genes are created randomly and then tested using a fitness function to indicate which ones are the "best" (this depends on the application). For example, if a GA is being used for classification, the best genes are the ones that correctly classify the most training instances. After the fitness for each gene has been computed, a selection algorithm is used to determine which genes survive to the next population. Some selection algorithms are:

Some percentage of the genes in the new population are created by crossover. Crossover is the GA equivalent to mating. In crossover, two genes are spliced and recombined in such a way that the correctness of the resulting gene is preserved. Some crossover algorithms use one crossover point (like the middle), others use several. Crossover points can be fixed or random.

GAs can be used for many different types of applications. Some examples are:

The advantage of GAs are that they can be used to search a wide search space and come up with answers that might not be obtained by other means. A disadvantage is that they may need many generations to find their solution, if one is found at all. Another potential problem is that if there is not sufficient randomness in the selection and crossover methods the GA may not converge on the best possible solution.

References

Books and Papers

Mitchell, T., "Machine Learning," McGraw Hill, 1997, pp. 249-259.

This textbook, used in class, described how GAs worked and provided the example used for crossover by this project.

Schlabach, J., Hayes, C., Goldberg, D., SHAKA-GA: A Genetic Algorithm for Generating and Analyzing Battlefield Courses of Action, 1997.

This paper describes an earlier version of the FOX Course of Action Generator described in the project presentation.

Bently, P., Wakefield, J., The Table: An Illustration of Evolutionary Design using Genetic Algorithms, Genetic Algorithms in Enginerring Systems: Innovations and Applications, 12-14 September 1995, Conference Publication No. 414, IEE, 1995.

This paper describes using GAs to design tables.

Web References:

These were used for background information only.

Problem

This project used GAs to classify the census income data used in earlier projects in this class.

Gene Representation

Each gene is a variable length number of classification rules stored as bit strings. Not all of the census data attributes were used. Attributes used were selected based on the Decision Tree project done earlier in the semester.

Classification attributes where any value was acceptable were represented as all ones so that a bitwise AND could be used to compare the rules with the training and test instances.

Gene Selection and Crossover

Selection was done using a tournament selection where two genes were selected at random and the one with the highest fitness was selected. Genes for crossover were selected randomly from the original population and crossed over using the algorithm used by GABIL. In this algorithm, cutpoints are chosen randomly but checked to ensure that the rule boundaries remain consistent. Mutation was not implemented for this project.

Fitness

Fitness was computed by comparing each training instance to each rule within a gene. If any of the rules matched the instance, the rule was considered a correct classification. The fitness for each gene was incremented by one each time it correctly classified a training instance.

Results

The results were surprisingly good. They are shown in the sample runs referenced below and in the report and presentation. The results are even more surprising when the rules are examined. They are not what would be expected to classify the data. This could be due to a problem with the program (which has not been found) or the data set is not evenly distributed for the attributes chosen.

Solution

There are two sample runs available for the program: a test using 5000 test and training instances and a test using 100 test and training instances. These sample runs give the accuracy for the program and the resulting rules.

Results are also given in the Presentation and project report.

[Return to the WPI Homepage] [Return to the CS Homepage] [Return to Janet's Home Page]