WPI Worcester Polytechnic Institute

Computer Science Department
------------------------------------------

CS 548 KNOWLEDGE DISCOVERY AND DATA MINING - Spring 2014  
Project 2: Decision Trees, Linear Regression, Model Trees, Regression Trees

PROF. CAROLINA RUIZ 

DUE DATE: Tuesday, March 4th, 2014.
------------------------------------------

Project Assignment:

  1. Study Chapter 4 and Appendix D of the textbook in great detail.

  2. Study all the materials posted on the course Lecture Notes. In particular, you should know the algorithms to construct decision trees, regression trees, and model trees very well, and be able to use these algorithms to construct trees from data by hand during the test. See the examples provided in the Lecture Notes. (Note: for model and regression trees, a software tool will be used to obtain the necessary linear regressions.)

  3. THOROUGHLY READ AND FOLLOW THE PROJECT GUIDELINES. These guidelines contain detailed information about how to structure your project, how to prepare your written summary, and how to study for the test.

    You must follow the 5-page written report format described in the PROJECT GUIDELINES. In particular, for this project:
    Page 2 should contain a table summarizing the classification experiments run with decision trees.
    Page 3 should contain a table summarizing the regression experiments run with linear regression, model trees, and regression trees.

    *** You must use the Project 2 Template provided for your written report. *** (If you prefer not to use Word, you can copy and paste this format into a different editor, as long as you respect the stated page structure and page limit.)

  4. Advanced Topic(s): Investigate in more depth (experimentally, theoretically, or both) a topic of your choice that is related to decision trees or model/regression trees and that was not already covered in this project, the class lectures, or the textbook. This tree-related topic might be something that was described or mentioned briefly in the textbook or in class, something that comes from your own research, or something related to your interests. A few sample ideas: the prune function in Matlab; C4.5; C4.5 pruning methods (for trees or for rules); any of the additional tree classifiers in Weka: DecisionStump, LMT, RandomForest, RandomTree, REPTree; meta-learning applied to decision trees (see Classifier -> Choose -> meta); other useful functionality in Matlab or RapidMiner; an idea from a research paper that you find intriguing; or any other tree-related topic.
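
As a study aid for item 2, the split-selection step that is repeated when constructing a decision tree by hand can be sketched as below. This is a minimal sketch that assumes the entropy-based information gain criterion of ID3-style algorithms; the Lecture Notes and the textbook remain the authoritative description and may use a different variant (for example, gain ratio). The function names, attribute names, and the four-row dataset are hypothetical and made up for illustration only.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy of the target minus the weighted entropy after splitting on `attribute`."""
    parent = entropy([row[target] for row in rows])
    # Group the target labels by the value each row takes on `attribute`.
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return parent - remainder


# Hypothetical toy data (assumption: not taken from the course materials).
rows = [
    {"outlook": "sunny", "windy": "no", "play": "no"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rainy", "windy": "no", "play": "yes"},
    {"outlook": "rainy", "windy": "yes", "play": "yes"},
]

# Compute the gain of each candidate attribute and pick the largest.
gains = {a: information_gain(rows, a, "play") for a in ("outlook", "windy")}
best = max(gains, key=gains.get)
print(gains, best)
```

Under this criterion, the attribute with the largest gain becomes the test at the current node, and the same computation is repeated on each resulting subset of the data until a stopping condition is met.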