WPI Worcester Polytechnic Institute

Computer Science Department
------------------------------------------

CS539 Machine Learning 
Project - Spring 2017

PROF. CAROLINA RUIZ 

Due Dates:
Phase I: Thursday, April 6, 2017 at 1 pm
Phase II: Thursday, April 27, 2017 at 1 pm

------------------------------------------

Project Instructions:


Project Description:

  1. Machine Learning Method that will be the Focus of Your Project: Together with your group partner, select one among the following machine learning methods studied in this course:
    Your project for this class MUST be different from projects that you need to work on or have worked on for other classes (e.g., AI, Data Mining, Deep Learning, Big Data Analytics, ISP, ISG, ...).
    Your project MUST use only techniques covered in this class this semester.

  2. Advanced Topic Related to Your Selected Machine Learning Focus: Select a topic, related to the machine learning method you are focusing on, that was not covered in the homework or in the lectures. For example, if your machine learning method is Artificial Neural Networks, a possible advanced topic is Simulated Annealing.

    Investigate this topic using sources from the scientific literature and other textbooks.

  3. Dataset: Choose a dataset appropriate for the machine learning method that you selected to be your focus for this project. This dataset should be related to your own interests. *** Your chosen dataset should contain enough instances and attributes to provide sufficient data for fruitful and interesting experiments. ***
    Here are some possibilities:

  4. Experiments: Design and perform thorough experimentation with machine learning techniques to construct and evaluate prediction models over your dataset as described below.
    1. Study Chapter 19 of the textbook. Follow the guidelines for machine learning experiments described in Section 19.4. In particular, use k-fold cross-validation with a value of k that is appropriate for your dataset (e.g., k=10 or k=4).
    2. Experiment with and without pre-processing. For this, use pre-processing techniques (e.g., imputation of missing values) and dimensionality reduction techniques (e.g., feature subset selection, principal components analysis, factor analysis, singular value decomposition, ...) studied in the course.
    3. Construct prediction models using the machine learning method that you chose to focus on. Experiment extensively with varying the method's parameter values, data pre-processing and model post-processing (if applicable).
    4. Evaluate these prediction models using the evaluation metrics described in Section 19.7 of the textbook.
    5. Evaluate these prediction models in terms of the patterns they express about the dataset (i.e., "read" the models to see what patterns they consist of).
    6. Perform experimentation with the advanced topic that you investigated related to your focus machine learning method. Evaluate these models and compare them with those constructed above.
    7. Run experiments with other machine learning methods studied in class (and listed above as choices for this project) on your dataset. Evaluate these models.
    8. Perform a detailed comparison of all of the models and results obtained above, both in terms of the quantitative evaluation metrics and also in terms of the data patterns that they describe.
    9. Study Chapter 17 of the textbook. Use at least three different meta-learning techniques, described in Chapter 17 and implemented in Matlab, to combine multiple learners: Boosting, Bagging, and one other technique of your choice. Stacking is optional as it is not currently implemented in Matlab, but groups who successfully implement it in Matlab and experiment with it will receive extra credit. See also Chapter 17's slides and take a look at my notes and resources on combining multiple models.
    10. Compare the results of meta-learning with all the other results above.
    11. Use meaningful visualizations throughout the project to analyze the data, your models, and your results.
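
As an illustration of the k-fold cross-validation and evaluation steps above (items 1 and 4), here is a minimal sketch. It is written in plain Python rather than Matlab purely for illustration, and everything in it is an assumption made for the example: the toy two-class dataset, the nearest-centroid stand-in learner, and all function names. It does not reproduce any procedure from the textbook; it only shows the general shape of training on k-1 folds, testing on the held-out fold, and pooling a confusion matrix.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def train_centroids(X, y):
    """Hypothetical stand-in learner: store the per-class mean of each feature."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, v in enumerate(row):
            acc[j] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}

def predict(centroids, row):
    """Predict the class whose centroid is nearest in squared Euclidean distance."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(centroids[c], row))
    return min(sorted(centroids), key=dist)

def cross_validate(X, y, k=10, seed=0):
    """Return per-fold accuracies and a pooled confusion matrix {(true, predicted): count}."""
    accuracies, confusion = [], {}
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        model = train_centroids([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = 0
        for i in test_idx:
            pred = predict(model, X[i])
            confusion[(y[i], pred)] = confusion.get((y[i], pred), 0) + 1
            correct += pred == y[i]
        accuracies.append(correct / len(test_idx))
    return accuracies, confusion

def precision_recall(confusion, positive):
    """Precision and recall for one class, computed from the pooled confusion matrix."""
    tp = confusion.get((positive, positive), 0)
    fp = sum(n for (t, p), n in confusion.items() if p == positive and t != positive)
    fn = sum(n for (t, p), n in confusion.items() if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Toy data (an assumption for this sketch): two Gaussian blobs in two dimensions.
rng = random.Random(1)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(50)] + \
    [[rng.gauss(3, 1), rng.gauss(3, 1)] for _ in range(50)]
y = ["neg"] * 50 + ["pos"] * 50

accs, cm = cross_validate(X, y, k=10)
mean_acc = sum(accs) / len(accs)
prec, rec = precision_recall(cm, "pos")
print("mean accuracy:", mean_acc, "precision:", prec, "recall:", rec)
```

In a real experiment the stand-in learner would be replaced by the method chosen as the project focus, and k would be chosen to suit the size of the dataset.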
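
To make the idea of combining multiple learners (item 9) concrete, the following is a rough sketch of bagging: each base learner is trained on a bootstrap sample of the training set, and predictions are combined by majority vote. It is plain Python for illustration only and is not the Matlab implementation the assignment asks you to use. The decision-stump base learner, the toy data, and all names are assumptions made for the example.

```python
import random
from collections import Counter

def fit_stump(X, y):
    """Fit a one-feature threshold rule by exhaustive search for the lowest training error."""
    best = None
    labels = sorted(set(y))
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for lo in labels:
                for hi in labels:
                    errors = sum((lo if row[j] <= t else hi) != label
                                 for row, label in zip(X, y))
                    if best is None or errors < best[0]:
                        best = (errors, j, t, lo, hi)
    _, j, t, lo, hi = best
    return j, t, lo, hi

def stump_predict(stump, row):
    """Apply the threshold rule: label `lo` at or below the threshold, `hi` above it."""
    j, t, lo, hi = stump
    return lo if row[j] <= t else hi

def fit_bagging(X, y, n_models=15, seed=0):
    """Train n_models stumps, each on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_models):
        sample = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([X[i] for i in sample], [y[i] for i in sample]))
    return models

def bagging_predict(models, row):
    """Combine the base learners by majority vote."""
    votes = Counter(stump_predict(m, row) for m in models)
    return votes.most_common(1)[0][0]

# Toy data (an assumption for this sketch): two well-separated Gaussian blobs.
rng = random.Random(2)

def blob(center, n):
    return [[rng.gauss(center, 1), rng.gauss(center, 1)] for _ in range(n)]

X_train, y_train = blob(0, 30) + blob(4, 30), ["a"] * 30 + ["b"] * 30
X_test, y_test = blob(0, 20) + blob(4, 20), ["a"] * 20 + ["b"] * 20

models = fit_bagging(X_train, y_train)
predictions = [bagging_predict(models, row) for row in X_test]
accuracy = sum(p == t for p, t in zip(predictions, y_test)) / len(y_test)
print("bagged test accuracy:", accuracy)
```

An odd number of base learners is used here so that a two-class majority vote cannot tie. When comparing against a single learner, the same cross-validation folds should be used for both so that the comparison is fair.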