THOROUGHLY READ AND FOLLOW THE PROJECT GUIDELINES.
These guidelines contain detailed information about how to structure your
project and how to prepare your written and oral reports.
*** You must use the Project 3 Template provided for your written report. ***
(If you prefer not to use Word, you can copy and paste this format into a
different editor, as long as you respect the stated page structure and
page limit.)
- Machine Learning Technique(s):
Use the neural network methods implemented in Weka and in R
(or implement your own code). The Weka module implementing neural
nets is under Classifiers, functions, MultilayerPerceptron; in Matlab,
neural nets are implemented in the Neural Network Toolbox.
- Dataset(s):
In this project, we will use two datasets:
- Face Recognition: Use the Face Recognition dataset described in
Section 4.7 of this course's textbook, Tom Mitchell's Machine Learning.
This dataset is also available at the UCI Data Repository.
You can use the same learning task (that is, learn the direction the person
is facing: left, right, straight, or upward) and the same design decisions
and parameters described in Section 4.7. For example, you can use the
one-quarter size images, if you wish. I encourage you to experiment with
other settings and design decisions, and even other learning tasks
(e.g., sunglasses recognizer or face recognizer) if time allows.
- S&P 100 Index: For the second dataset, you will collect daily closing
prices for each of the 100 companies in the S&P 100 stock market index.
These daily closing prices are freely available online
(see, for example, Yahoo Finance).
Collect daily closing prices from January 1, 2013 to September 15, 2014
(except, of course, for the days when the stock market was closed).
To keep it simple, use January 1, 2013 - June 30, 2014 as training data, and
July 1 - September 15, 2014 as test data (that is, do not use n-fold
cross-validation for this dataset). Also, use Biogen Idec (BIIB) as the
prediction target.
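The chronological train/test split above can be sketched as follows. The price values shown are illustrative placeholders, not real BIIB quotes; in practice you would load the full series you collected (e.g., from Yahoo Finance):

```python
from datetime import date

# Placeholder (date, closing price) pairs standing in for the real BIIB
# series collected from a source such as Yahoo Finance.
prices = [
    (date(2013, 1, 2), 146.37),
    (date(2014, 6, 30), 315.45),
    (date(2014, 7, 1), 317.00),
    (date(2014, 9, 15), 330.00),
]

# Last trading day of the training period: everything up to and including
# June 30, 2014 is training data; everything after is test data.
split_day = date(2014, 6, 30)
train = [(d, p) for d, p in prices if d <= split_day]
test = [(d, p) for d, p in prices if d > split_day]
```

Splitting by date (rather than shuffling) keeps the test period strictly after the training period, which is what the assignment asks for in place of cross-validation.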
- Performance Metric(s):
- Use classification accuracy and confusion matrices for classification
tasks, and use root mean squared error (RMSE) and correlation coefficient
(CC) for regression tasks.
[See Slides 56-60 of the Weka Textbook Slides, Chapter 5.]
If you wish, you can use other metrics to evaluate the "goodness" of your
models, in addition to the ones above.
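Weka and R report these metrics for you, but as a reference, here is a minimal sketch of how each one is computed:

```python
import math

def accuracy(actual, predicted):
    # Fraction of instances whose predicted class matches the actual class.
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)

def confusion_matrix(actual, predicted, labels):
    # counts[(a, p)] = number of instances of actual class a predicted as p.
    counts = {(a, p): 0 for a in labels for p in labels}
    for a, p in zip(actual, predicted):
        counts[(a, p)] += 1
    return counts

def rmse(actual, predicted):
    # Root mean squared error for numeric (regression) predictions.
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)

def correlation(actual, predicted):
    # Pearson correlation coefficient between actual and predicted values.
    n = len(actual)
    ma, mp = sum(actual) / n, sum(predicted) / n
    cov = sum((a - ma) * (p - mp) for a, p in zip(actual, predicted))
    sa = math.sqrt(sum((a - ma) ** 2 for a in actual))
    sp = math.sqrt(sum((p - mp) ** 2 for p in predicted))
    return cov / (sa * sp)
```

Computing the metrics yourself is also a good sanity check that you are reading the tool output correctly.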
- If possible, compare the results you obtained against those of
benchmarking techniques or previously studied techniques, such as
ZeroR, OneR, and decision trees, over the same (sub-)set
of data instances you used in each experiment.
- Report the training time needed to construct the model in each of
the experiments.
- Evaluation and Testing:
The training time of neural networks may be very high in some cases.
If n-fold cross-validation with n=10 takes too long, you can lower the
number of folds n to, say, 3, or you can choose another evaluation method
(e.g., percentage split) if necessary.
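A percentage split can be sketched as follows; the 2/3 training fraction and the fixed seed here are illustrative choices, not requirements:

```python
import random

def percentage_split(instances, train_fraction=2 / 3, seed=1):
    # Shuffle a copy (so the caller's list is untouched) and split it
    # into a training portion and a held-out test portion.
    shuffled = list(instances)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]
```

Note that a random split like this is appropriate for the face images but not for the stock data, where the assignment specifies a chronological train/test split instead.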
- Design Decisions:
For experimentation different from that described in Section 4.7 of the
textbook, I offer the following guidelines:
- Topology of your Neural Net:
- I suggest that you use a 2-layer, feedforward architecture. More
specifically, a net consisting of (1 input layer,) 1 hidden layer,
and 1 output layer. Each node in a layer is connected to every node
in the next layer, and no nodes in the same layer are connected.
You will need to determine experimentally how many nodes to use in
the hidden layer. However, you can experiment with other architectures
in addition to the one suggested here.
- In the case of non-numeric target attributes, decide on
a convention that you'll use to match output node values to
target attribute values.
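One common convention is the 1-of-n encoding with 0.1/0.9 target values used in Section 4.7 of the textbook; a minimal sketch for the four face directions:

```python
LABELS = ["left", "right", "straight", "up"]

def encode(label):
    # 1-of-n target vector: 0.9 for the true class, 0.1 elsewhere
    # (0.1/0.9 rather than 0/1, following Mitchell, Section 4.7).
    return [0.9 if l == label else 0.1 for l in LABELS]

def decode(outputs):
    # Predicted class = the output node with the highest activation.
    return LABELS[outputs.index(max(outputs))]
```

Whatever convention you pick, state it in your report so your confusion matrices can be interpreted.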
- Neural Net Parameters:
Besides experimenting with the topology of the neural net, see
how varying the learning rate, momentum, number of iterations
(training time), decay, size of validation set, and other
parameters affects the error backpropagation algorithm and the
quality of its results.
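To make the roles of the first two parameters concrete, a single backpropagation weight update with a momentum term can be sketched as follows; the default values here mirror Weka's MultilayerPerceptron defaults (learning rate 0.3, momentum 0.2):

```python
def update_weight(w, grad, prev_delta, learning_rate=0.3, momentum=0.2):
    # Standard backprop update: step against the error gradient, plus a
    # momentum term that reuses a fraction of the previous weight change
    # to smooth the trajectory and speed convergence.
    delta = -learning_rate * grad + momentum * prev_delta
    return w + delta, delta
```

A larger learning rate takes bigger gradient steps (faster but less stable); momentum carries over part of the last step, which helps the search roll through small local irregularities in the error surface.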