CS 539 Spring 2007

Computer Science Department

CS539 Machine Learning - Spring 2007
Project 3 - Neural Networks

PROF. CAROLINA RUIZ

Due Date:

Slides for Part 1 due on Thursday, Feb. 8 2007 at 3 pm.
Written Report for Parts 1 and 2 due on Thursday, Feb. 15 2007 at 4 pm.
Slides for Part 2 due on Thursday, Feb. 15 2007 at 3 pm.

Project Description
Project Assignment
Report Submission and Due Date

PROJECT DESCRIPTION

Part 1. Construct the most accurate neural networks you can for predicting the class attribute of each of the following datasets:

- CPU dataset (available with the Weka system)
- The World Happiness dataset.
Part 2. Construct the most accurate neural networks you can for predicting the class attribute of each of the following datasets:
- Covertype data available at the UCI Machine Learning Repository.
- A dataset of your choice. This dataset can consist of data that you use for your own research or work, a dataset taken from a public data repository (e.g., UCI Machine Learning Repository, or from the UCI KDD Archive), or data that you collect from public data sources. THIS DATASET SHOULD BE LARGE IN TERMS OF THE NUMBER OF INSTANCES AND ATTRIBUTES SO IT CANNOT BE ONE OF THOSE INCLUDED IN THE WEKA SYSTEM.

PROJECT ASSIGNMENT

Read Chapter 4 of the textbook about neural networks in great detail.
Solve Exercises 4.3, 4.6, and 4.7 of your textbook (pages 124-125). Include your solution in your written report (and not in your oral report).
Read the neural networks code in the Weka system in great detail.
The following are guidelines for the construction of your neural networks:
- Code: Use the neural networks methods implemented in the Weka system, or implement your own code. You can find the Weka module implementing neural nets under Classifiers, functions, MultilayerPerceptron.
- Objectives of the Learning Experiments: Before you start running experiments, look at the raw data in detail. Figure out 3 to 5 specific, interesting questions about the domain that you want to answer with your NEURAL NETWORK experiments. These questions may be phrased as conjectures that you want to confirm/refute with your experimental results.
- Topology of your Neural Net: I suggest that you use a 2-layer, feedforward architecture. More specifically, a net consisting of (1 input layer,) 1 hidden layer, and 1 output layer. Each node in a layer is connected to every node in the next layer, and no nodes on the same layer are connected. However, you can experiment with other architectures in addition to the one suggested here.
  In the case of non-numeric target attributes, decide on a convention that you'll use to match output nodes values and target attribute values.
- Neural Net Parameters: Besides experimenting with the topology of the neural net, see how varying the learning rate, momentum, number of iterations (training time), decay, size of validation set, and other parameters affect the error backpropagation algorithm and the quality of its results.
- Training and Testing Instances: You may restrict your experiments to a subset of the instances IF Weka cannot handle your whole dataset. But remember that the more accurate your neural network is, the better.
- Preprocessing of the Data: A main part of this project is the preprocessing of your dataset. The neural networks implementation in the Weka system provides some data preprocessing capabilities (nominalToBinaryFilter, normalizeAttributes, and normalizeNumericClass). Experiment with that functionality and compare the performance of the error backpropagation algorithm when those built-in capabilities are used vs. the performance when you preprocess the dataset yourself prior to using neural networks. Compare also its performance with and without the removal of missing values.
  Your report should contain a detailed description of the preprocessing of your dataset and a justification of the steps you followed. If Weka does not provide the functionality you need to preprocess your data so that useful patterns can be obtained, preprocess the data yourself by writing the necessary filters (you can incorporate them in Weka if you wish).
- Evaluation and Testing: Experiment with different numbers of folds for n-fold cross-validation. It is fine to keep the number of folds low, given that the training time may be quite high.
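
To make the guidelines on output conventions, preprocessing, and cross-validation concrete, here is a minimal Python sketch. It is not Weka's implementation: the function names (one_hot, decode, min_max_normalize, k_fold_indices) are hypothetical, the [0, 1] scaling range is an assumption, and Weka's own filters may behave differently. It only illustrates one possible convention for matching output nodes to nominal target values, one way to rescale numeric attributes, and how instances can be dealt into n folds.

```python
import random


def one_hot(values):
    """Map each nominal value to a 0/1 vector with one position per distinct value."""
    labels = sorted(set(values))
    index = {label: i for i, label in enumerate(labels)}
    vectors = [
        [1.0 if index[v] == i else 0.0 for i in range(len(labels))]
        for v in values
    ]
    return labels, vectors


def decode(outputs, labels):
    """Assumed convention: predict the label of the most active output node."""
    best = max(range(len(outputs)), key=lambda i: outputs[i])
    return labels[best]


def min_max_normalize(column):
    """Rescale a numeric column to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


def k_fold_indices(n_instances, k, seed=0):
    """Shuffle the instance indices and deal them into k folds of near-equal size."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


# Hypothetical toy data, only to show the calls.
labels, targets = one_hot(["yes", "no", "yes", "maybe"])
predicted = decode([0.1, 0.7, 0.2], labels)
scaled = min_max_normalize([2, 4, 6])
folds = k_fold_indices(10, 3)
```

Each fold would serve once as the test set while the remaining folds are used for training. With a nominal target, the network gets one output node per class value, and decode turns the output activations back into a class label.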

REPORT AND DUE DATE

Written Report.
Your report should contain the following sections with the corresponding discussions:
1. Code Description: Describe the neural networks code that you used from Weka. Explain the algorithm underlying the code in terms of the input it receives and the output it produces, and the main steps it follows to produce this output.
2. Data: Describe the dataset that you selected in terms of the attributes present in the data, the number of instances, missing values, and other relevant characteristics.
  Describe your 3-5 guiding questions/conjectures.
  Provide a detailed description of the preprocessing of your data. Justify the preprocessing you apply and explain why the resulting data is appropriate for constructing neural networks from it.
3. Experiments: For each experiment you ran describe:
  - Which of your 3-5 specific questions/conjectures about the dataset domain you aim to answer/validate with your experiments.
  - Data: What data did you use to construct and test your neural networks?
  - Any additional pre- or post-processing done to the data or the NN output in order to improve the accuracy of your net.
  - Accuracy of the resulting neural networks.
  - Discuss how this accuracy compares with that of your most accurate ZeroR experiment and decision trees from the previous assignments.
4. Summary of Results
  - For each dataset, what was the accuracy of the most accurate neural network constructed in your project?
  - Discuss the strengths and the weaknesses of your project.
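
The comparison against ZeroR requested under Experiments can be sketched as follows. This is an illustrative Python sketch for a nominal class attribute only, not Weka's ZeroR code; the function names and the toy labels are hypothetical.

```python
from collections import Counter


def zero_r_accuracy(train_labels, test_labels):
    """Predict the majority training class for every test instance."""
    majority = Counter(train_labels).most_common(1)[0][0]
    correct = sum(1 for y in test_labels if y == majority)
    return majority, correct / len(test_labels)


def accuracy(predicted, actual):
    """Fraction of instances whose predicted label equals the actual label."""
    matches = sum(1 for p, a in zip(predicted, actual) if p == a)
    return matches / len(actual)


# Hypothetical labels, only to show the comparison.
train = ["a", "a", "b"]
test = ["a", "b", "b", "a"]
majority, baseline = zero_r_accuracy(train, test)
net_accuracy = accuracy(["a", "b", "a", "a"], test)
improvement = net_accuracy - baseline
```

A neural network that does not beat this baseline on the same test data has learned little beyond the class distribution.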
Oral Report. We will discuss the results from the individual projects during class. Your oral report should summarize the different sections of your written report as described above. Each of you will have 5 minutes to explain your results and to discuss your project in class. Be prepared!
Submission and Due Date.
Please submit the following:
1. [your-lastname]_proj3_slides_part1.[ext] containing the slides for your oral report of Part 1. This file should be either a PDF file (ext=pdf) or a PowerPoint file (ext=ppt). Please use only lower case letters in the file name. For instance, my file would be named ruiz_proj3_slides_part1.ppt
  Deadline for submission: 3 pm on Thursday, February 8 2007.
2. Bring a hard copy of your written report for Parts 1 and 2 to the beginning of class (by 4:00 pm) on Thursday, February 15 2007.
3. [your-lastname]_proj3_slides_part2.[ext] containing your slides for your oral report of Part 2. This file should be either a PDF file (ext=pdf) or a PowerPoint file (ext=ppt).
  Deadline for submission: 3:00 pm on Thursday, February 15 2007.
