CS539 Machine Learning - Spring 2005
Project 4 - Evaluating Hypotheses
Due Date: Thursday, Feb. 24, 2005 at 4 pm.
- Study Chapter 5 in detail.
- Solve every exercise at the end of Chapter 5:
5.1, 5.2, 5.3, 5.4, 5.5, and 5.6.
- Use stratified sampling to select two different subsets of 1000 data
instances each from the Cover Type dataset.
Piotr's script can be used
for this purpose - thanks Piotr!
Let's denote these subsets S1 and S2.
- Learn a J4.8 decision tree t over S1 using a 75% split.
That is, use 75% of the data in S1 to build the tree and the remaining
25% (call it S1') to calculate the sample error errorS1'(t).
Use errorS1'(t) to estimate, with 95% confidence, an interval for
errorD(t), i.e. the true error of t over the entire
distribution D of cover type instances.
- Train a neural network nn with 1 hidden layer and otherwise
default parameters over the dataset S2 using a 75% split.
That is, use 75% of the data in S2 to train the network and the remaining
25% (call it S2') to calculate the sample error errorS2'(nn).
Compare the decision tree t from above and the neural network nn
by estimating, with 95% confidence, the difference d between the
true errors of these two hypotheses
using errorS1'(t) and errorS2'(nn).
- Compare J4.8 decision trees and neural networks as learning
algorithms over the Cover Type dataset by estimating the difference
in error between them with an approximate 95% confidence interval.
Do this by using a paired t test with k=11, with the data subset S1 from above
as D0.
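
The three interval estimates asked for above can be sketched in Python as
follows. This is only an illustrative sketch, not a required implementation:
the function names are made up for this example, the first two functions
assume the usual normal approximation to the binomial (reasonable for a test
set of about 250 instances), and the error values in the usage section are
hypothetical placeholders, not results. The t value 2.228 is the two-sided
95% value for k-1 = 10 degrees of freedom.

```python
import math
from statistics import mean

Z_95 = 1.96    # two-sided 95% value of the standard normal distribution
T_95_DF10 = 2.228  # two-sided 95% value of Student's t with 10 degrees of freedom


def error_interval(sample_error, n, z=Z_95):
    """Approximate confidence interval for the true error of one hypothesis,
    given its sample error measured on n independent test instances."""
    half = z * math.sqrt(sample_error * (1.0 - sample_error) / n)
    return sample_error - half, sample_error + half


def difference_interval(err1, n1, err2, n2, z=Z_95):
    """Approximate confidence interval for the difference d between the true
    errors of two hypotheses tested on independent samples of size n1 and n2."""
    d_hat = err1 - err2
    half = z * math.sqrt(err1 * (1.0 - err1) / n1 + err2 * (1.0 - err2) / n2)
    return d_hat - half, d_hat + half


def paired_t_interval(deltas, t_value):
    """Approximate confidence interval for the mean difference in error between
    two learning algorithms, given the k per-fold differences in `deltas`."""
    k = len(deltas)
    if k < 2:
        raise ValueError("need at least two paired differences")
    d_bar = mean(deltas)
    # Estimated standard deviation of the mean of the k differences.
    s = math.sqrt(sum((d - d_bar) ** 2 for d in deltas) / (k * (k - 1)))
    return d_bar - t_value * s, d_bar + t_value * s


if __name__ == "__main__":
    # Hypothetical sample errors on 250 held-out instances (25% of 1000).
    print(error_interval(0.30, 250))
    print(difference_interval(0.30, 250, 0.25, 250))
    # Hypothetical per-fold differences for k = 11 folds.
    folds = [0.02, 0.05, -0.01, 0.03, 0.04, 0.00, 0.02, 0.06, 0.01, 0.03, 0.02]
    print(paired_t_interval(folds, T_95_DF10))
```

If the interval for the difference contains 0, the data do not support, at
the 95% level, the claim that one hypothesis (or algorithm) has lower true
error than the other.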
Please turn in written solutions to these problems at the beginning
of class on Thursday, February 24th and be ready to discuss your solutions
and the chapter in class.