Specific Guidelines for the Experiments and Written Report
Your written report should follow the structure described below:
- Page 1 should list each team member's name
and state authorship, that is, a description of who did what on this project.
- [10 points] Algorithms and Code: (maximum 1 page)
Page 2 should provide a description (in very high level pseudo-code)
of the Weka code that implements the data mining technique(s)
covered by the project.
In your written report, describe the algorithm underlying the code
IN YOUR OWN WORDS, using high level pseudo-code.
Briefly explain the algorithm in terms of the inputs it receives
and the outputs it produces, AND the main steps it follows to construct
the model (in pseudo-code).
Make sure that the algorithms you describe are the ones
implemented in the Weka code, which are not necessarily the same as the ones
described in the textbook or in class.
- Experiments and Results: (maximum 3 pages per challenge)
The next pages should contain descriptions of your experiments
and answers to the provided challenges.
For each challenge:
- [15 points per challenge] Challenge's page 1:
Use tables to summarize the data, pre-processing, settings, postprocessing,
and results of the experiments you ran.
You should run a sufficient number of coherent experiments
(that is, the results of an experiment should motivate the next experiment(s)).
Each row in the table corresponds to an experiment, and the columns are
as follows:
- Data:
What data did you use to construct and test your model?
- Pre-Processing:
What pre-processing was done to the data
in order to improve the model's performance?
- Post-Processing:
What post-processing was done to the
model in order to improve the model's performance?
- Experimental Protocol:
Describe the protocol only if it differs from 10-fold cross-validation.
- Resulting model:
Describe the size and readability of the resulting model.
Include any relevant observations about the model.
- Performance of the resulting model:
- State what the performance of the model is.
- If applicable,
elaborate on the confusion matrix and/or other relevant
performance indicators.
- How long did it take Weka to construct this model?
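As an illustration of the "Performance of the resulting model" column, the sketch below derives accuracy and per-class precision and recall from a confusion matrix. It is a minimal example under stated assumptions, not part of the assignment: the function name summarize_confusion_matrix, the class labels, and the counts are all hypothetical, and the matrix is assumed to follow the layout of Weka's output, with rows for the actual class and columns for the predicted class.

```python
def summarize_confusion_matrix(matrix, labels):
    """Return accuracy plus per-class precision and recall.

    Assumption: matrix[i][j] counts the instances whose actual class is
    labels[i] and whose predicted class is labels[j] (rows = actual).
    """
    n = len(labels)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n))  # diagonal = correct predictions
    summary = {"accuracy": correct / total if total else 0.0, "per_class": {}}
    for i, label in enumerate(labels):
        actual = sum(matrix[i])                          # row sum
        predicted = sum(matrix[r][i] for r in range(n))  # column sum
        summary["per_class"][label] = {
            "precision": matrix[i][i] / predicted if predicted else 0.0,
            "recall": matrix[i][i] / actual if actual else 0.0,
        }
    return summary


if __name__ == "__main__":
    # Made-up counts for illustration only, not results from any dataset.
    example = [[50, 10],
               [5, 35]]
    result = summarize_confusion_matrix(example, ["yes", "no"])
    print(f"accuracy = {result['accuracy']:.3f}")
    for label, scores in result["per_class"].items():
        print(f"{label}: precision = {scores['precision']:.3f}, "
              f"recall = {scores['recall']:.3f}")
```

Reporting per-class figures alongside overall accuracy can show whether a model performs well on one class at the expense of another, which is the kind of observation this column asks you to elaborate on.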
- [10 points per challenge] Challenge's pages 2 and 3:
Best Performing Model.
- The first of these two pages should provide a detailed description of the
experiment that yielded the best performing model you were able to construct
for this challenge.
Include detailed descriptions of:
- pre-processing done,
- advanced techniques used,
- clever ideas used during the construction of the model,
- answers to the specific questions asked in the challenge description.
Provide both motivation and justification for running this experiment in the
way that you did (an important aspect of this part of the report is story-telling).
Provide any useful graphs and visualizations.
- The second of these two pages should contain the best performing model (e.g., decision tree).
- [10 points] Conclusions and Lessons Learned.
The last page of your report should include a discussion of:
- How the performance of the experiments in this project
compares with that of your best performing results from
previous projects on the same dataset.
- How well the particular data mining methods used in this
project worked on this dataset.
- What combination of pre-processing, parameter values, and/or
post-processing yielded particularly good results.
- What lessons you learned in this project and how you would use
these lessons on the next project.
- Strengths and weaknesses of your project.