Submit Slides and Written Report via Canvas by 2:00 pm.
Project Description
Project Instructions:
*** Work on this project individually.
That is, no group work is allowed for this project. ***
THOROUGHLY READ AND FOLLOW THE
PROJECT GUIDELINES.
These guidelines contain detailed information about how to structure your
project, and how to prepare your written and oral reports.
*** You must use the
Project 5 Report Template
provided for your written report. *** (If you prefer not to use Word, you can copy and paste this format into a different editor, as long as you respect the stated page structure and page limit.)
Do not exceed the page limits specified in the report template.
Your class presentation must be at most 3 minutes long.
Data Mining Technique(s):
Choose ONE of the following topics for your project:
anomaly detection
web mining
text mining
sequence mining
multimedia data mining
For your chosen topic,
you must use ONLY data mining techniques studied in
this course to address that topic.
Techniques not covered in class this semester are not allowed
(e.g., Bayesian networks,
support vector machines,
spectral clustering,
or association rules).
Dataset:
Choose a dataset appropriate for the data mining topic that you selected for this project and related to your own interests.
If you are registered for BCB503 or CS583, your dataset must be a dataset related to bioinformatics and/or computational biology.
*** Your chosen dataset should contain enough instances and attributes
to provide sufficient data for fruitful and interesting experiments. ***
Here are some possibilities:
A dataset you are working with for your research or your job.
A dataset from a data repository listed in
the online resources,
or another online data repository.
Other data sources of your choice.
Performance Metric(s):
Use performance metrics appropriate to the mining application that you
chose. If you are not aware of any,
propose a variety of approaches to measure how good the results
of your experiments are.
Use visualization of the constructed model(s) or patterns
to evaluate your results.
The more creative/ingenious your approaches, the better.
You might need to write your own Python code to obtain the
evaluation/interpretation functionality that you need.
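As one illustration of writing your own evaluation code, the sketch below computes precision, recall, and F1 for a single class of interest (for instance, the "anomaly" class in an anomaly detection experiment). It is only a minimal example under stated assumptions: the function name precision_recall_f1 and the toy label lists are hypothetical, it assumes that ground-truth labels are available for your dataset, and it uses plain Python so that it does not depend on any particular library. Whether these metrics are appropriate depends on your chosen topic.

```python
def precision_recall_f1(true_labels, predicted_labels, positive=1):
    """Compute precision, recall, and F1 for one class of interest.

    `positive` is the label treated as the class of interest
    (hypothetically, 1 = anomaly and 0 = normal).
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must have the same length")

    pairs = list(zip(true_labels, predicted_labels))
    # Count true positives, false positives, and false negatives.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Guard against division by zero when a count is empty.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Hypothetical toy labels, for illustration only (not real results).
true_labels = [0, 0, 1, 0, 1, 0, 0, 1]
predicted_labels = [0, 1, 1, 0, 0, 0, 0, 1]

precision, recall, f1 = precision_recall_f1(true_labels, predicted_labels)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Reporting precision and recall separately, rather than accuracy alone, tends to be more informative when the class of interest is rare, as is often the case with anomalies.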
General Comments
You must run your experiments in Python, using libraries and functions
that we used in Projects 1-4 in this course.
Remember that
you are allowed to use only data mining techniques studied in class
(in case of doubt about a certain technique, ask the professor in advance).
Focus on experimenting with different ways of preprocessing
the data and adapting different techniques studied in this course
to tackle the problem at hand.
The more creative/ingenious your work and/or the more research
into the related literature you do, the better.
Extra credit will be given for particularly
creative and/or high-quality work, and/or for independently
researching the chosen data mining topic/technique beyond what was covered about that
topic/technique in class.