DUE DATE: Thursday December 6th, 2018.
- Slides: Submit via Canvas by 2:00 pm.
- Written report: Hand in a hardcopy by the beginning of class (by 3:59 pm).
Project Description
- Project Instructions:
- *** Work on this project individually. That is, no group work is allowed for this project. ***
- THOROUGHLY READ AND FOLLOW THE PROJECT GUIDELINES.
These guidelines contain detailed information about how to structure your project and how to prepare your written and oral reports.
- *** You must use the Project 5 Report Template provided for your written report. *** (If you prefer not to use Word, you can copy this format into a different editor, as long as you respect the stated page structure and page limit.)
Do not exceed the page limits specified in the report template.
- Your class presentation must be at most 3 minutes long.
- Data Mining Technique(s):
Choose ONE of the following topics for your project:
- anomaly detection
- web mining
- text mining
- sequence mining
- multimedia data mining
(if you choose this topic, you will need to do something completely different from what you did in Project 3)
For your chosen topic, you must use ONLY data mining techniques studied in this course to address that topic (e.g., using Bayesian networks, support vector machines, spectral clustering, association rules, or any other techniques not covered in class this semester is not allowed).
- Dataset:
- Choose a dataset appropriate for the data mining topic that you selected for this project and related to your own interests.
- If you are registered for BCB503 or CS583, your dataset must be a dataset related to bioinformatics and/or computational biology.
- *** Your chosen dataset should contain enough instances and attributes
to provide sufficient data for fruitful and interesting experiments. ***
- Here are some possibilities:
- A dataset you are working with for your research or your job.
- A dataset from a data repository listed in the online resources, or from another online data repository.
- Other data sources of your choice.
- Performance Metric(s):
Use performance metrics appropriate to the mining application that you chose. If you are not aware of any, propose a variety of approaches to measure how good the results of your experiments are.
Use visualizations of the constructed model(s) or patterns to evaluate your results.
The more creative/ingenious your approaches, the better.
You might need to extend the Weka code or Python libraries to obtain the evaluation/interpretation functionality that you need.
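As one hedged illustration, suppose you chose anomaly detection and your dataset happens to include ground-truth labels (assumed here to be 1 = anomaly, 0 = normal). Then precision, recall, and F1 can be computed from a detector's predictions with plain Python. The function name `precision_recall_f1` is hypothetical; scikit-learn's `precision_recall_fscore_support` in `sklearn.metrics` offers equivalent functionality.

```python
def precision_recall_f1(y_true, y_pred):
    """Return (precision, recall, F1), treating label 1 as 'anomaly'.

    Hypothetical helper: assumes y_true and y_pred are equal-length
    sequences of 0/1 labels (ground truth and detector output).
    """
    # Count true positives, false positives, and false negatives.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # Guard against division by zero when a denominator is empty.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

For example, `precision_recall_f1([0, 0, 1, 1, 0, 1], [0, 1, 1, 0, 0, 1])` gives a precision, recall, and F1 of 2/3 each. Because anomalies are usually rare, these metrics tend to be more informative than plain accuracy.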
- General Comments
- You must run your experiments in Python.
You may also use Weka, but most of your experiments must be run in Python.
- Remember that you can only use data mining techniques studied in class
(in case of doubt about a certain technique, ask the professor in advance).
- Focus on experimenting with different ways of preprocessing
the data and adapting different techniques studied in this course
to tackle the problem at hand.
The more creative/ingenious your work and/or the more research
into the related literature you do, the better.
- Extra credit will be given for particularly creative and/or high-quality work, and/or for independently researching the chosen data mining topic/technique beyond what was covered about that topic/technique in class.