DUE DATE: Tuesday, December 5th, 2017.
- Slides: Submit via Canvas by 2:00 pm.
- Written report: Hand in a hardcopy by the beginning of class (by 3:59 pm).
Project Description
- Project Instructions:
- *** Work on this project individually.
That is, no group work is allowed for this project. ***
- THOROUGHLY READ AND FOLLOW THE PROJECT GUIDELINES.
These guidelines contain detailed information about how to structure your
project and how to prepare your written and oral reports.
- *** You must use the Project 5 Report Template provided for your written report. ***
(If you prefer not to use Word, you can copy and paste this format into a different editor, as long as you respect the stated page structure and page limit.)
Do not exceed the page limits specified in the report template.
- Your class presentation must be at most 4 minutes long.
- Data Mining Technique(s):
Choose ONE of the following topics for your project:
- anomaly detection
- web mining
- text mining (if you choose this topic, you will need to do something completely
different from what you did in Project 3)
- sequence mining (if you choose this topic, you will need to do something completely
different from what you did in Project 3)
- multimedia data mining
For your chosen topic, you must use ONLY data mining techniques studied in
this course to address that topic (e.g., neural networks and
support vector machines are not allowed, as we didn't study these
techniques in this course).
- Dataset:
- Choose a dataset appropriate for the data mining topic that you selected for this project and related to your own interests.
- If you are registered for BCB503, your dataset must be a dataset related to bioinformatics and/or computational biology.
- *** Your chosen dataset should contain enough instances and attributes
to provide sufficient data for fruitful and interesting experiments. ***
- Here are some possibilities:
- A dataset you are working with for your research or your job.
- A dataset from a data repository listed in the online resources,
or from another online data repository.
- Other data sources of your choice.
- Performance Metric(s):
Use performance metrics appropriate to the mining application that you
chose. If you are not aware of any,
propose a variety of approaches to measure how good the results
of your experiments are.
Use visualization of the constructed model or patterns
to evaluate your results.
The more creative/ingenious your approaches, the better.
You might need to extend the Weka code or Python libraries to obtain the
evaluation/interpretation functionality that you need.
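As one possible starting point for the metrics discussed above, here is a minimal sketch that computes precision, recall, and F1 for the anomaly class. It assumes that you chose anomaly detection and that ground-truth labels are available for at least part of your dataset. The function name anomaly_metrics and the toy label lists are hypothetical, made up for illustration only; other topics will likely call for different metrics.

```python
def anomaly_metrics(y_true, y_pred):
    """Return confusion counts, precision, recall, and F1 for the anomaly class.

    Both arguments are equal-length sequences of 0/1 labels,
    where 1 marks an anomaly (an assumption of this sketch).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # anomalies correctly flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # normal instances flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # anomalies missed

    # Guard against division by zero when nothing is flagged or nothing is anomalous.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "f1": f1}


# Hypothetical toy labels, not taken from any real dataset.
y_true = [0, 0, 1, 0, 1, 0, 0, 1]
y_pred = [0, 1, 1, 0, 0, 0, 0, 1]

metrics = anomaly_metrics(y_true, y_pred)
print(metrics)
```

Accuracy alone is usually uninformative for anomaly detection because anomalies are rare, which is why this sketch reports precision and recall on the anomaly class instead.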
- General Comments
- You must run your experiments in Python.
You may also use Weka but most of your experiments must be run in Python.
- Remember that you can only use data mining techniques studied in class
(in case of doubt about a certain technique, ask the professor in advance).
- Focus on experimenting with different ways of preprocessing
the data, adapting different techniques studied in this course
to tackle the problem at hand, and investigating on your own
other existing approaches.
The more creative/ingenious your work and/or the more research
into the related literature you do, the better.
- Extra credit will be given for particularly
creative and/or high-quality work, and/or for independently
researching the chosen data mining topic/technique beyond what was covered about that
topic/technique in class.