WPI Worcester Polytechnic Institute

Computer Science Department
------------------------------------------

CS 525 KNOWLEDGE DISCOVERY AND DATA MINING  
SYLLABUS - Spring 2004

PROF. CAROLINA RUIZ 

WARNING: Small changes to this syllabus may be made during the course of the semester. 
------------------------------------------

COURSE DESCRIPTION:

Due to advances in technology and the availability of increasingly cheap storage devices, data in different domains has been accumulating at an impressively high rate in recent years, leading to very large databases. This course presents current research in Knowledge Discovery in Databases (KDD) dealing with the integration, mining, and interpretation of patterns in such databases. Topics include data warehousing and mediation techniques aimed at integrating distributed, heterogeneous data sources; data mining techniques such as rule-based learning, decision trees, association rule mining, and statistical analysis for discovery of patterns in the integrated data; and evaluation and interpretation of the mined patterns using visualization techniques. The work discussed originates in the fields of databases, artificial intelligence, information retrieval, data visualization, and statistics. Industrial and scientific applications will be given.

This course presents data mining from a database perspective. For an in-depth study of the machine learning techniques used in data mining, take CS539 Machine Learning, which is scheduled to be offered during the 2004-2005 academic year.

Students will be expected to read assigned textbook chapters and research papers, and work on implementation/research projects that cover the different stages of the KDD process.


PREREQUISITE:

Background in databases and artificial intelligence at the undergraduate level, or permission of the instructor. Background in statistics would be helpful but is not assumed. Proficiency in a high-level programming language (preferably Java) is required.


CLASS MEETING:

Tuesdays and Thursdays 3:00 - 4:20 pm
FL320

Students are also encouraged to attend the Knowledge Discovery in Databases and Data Mining Research Group (KDDRG) Seminar, held Fridays at 2 pm in the Beckett Conference Room (FL246).


PROFESSOR:

Prof. Carolina Ruiz
ruiz@cs.wpi.edu
Office: FL 232
Phone Number: (508) 831-5640
Office Hours: Tu 2-2:50 pm, Fr 3-4 pm, or by appointment.

Other speakers may occasionally be invited to lecture to the class.


READINGS:

Several other books on the subject and related subjects are recommended below. Several research papers will be handed out during the semester.

GRADES:

Exam        20%
Homework     8%
Projects    72% (12% per project)
Participation in class discussions of assigned topics: 10% (extra points)

Your final grade will reflect your own work and achievements during the course. Any type of cheating will be penalized with an F grade for the course and will be reported to the WPI Judicial Board in accordance with the Academic Honesty Policy.


EXAM

There will be one midterm exam. This exam will cover the material presented in class since the beginning of the semester. 


HOMEWORK

There will be one homework assignment, intended as preparation for the midterm exam. It will cover the material in chapters 1 through 5 of the textbook.

PROJECTS

There will be a total of six interrelated projects. Each project deals with one of the data mining techniques covered in class. Datasets for these projects will be selected from online database repositories or other sources.

About the Weka System: For most of the projects, we will use the Weka system (http://www.cs.waikato.ac.nz/ml/weka/). Weka is an excellent machine-learning/data-mining environment. It provides a large collection of Java-based mining algorithms, data preprocessing filters, and experimentation capabilities. Weka is open source software issued under the GNU General Public License. To download the system, get its documentation, or find more information, see Weka's webpage at the address above. You should download the latest available stable GUI version of the system.

Students will be required to provide both a written report and an oral (in-class) presentation describing their achievements in each of these projects.

CLASS PARTICIPATION

All students are expected to read the material assigned for each class in advance and to participate in class discussions. Also, students will take turns presenting papers and leading class discussions of assigned readings.

CLASS MAILING LIST

There are two mailing lists for this class:

CLASS WEB PAGES

The web pages for this class are located at http://www.cs.wpi.edu/~cs525d/s04/
Announcements will be posted on the web pages and/or the class mailing list, and so you are urged to check your email and the class web pages frequently. 

ADDITIONAL REFERENCES

(See also the list of selected papers in the Class Schedule.)

Knowledge Discovery and Data Mining

Machine Learning

General AI

Databases

Statistics


OTHER ONLINE RESOURCES:

Data Sets

KDD

KDD Commercial Products / Prototypes

Data Warehousing and OLAP

Machine Learning

Statistics

General AI