Another CS Ph.D.

The Computer Science Department recently awarded its tenth Ph.D. degree to Andreas Koeller. The advisor was Elke Rundensteiner.

Andreas successfully defended his thesis in public before his Ph.D. committee on December 14th, 2001. His committee consisted of WPI Profs. David Brown, Nabil Hachem, and Carolina Ruiz, plus Prof. Gunter Saake from the University of Magdeburg. His thesis title is "Integration of Heterogeneous Databases: Discovery of Meta-Information and Maintenance of Schema-Restructuring Views".

Thesis Abstract

In today's networked world, information is widely distributed across many independent databases in heterogeneous formats. Integrating such information is a difficult task, since database contents and structure change frequently, and users often have incomplete information about the databases they use. We investigated two fundamental problems in information integration: How can we discover the structure and contents of unknown databases and the interrelationships between them, and how can we provide durable integration views over several such databases?

The first part of the dissertation addresses the fact that knowledge about the interrelationships between databases is essential for any attempt at solving the information integration problem. We present an algorithm based on the clique-finding problem in graphs and k-uniform hypergraphs to discover redundancy relationships between two relations. Experimental studies on the algorithm illustrate its effectiveness on a variety of real-world data sets.

The second part of the dissertation addresses the durable view problem and presents the first algorithm for incremental view maintenance in schema-restructuring views. Such views are essential for the integration of heterogeneous databases. They are typically defined in schema-restructuring query languages like SchemaSQL, which can transform schema into data and vice versa, making traditional view maintenance through differential queries impossible. Based on an existing algebra for SchemaSQL, we present an algorithm that propagates updates along the query algebra tree and prove its correctness. Experimental results show its benefits over view recomputation.