Hong Su's Ph.D.
January 2006
The CS department is very pleased to be able to announce that Hong Su successfully defended her Ph.D. Dissertation on Friday, January 20th, 2006. The title of her dissertation is "Automaton Meets Algebra: A Hybrid Paradigm for Processing XQuery over XML Streams".
Her advisor was Elke A. Rundensteiner. The committee consisted of Profs. George T. Heineman and Murali Mani, WPI CS, and Prof. Mitch Cherniack, Brandeis University,
Elke reports that Hong
"has conducted innovative research in the newly emerging area of XML stream processing. ... The results of her Ph.d. research are not only solid publications, but also include a working prototype system RAINDROP, based on which several other current DSRG students are basing their research projects. In addition, her ideas have also led to a funded NSF project on RAINDROP -- thus assuring that others in the DSRG lab will continue to expand upon her `seedling'.
Dr. Hong Su is happily employed in the optimizer group at Oracle Corp. in Redwood, CA: one of *the* core groups of Oracle working on the "guts" of the DBMS system. All of us in DSRG wish her continued best of luck in both her professional career and also in her personal life -- may both be filled with happy surprises!"
The abstract for the Dissertation is as follows:
XML stream applications bring the challenge of efficiently processing queries on sequentially accessible token-based data streams. The automaton paradigm is naturally suited for pattern retrieval on tokenized XML streams, but requires patches for implementing the filtering or restructuring functionalities common for the XML query languages. In contrast, the algebraic paradigm is well-established for processing self-contained tuples. However, it does not traditionally support token inputs. This dissertation proposes a framework called Raindrop, which accommodates both the automaton and algebra paradigms to take advantage of both.
First, we propose an architecture for Raindrop. Raindrop is an algebra framework that models queries at different abstraction levels. We represent the token-based automaton computations as an algebraic subplan at the high level while exposing the automaton details at the low level. The algebraic subplan modeling automaton computations can thus be integrated with the algebraic subplan modeling the non-automaton computations.
Second, we explore a novel optimization opportunity. Other XML stream processing systems always retrieve all the patterns in a query in the automaton. In contrast, Raindrop allows a plan to retrieve some of the pattern retrieval in the automaton and some out of the automaton. This opens up an automaton-in-or-out optimization opportunity. We study this optimization in two types of run-time environments, one with stable data characteristics and one with fluctuating data characteristics. We provide search strategies catering to each environment. We also describe how to migrate from a currently running plan to a new plan at run-time.
Third, we optimize the automaton computations using the schema knowledge. A set of criteria are established to decide what schema constraints are useful to a given query. Optimization rules utilizing different types of schema constraints are proposed based on the criteria. We design a rule application algorithm which ensures both completeness (i.e., no optimization is missed) and minimality (i.e., no redundant optimization is introduced). The experimentations on both real and synthetic data illustrate that these techniques bring significant performance improvement with little overhead.
In conclusion, Raindrop accommodates the advantages of both automaton and algebra to efficiently process XQueries over tokenized XML streams. The proposed automaton-in-or-out and schema-based optimization techniques can be also applied to several well-known XML stream processing systems such as Tukwila and YFilter.
Maintained by webmaster@cs.wpi.eduLast modified: August 03, 2006 09:12:20
