Abstract of PhD Thesis - Abdul Nizar

Friday 13 April 2012 at 12:22 am.

XML query processing systems use XPath expressions to navigate through thetree model of XML to select a subset of nodes. Complex queries can be expressedin XPath using path expressions. A path expression consists of axes that expressdirection of navigation in the XML tree and predicates that filter the nodes selectedby axes. There are are four ordered axes in XPath namely, following, following-sibling,preceding and preceding-sibling which can effectively express retrievals satisfyingcertain order among the chosen elements of XML data. However, there are onlyfew proposals, like SPEX [Olteanu, 2007] and the Layered-NFA [Onizuka, 2010],for processing XPath expressions with ordered axes.

Path expressions consisting of child and descendant axes and predicates areconventionally represented using tree structures known as twig queries. Twigrepresentation of XPath expressions is quite convenient for query processing as theprocess is closely tied to the representation scheme adopted for theXMLdata. Twigstructures are unordered trees and fail to capture the semantics of ordered axes.Hence twig-based algorithms can not be directly extended to handle ordered axes.However, for path expressions having only child and descendant axes, twig-basedalgorithms have been shown to outperform systems based on other approachesand formalisms.

In this thesis, we propose a new framework known as Order-aware Twigs(OaTs) which can model XPath expressions containing ordered axes in additionto child and descendant axes. The OaT structure extends conventional twigs byintroducing new ordering and relationship constraints. We also extend OaTs toOrder-aware Graphs (OaGs) to encode expressions containing un-ordered backwardaxes namely, parent and ancestor, in addition to the six axes mentioned above.We also design algorithms for processing XPath subsets containing ordered axesagainst streaming XML data based on the OaT framework. We consider standaloneand shared processing of linear path queries containing all the forwardaxes namely, child, descendant, following and following-sibling. We also propose anefficient algorithm for stream processing of linear path expressions containing child,descendant, preceding and preceding-sibling axes. We formally and experimentallyshow that the proposed algorithms outperform the rival systems by wide marginsin time and space requirements.

The thesis establishes that the existing twig representation can be extendedby adding additional ordering and relationship constraints to accurately capturethe semantics of XPath queries with ordered and unordered axes. The resultingframework can compactly encode the queries and has brought in conceptual clarityand ease of representation for these queries. The framework also has led to thedevelopment of efficient algorithms for processing certain subsets of XPath queriescontaining ordered axes. The precision with which the meaning of ordered axes iscaptured in the framework and effective integration of the semantics of XPath inprocessing have indeed resulted in improved performance of the algorithms.