Abstract of MS Thesis - O V Ravi Kumar

Friday 13 April 2012 at 01:08 am.

XML(eXtensible Markup Language) framework has emerged as a new standard forsemi-structured data representation and exchange over the web. XML documents canbe represented as rooted, ordered, node-labeled trees. Path expressions are used tonavigate and select nodes in XML documents. A subset of path expressions consistingof the parent-child and ancestor-descendant axes and node predicates, can be representedusing tree structures known as twig queries. Holistic processing of twig querieson a given set of XML documents is considered as a core operation in XML databaseresearch. Sequence based XML query processing has recently gained importance asa method to process twig queries holistically. In this approach, the document andquery are converted into sequences of some form and sub-sequence matching is usedas the fundamental operation for query matching.We propose a new way of indexing and storing XML data and using the indexeddata to process the twig queries for ordered matches. We convert an XML documentinto what we call a pre-order sequence(POS) by traversing XML tree in the pre-orderfashion. These sequences are indexed for ecient query processing. At the time ofquery processing, a twig tree is also transformed into a sequence. Unlike the previoussequence based approaches, we use dierent types of sequences to encode documentand query. By using our new query sequencing method, we do not need to retrievedata corresponding to branch nodes in the twig repeatedly. Using the proposedapproach, we find that it is possible to holistically process twig queries and obtainsubstantial performance improvements. Simplicity of representation, reporting all thematches without any false positives or false negatives, eliminating the need for postprocessingand reduced storage requirements are the main features of the proposedsolution. We also present many optimizations for improving the performance ofthe basic query processing algorithm. We also show how to efficiently handle twigqueries having value predicates with non-equality operators. We also extend oursequencing and indexing mechanism to support updates without regenerating thewhole sequence every time an update occurs in an XML document.