Efficient Processing of Ordered XML Twig Pattern
Jiaheng Lu, Tok Wang Ling, Tian Yu, Changqing Li, Wei Ni
School of computing, National University of Singapore
{lujiahen,lingtw,yutian,lichangq,niwei}@comp.nus.edu.sg
Abstract. Finding all the occurrences of a twig pattern in an XML database is
a core operation for efficient evaluation of XML queries. Holistic twig join al-
gorithm has showed its superiority over binary decompose based approach due
to efficient reducing intermediate results. The existing holistic join algorithms,
however, cannot deal with ordered twig queries. A straightforward approach
that first matches the unordered twig queries and then prunes away the unde-
sired answers is obviously not optimal in most cases. In this paper, we study a
novel holistic-processing algorithm, called OrderedTJ, for ordered twig queries.
We show that OrderedTJ can identify a large query class to guarantee the I/O
optimality. Finally, our experiments show the effectiveness, scalability and ef-
ficiency of our proposed algorithm.
1 Introduction
With the rapidly increasing popularity of XML for data representation, there is a lot
of interest in query processing over data that conforms to a tree-structured data
model([1],[5]). Efficient finding all twig patterns in an XML database is a major
concern of XML query processing. Recently, holistic twig join approach has been
taken as an efficient way to match twig pattern since this approach can efficiently
control the size of intermediate results([1],[2],[3],[4]). We observe that, however, the
existing work on holistic twig query matching only considered unordered twig que-
ries. But XPath defines four ordered axes: following-sibling, preceding-sibling, fol-
lowing, preceding. For example, XPath: //book/text/following-sibling::chapter is an
ordered query, which finds all chapters in the dataset that are following siblings of
text which should be a child of book.
We call a twig query which cares the order of the matching elements as an or-
dered twig query. On the other hand, we denote a twig query that does not consider
the order of matching elements as an unordered query. In this paper, we research how
to efficiently evaluate an ordered twig query.
To handle an ordered twig query, naively, we can use the existing algorithm (e.g.
TwigStack[1]/TwigStackList[5]) to output the intermediate path solutions for each
individual root-leaf query path, and then merge path solutions so that the final solu-
tions are guaranteed to satisfy the order predicates of the query. Although existing
algorithms are applied, such a post-processing approach has a serious disadvantage:
many intermediate results may not contribute to final answers.
评论6