Streaming Tree Automata and XPath

Abstract : In recent years, XML has evolved into the quasi-standard format for data exchange. Most typically, XML documents are produced from databases, during document processing, and for Web applications. Streaming is a natural exchange mode, which is frequently used when sending large amounts of data over networks, such as in database-driven Web applications. Streaming is thus relevant for many XML processing tasks.

In this thesis, we study streaming algorithms for XML query answering. Our main objective is efficient memory management, so that huge data collections can be queried with low memory consumption. This turns out to be a surprisingly complex task, which requires serious restrictions on the query language. We therefore consider queries defined by deterministic automata or in fragments of the W3C standard language XPath, rather than studying more powerful languages such as the W3C standards XQuery or XSLT.

We first propose Streaming Tree Automata (STAs), which operate on unranked trees in streaming order, and prove them equivalent to Nested Word Automata and to Pushdown Forest Automata. We then contribute an earliest query answering algorithm for queries defined by deterministic STAs. Although it stores only alive answer candidates, it consumes only PTIME per event and candidate. This yields positive streamability results for classes of queries defined by deterministic STAs. The precise streamability notion here relies on a new machine model that we call Streaming Random Access Machines (SRAMs), and on the number of concurrently alive candidates of a query. We also show that bounded concurrency is decidable in PTIME for queries defined by deterministic STAs. Our proof is by reduction to bounded valuedness of recognizable tree relations.
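
To illustrate what operating on unranked trees in streaming order can look like, here is a minimal Python sketch of a deterministic automaton that reads a tree as a sequence of open and close events, pushing one stack symbol per opening event and popping it at the matching closing event. It is a simplified illustration and not the formal definition given in the thesis: the class name `StreamingTreeAutomaton`, the rule tables `open_rules` and `close_rules`, the helper `events_of`, and the example language (trees containing a `b` node whose parent is labeled `a`, over the alphabet {a, b, c}) are all assumptions made for this example.

```python
from itertools import product

def events_of(tree):
    """Yield ("open", label) / ("close", label) events in document order.
    A tree is a pair (label, list_of_children). Hypothetical helper."""
    label, children = tree
    yield ("open", label)
    for child in children:
        yield from events_of(child)
    yield ("close", label)

class StreamingTreeAutomaton:
    """Simplified deterministic automaton over open/close event streams."""

    def __init__(self, initial, finals, open_rules, close_rules):
        self.initial = initial
        self.finals = finals
        self.open_rules = open_rules    # (state, label) -> (state, stack symbol)
        self.close_rules = close_rules  # (state, label, stack symbol) -> state

    def accepts(self, events):
        state, stack = self.initial, []
        for kind, label in events:
            if kind == "open":
                rule = self.open_rules.get((state, label))
                if rule is None:
                    return False
                state, symbol = rule
                stack.append(symbol)   # remembered until the matching close
            else:
                if not stack:
                    return False       # close event without matching open
                symbol = stack.pop()
                state = self.close_rules.get((state, label, symbol))
                if state is None:
                    return False
        return not stack and state in self.finals

# Example automaton: a state is (in_a, found), where in_a says whether the
# innermost open node is labeled "a" and found says whether a "b" child of
# an "a" node has been seen. The stack symbol saves in_a of the parent.
LABELS, BOOLS = ("a", "b", "c"), (False, True)
open_rules = {
    ((in_a, found), l): ((l == "a", found or (in_a and l == "b")), in_a)
    for in_a, found, l in product(BOOLS, BOOLS, LABELS)
}
close_rules = {
    ((in_a, found), l, saved): (saved, found)
    for in_a, found, l, saved in product(BOOLS, BOOLS, LABELS, BOOLS)
}
sta = StreamingTreeAutomaton(
    initial=(False, False),
    finals={(False, True), (True, True)},
    open_rules=open_rules,
    close_rules=close_rules,
)
```

For instance, `sta.accepts(events_of(("a", [("b", [])])))` accepts the tree, whereas the tree `("c", [("a", []), ("b", [])])`, in which `a` and `b` are siblings, is rejected. In this sketch the memory used is the current state plus a stack whose height is bounded by the depth of the tree, not by its size.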

Concerning the W3C standard query language XPath, we first show that small syntactic fragments are not streamable unless P=NP. The problematic features are non-determinism in combination with nesting of and/or operators. We define fragments of Forward XPath with schema assumptions that avoid these aspects and prove them streamable by PTIME compilation to deterministic STAs.
Document type : Theses

https://tel.archives-ouvertes.fr/tel-00421911
Contributor : Olivier Gauwin
Submitted on : Wednesday, June 23, 2010 - 11:04:30 AM
Last modification on : Friday, October 23, 2020 - 4:45:26 PM
Long-term archiving on : Friday, September 24, 2010 - 5:21:37 PM

Identifiers

  • HAL Id : tel-00421911, version 3

Citation

Olivier Gauwin. Streaming Tree Automata and XPath. Software Engineering [cs.SE]. Université des Sciences et Technologie de Lille - Lille I, 2009. English. ⟨tel-00421911v3⟩

Metrics

Record views : 521
File downloads : 910