Partitioning XML data, towards distributed and parallel management

Noor Malla 1, 2
2 OAK - Database optimizations and architectures for complex large data
LRI - Laboratoire de Recherche en Informatique, UP11 - Université Paris-Sud - Paris 11, Inria Saclay - Ile de France, CNRS - Centre National de la Recherche Scientifique : UMR8623
Abstract : With the widespread adoption of XML as a format for representing data generated and exchanged over the Web, many query and update engines have been designed and implemented in the last decade. One class of engines that plays a crucial role in many applications is that of « main-memory » systems, which are easy to manage and to integrate into a programming environment. On the other hand, main-memory systems have scalability issues, as they load the entire document into main memory before processing it. This thesis presents an XML partitioning technique that allows main-memory engines to process a class of XQuery expressions (queries and updates), which we dub « iterative », on arbitrarily large input documents. We provide a static analysis technique to recognize these expressions. The static analysis is based on paths extracted from the expression and does not need additional schema information. We provide algorithms that use this path information to partition the input documents, so that the query or update can be evaluated separately on each part in order to compute the final result. These algorithms admit a streaming implementation, whose effectiveness is experimentally validated. Besides enabling scalability, our approach can easily be implemented in a MapReduce framework, thus enabling parallel query/update evaluation on the partitioned data.
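The idea of evaluating a query part by part can be illustrated with a small Python sketch. It is not the path-based algorithm of the thesis: it assumes the simplest possible case, in which the query examines each child of the root independently, so the document can be cut between those children. The names `partition_stream`, `expensive_titles`, `evaluate_partitioned` and the sample `library` document are hypothetical and chosen only for illustration.

```python
import io
import xml.etree.ElementTree as ET


def partition_stream(source, max_items):
    """Yield part documents holding at most max_items children of the root.

    Assumption: cutting between children of the root is safe for the query.
    """
    depth = 0
    root = None
    part = None
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            depth += 1
            if depth == 1:
                root = elem
            continue
        depth -= 1
        if depth != 1:
            continue
        # A child of the root has just been closed: move it into the current part.
        if part is None:
            part = ET.Element(root.tag)
        part.append(elem)
        root.clear()  # drop the child from the parsed tree to bound memory use
        if len(part) == max_items:
            yield part
            part = None
    if part is not None:
        yield part


def expensive_titles(doc):
    """Toy query that looks at each book independently of the others."""
    return [b.findtext("title") for b in doc.findall("book")
            if float(b.findtext("price")) > 10]


def evaluate_partitioned(source, query, max_items=2):
    """Run the query on each part and concatenate the partial results."""
    results = []
    for part in partition_stream(source, max_items):
        results.extend(query(part))
    return results


SAMPLE = b"""<library>
  <book><title>A</title><price>5</price></book>
  <book><title>B</title><price>12</price></book>
  <book><title>C</title><price>30</price></book>
  <book><title>D</title><price>8</price></book>
  <book><title>E</title><price>11</price></book>
</library>"""

whole = expensive_titles(ET.fromstring(SAMPLE))
by_parts = evaluate_partitioned(io.BytesIO(SAMPLE), expensive_titles)
```

In this sketch only one part is held in memory at a time, and since the parts are independent of each other, each could in principle be handed to a separate worker, which is the property a MapReduce-style evaluation would rely on.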
Document type : Theses

https://tel.archives-ouvertes.fr/tel-00759173
Contributor : Abes Star
Submitted on : Tuesday, December 8, 2015 - 4:25:08 PM
Last modification on : Monday, May 28, 2018 - 2:38:02 PM
Long-term archiving on : Saturday, April 29, 2017 - 10:03:09 AM

File

VD2_MALLA_NOOR_21092012.pdf
Version validated by the jury (STAR)

Identifiers

  • HAL Id : tel-00759173, version 3

Citation

Noor Malla. Partitioning XML data, towards distributed and parallel management. Other [cs.OH]. Université Paris Sud - Paris XI, 2012. English. ⟨NNT : 2012PA112154⟩. ⟨tel-00759173v3⟩
