Algorithmes de traitement de flux XML : masses de données, mémoire externe et performances extensibles

Abstract : Many modern applications must process massive streams of XML data, which raises difficult technical challenges. Among them are the design and implementation of tools that optimize the processing of XPath queries and that provide an accurate cost estimate for queries evaluated over a massive stream of XML data. In this thesis, we propose a novel performance prediction model that estimates a priori the cost (in terms of space used and time spent) of any structural query belonging to Forward XPath. To that end, we carry out an experimental study that confirms a linear relationship between stream-processing and data-access resources. Based on this result, we introduce a mathematical model (linear regression functions) to predict the cost of a given XPath query. We also introduce a new selectivity estimation technique, which consists of two elements. The first is the path tree structure synopsis: a concise, accurate, and convenient summary of the structure of an XML document. The second is the selectivity estimation algorithm: an efficient stream-querying algorithm that traverses the path tree synopsis to estimate the values of the cost parameters. The mathematical model uses these parameters to determine the cost of a given XPath query. We compare the performance of our model with existing approaches. Finally, we present a use case for an online stream-querying system. The system uses our performance prediction model to estimate the cost of a given XPath query in terms of time and memory, and it returns an accurate answer to the sender of the query. This use case illustrates the practical advantages of performance management with our techniques.
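As a rough illustration of the two ideas named in the abstract, the sketch below builds a simple path-tree-style synopsis from an XML stream, counts how many elements a predicate-free path would match, and fits a straight line by least squares. It is a minimal sketch under stated assumptions, not the thesis's implementation: the function names (`build_path_tree`, `estimate_matches`, `fit_line`), the synopsis layout, and the choice of the match count as the only cost parameter are all hypothetical.

```python
import xml.etree.ElementTree as ET


def build_path_tree(xml_source):
    """Summarize a document as one node per distinct root-to-element label
    path, each annotated with the number of elements sharing that path.
    (Hypothetical layout: {"count": int, "children": {tag: node}}.)"""
    root = {"count": 0, "children": {}}
    stack = [root]
    for event, elem in ET.iterparse(xml_source, events=("start", "end")):
        if event == "start":
            children = stack[-1]["children"]
            node = children.setdefault(elem.tag, {"count": 0, "children": {}})
            node["count"] += 1
            stack.append(node)
        else:
            stack.pop()
            elem.clear()  # drop the element's content to limit memory use
    return root


def estimate_matches(synopsis, steps):
    """Count elements matched by a predicate-free path.
    steps is a list of (axis, tag) pairs, axis in {"child", "descendant"}."""
    frontier = {id(synopsis): synopsis}
    for axis, tag in steps:
        next_frontier = {}
        for node in frontier.values():
            if axis == "child":
                child = node["children"].get(tag)
                if child is not None:
                    next_frontier[id(child)] = child
            else:
                # Walk the whole subtree; keying by id avoids double counting
                # when one frontier node is an ancestor of another.
                pending = list(node["children"].items())
                while pending:
                    name, child = pending.pop()
                    if name == tag:
                        next_frontier[id(child)] = child
                    pending.extend(child["children"].items())
        frontier = next_frontier
    return sum(node["count"] for node in frontier.values())


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x
```

In such a setup, one would measure time or memory for a few training queries, fit `fit_line` on (match count, measured cost) pairs, and then predict the cost of a new query from `estimate_matches` alone. The actual cost parameters and regression functions are those defined in the thesis.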
Document type :
Theses

Cited literature : 83 references

https://tel.archives-ouvertes.fr/tel-00779309
Contributor : Abes Star
Submitted on : Tuesday, January 22, 2013 - 9:02:10 AM
Last modification on : Wednesday, September 4, 2019 - 1:52:06 PM
Long-term archiving on : Saturday, April 1, 2017 - 8:07:02 AM

File

TH2011PEST1002_complete.pdf
Version validated by the jury (STAR)

Identifiers

  • HAL Id : tel-00779309, version 1

Citation

Muath Alrammal. Algorithmes de traitement de flux XML : masses de données, mémoire externe et performances extensibles. Other [cs.OH]. Université Paris-Est, 2011. English. ⟨NNT : 2011PEST1002⟩. ⟨tel-00779309⟩
