Skip to Main content Skip to Navigation

On the foundations for the compilation of web data queries : optimization and distributed evaluation of SPARQL

Abstract : The topic of my PhD is the compilation of web data query languages. More particularly, the analysisand the distributed evaluation of a such language: SPARQL. My main contributions concern theevaluation of web data queries especially for recursive queries or for distributed settings.In this thesis, I introduce μ-algebra: it is a kind of relational algebra equipped with a fixpointoperator. I present its syntax, semantics, and a translation from SPARQL with Property Paths (anew feature of SPARQL allowing some form of recursion) to this μ-algebra.I then present a type system and show how μ-algebra terms can be rewritten to terms withequivalent semantics using either classical rewrite rules of the relational world or new rules that arespecific to this μ-algebra. We demonstrate the correctness of these new rules that are introduced tohandle the rewriting of fixpoints: they allow to push filters, joins and projections inside fixpointsor to combine several fixpoints (when some condition holds).I demonstrate how these terms could be evaluated both from a general perspective and in thespecific case of a distributed evaluation. I devise a cost model for μ-algebra terms inspired by thisevaluation. With this cost model and this evaluator, several terms that are semantically equivalentcan be seen as various Query Execution Plans (QEP) for a given query. I show that the μ-algebraand its rewrite rules allow the reach of QEP that are more efficient than all QEP considered in otherexisting approaches and confirm this by an experimental comparison of several query evaluators onSPARQL queries with recursion.I investigate the use of an efficient distributed framework (Spark) to build a fast SPARQL dis-tributed query evaluator. It is based on a fragment of μ-algebra, limited to operators that havea translation into fast Spark code. The result of this has been used to implement SPARQLGX, astate of the art distributed SPARQL query evaluator.Finally, my last contribution concerns the estimation of the cardinality of solutions to a μ-algebraterm. Such estimators are key in the optimization. Indeed, most cost models for QEP rely on suchestimators and are therefore necessary to determine the most efficient QEP. I specifically considerthe conjunctive query fragment of μ-algebra (which corresponds to the well-known Basic GraphPattern fragment of SPARQL). I propose a new cardinality estimation based on statistics about thedata and implemented the method into SPARQLGX. Experiments show that this method improvesthe performance of SPARQLGX.
Document type :
Complete list of metadatas

Cited literature [168 references]  Display  Hide  Download
Contributor : Abes Star :  Contact
Submitted on : Wednesday, December 19, 2018 - 11:43:06 AM
Last modification on : Monday, May 18, 2020 - 10:58:23 PM
Long-term archiving on: : Wednesday, March 20, 2019 - 6:35:08 PM


Version validated by the jury (STAR)


  • HAL Id : tel-01960209, version 1



Louis Jachiet. On the foundations for the compilation of web data queries : optimization and distributed evaluation of SPARQL. Information Retrieval [cs.IR]. Université Grenoble Alpes, 2018. English. ⟨NNT : 2018GREAM038⟩. ⟨tel-01960209⟩



Record views


Files downloads