Query Processing in Multistore Systems

Carlyna Bondiombouy 1
1 ZENITH - Scientific Data Management
LIRMM - Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier, CRISAM - Inria Sophia Antipolis - Méditerranée
Abstract: Cloud computing is having a major impact on data management, with a proliferation of new, scalable data management solutions such as distributed file and object storage, NoSQL databases and big data processing frameworks. This also leads to a wide diversification of DBMS interfaces and the loss of a common programming paradigm, making it very hard for users to integrate their data sitting in specialized data stores, e.g. relational, document and graph data stores.

In this thesis, we address the problem of query processing with multiple cloud data stores, where the data stores have different models, languages and APIs. This thesis has been prepared in the context of the CoherentPaaS European project and, in particular, the CloudMdsQL multistore system. CloudMdsQL is a functional query language able to exploit the full power of local data stores by allowing native queries of the local data stores to be called as functions, while still letting the whole query be optimized, e.g. by pushing down select predicates, using bind join, performing join ordering, or planning intermediate data shipping.

We propose an extension of CloudMdsQL that takes full advantage of the functionality of underlying data processing frameworks such as Spark by allowing the ad-hoc use of user-defined map/filter/reduce (MFR) operators in combination with traditional SQL statements. This makes it possible to perform joins between relational data and big data stored in HDFS. Our solution supports optimization by enabling subquery rewriting, so that bind join can be used and filter conditions can be pushed down and applied by the data processing framework as early as possible.

We validated our solution by implementing the MFR extension as part of the CloudMdsQL query engine. Based on this prototype, we provide an experimental validation of multistore query processing in a cluster to evaluate the impact of optimization on performance. More specifically, we explore the performance benefit of using bind join and select pushdown under different conditions. Overall, our performance evaluation illustrates the CloudMdsQL query engine's ability to optimize a query and choose the most efficient execution strategy.
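The two optimizations the abstract evaluates, bind join and pushing a filter into a map/filter/reduce pipeline, can be illustrated with a small Python sketch. It is a toy model under stated assumptions, not CloudMdsQL's syntax or query engine: an in-memory `sqlite3` database stands in for the relational store, a list of text lines stands in for an HDFS file, and every name (`LOG_LINES`, `users`, `mfr_count_events`, `bind_join`) is hypothetical. The outer relational query runs first, its join keys are shipped to the MFR side, and they are applied there as a filter before the map and reduce steps.

```python
import sqlite3
from functools import reduce
from typing import Callable, Dict, Iterable, List, Optional, Tuple

# Hypothetical names throughout; this is an illustration, not CloudMdsQL's API.
# "Big data" side: raw "user,event" text lines standing in for an HDFS file.
LOG_LINES = ["u1,click", "u2,view", "u1,view", "u3,click", "u2,click", "u4,view"]


def mfr_count_events(lines: Iterable[str],
                     key_filter: Optional[Callable[[str], bool]] = None) -> Dict[str, int]:
    """Map/filter/reduce pipeline that counts events per user.

    When key_filter is supplied (the bind-join case), it is applied right after
    parsing, so non-matching records never reach the later map and reduce steps.
    """
    records = (line.split(",") for line in lines)              # MAP: parse "user,event"
    if key_filter is not None:
        records = (r for r in records if key_filter(r[0]))      # pushed-down FILTER
    pairs = ((user, 1) for user, _event in records)            # MAP: emit (user, 1)

    def add(acc: Dict[str, int], pair: Tuple[str, int]) -> Dict[str, int]:
        user, n = pair
        acc[user] = acc.get(user, 0) + n
        return acc

    return reduce(add, pairs, {})                               # REDUCE: sum per user


def relational_side() -> sqlite3.Connection:
    """In-memory SQLite database standing in for the relational data store."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id TEXT PRIMARY KEY, country TEXT)")
    conn.executemany("INSERT INTO users VALUES (?, ?)",
                     [("u1", "FR"), ("u2", "DE"), ("u3", "FR"), ("u4", "IT")])
    return conn


def bind_join(conn: sqlite3.Connection, country: str) -> List[Tuple[str, str, int]]:
    """Join users of one country with their event counts using a bind join."""
    # 1. Evaluate the outer (relational) side first; its selection runs inside SQLite.
    outer = conn.execute("SELECT id, country FROM users WHERE country = ?",
                         (country,)).fetchall()
    # 2. Bind: ship only the distinct join keys to the inner (MFR) side as a filter.
    keys = {uid for uid, _ in outer}
    counts = mfr_count_events(LOG_LINES, key_filter=lambda k: k in keys)
    # 3. Finish the join on the already-reduced inner result.
    return sorted((uid, ctry, counts[uid]) for uid, ctry in outer if uid in counts)


if __name__ == "__main__":
    print(bind_join(relational_side(), "FR"))  # [('u1', 'FR', 2), ('u3', 'FR', 1)]
```

In this sketch the reduce step only sees records for users selected on the relational side, which is the intuition behind combining bind join with early filtering; running the MFR pipeline without `key_filter` and joining afterwards gives the same rows but processes every record.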
Document type: Thesis

Cited literature: 77 references

https://tel.archives-ouvertes.fr/tel-01935268
Contributor: Abes Star
Submitted on: Monday, November 26, 2018, 3:35:08 PM
Last modification on: Thursday, January 31, 2019, 9:57:59 PM
Long-term archiving on: Wednesday, February 27, 2019, 2:42:25 PM

File

BONDIOMBOUY_2017_archivage.pdf
Version validated by the jury (STAR)

Identifiers

  • HAL Id: tel-01935268, version 1


Citation

Carlyna Bondiombouy. Query Processing in Multistore Systems. Other [cs.OH]. Université Montpellier, 2017. English. ⟨NNT : 2017MONTS056⟩. ⟨tel-01935268⟩

