Scalable and Efficient Data Management in Distributed Clouds : Service Provisioning and Data Processing

Jad Darrous 1, 2, 3 
2 AVALON - Algorithms and Software Architectures for Distributed and HPC Platforms
Inria Grenoble - Rhône-Alpes, LIP - Laboratoire de l'Informatique du Parallélisme
3 STACK - Software Stack for Massively Geo-Distributed Infrastructures
Inria Rennes – Bretagne Atlantique , LS2N - Laboratoire des Sciences du Numérique de Nantes
Abstract : This thesis focuses on scalable data management solutions to accelerate service provisioning and enable efficient execution of data-intensive applications in large-scale distributed clouds. Data-intensive applications are increasingly running on distributed infrastructures (multiple clusters). The two main reasons for this trend are 1) moving computation to data sources can eliminate the latency of data transmission, and 2) storing data on one site may not be feasible given the continuous increase in data size.

On the one hand, most applications run on virtual clusters to provide isolated services, and require virtual machine images (VMIs) or container images to provision such services. Hence, it is important to enable fast provisioning of virtualization services to reduce the waiting time of new services or applications. Unlike previous work, the first part of this thesis optimizes data retrieval and placement while considering challenging issues, including the continuous increase in the number and size of VMIs and container images and the limited bandwidth and heterogeneity of wide area network (WAN) connections.

On the other hand, data-intensive applications rely on replication to provide dependable and fast services, but replication has become expensive and even infeasible with the unprecedented growth of data size. The second part of this thesis provides one of the first studies on understanding and improving the performance of data-intensive applications when replacing replication with the storage-efficient erasure coding (EC) technique.
Cited literature : 331 references
Contributor : ABES STAR
Submitted on : Sunday, March 15, 2020 - 1:01:51 AM
Last modification on : Thursday, May 12, 2022 - 5:08:02 PM
Long-term archiving on : Tuesday, June 16, 2020 - 6:25:42 PM


Version validated by the jury (STAR)


  • HAL Id : tel-02508592, version 1


Jad Darrous. Scalable and Efficient Data Management in Distributed Clouds : Service Provisioning and Data Processing. Distributed, Parallel, and Cluster Computing [cs.DC]. Université de Lyon, 2019. English. ⟨NNT : 2019LYSEN077⟩. ⟨tel-02508592⟩


