
Efficient support for data-intensive scientific workflows on geo-distributed clouds

Abstract : By 2020, the digital universe is expected to reach 44 zettabytes, since it doubles every two years. Data come in the most diverse shapes and from the most geographically dispersed sources ever. The data explosion calls for applications capable of highly scalable, distributed computation, and for infrastructures with massive storage and processing power to support them. These large-scale applications are often expressed as workflows, which help define the data dependencies between their different components. More and more scientific workflows are executed on clouds, as clouds are a cost-effective alternative for intensive computing. Sometimes, workflows must be executed across multiple geo-distributed cloud datacenters, either because their huge storage and computation requirements exceed the capacity of a single site, or because the data they process is scattered across different locations. Multisite workflow execution raises several issues for which little support has been developed: there is no common file system for data transfer, inter-site latencies are high, and centralized management becomes a bottleneck. This thesis makes three contributions towards bridging the gap between single-site and multisite workflow execution. First, we present several design strategies to efficiently support the execution of workflow engines across multisite clouds by reducing the cost of metadata operations. Then, we take one step further and explain how selective handling of metadata, classified by frequency of access, improves workflow performance in a multisite environment. Finally, we look into a different approach to optimizing cloud workflow execution by studying parameters to model and steer elastic scaling.
Contributor : Abes Star
Submitted on : Wednesday, December 13, 2017 - 10:41:36 AM
Last modification on : Wednesday, June 24, 2020 - 4:19:45 PM


Version validated by the jury (STAR)


  • HAL Id : tel-01645434, version 2


Luis Eduardo Pineda Morales. Efficient support for data-intensive scientific workflows on geo-distributed clouds. Computation and Language [cs.CL]. INSA de Rennes, 2017. English. ⟨NNT : 2017ISAR0012⟩. ⟨tel-01645434v2⟩


