Theses

Efficient support for data-intensive scientific workflows on geo-distributed clouds

Abstract : By 2020, the digital universe is expected to reach 44 zettabytes, as it is doubling every two years. Data come in the most diverse shapes and from the most geographically dispersed sources ever. The data explosion calls for applications capable of highly scalable, distributed computation, and for infrastructures with massive storage and processing power to support them. These large-scale applications are often expressed as workflows, which help define the data dependencies between their components. More and more scientific workflows are executed on clouds, which offer a cost-effective alternative for intensive computing. Sometimes, workflows must be executed across multiple geo-distributed cloud datacenters, either because their storage and computation requirements exceed the capacity of a single site, or because the data they process is scattered across different locations. Multisite workflow execution raises several issues for which little support has been developed: there is no common file system for data transfer, inter-site latencies are high, and centralized management becomes a bottleneck. This thesis consists of three contributions towards bridging the gap between single-site and multisite workflow execution. First, we present several design strategies to efficiently support the execution of workflow engines across multisite clouds by reducing the cost of metadata operations. Then, we take one step further and explain how selective handling of metadata, classified by frequency of access, improves workflow performance in a multisite environment. Finally, we look into a different approach to optimizing cloud workflow execution by studying some parameters to model and steer elastic scaling.

https://tel.archives-ouvertes.fr/tel-01645434
Contributor : Abes Star
Submitted on : Thursday, November 23, 2017 - 9:37:52 AM
Last modification on : Tuesday, February 25, 2020 - 8:08:10 AM

File

These_DEF_PINEDA_Luis_pdfstar....
Files produced by the author(s)

Identifiers

  • HAL Id : tel-01645434, version 1

Citation

Luis Eduardo Pineda Morales. Efficient support for data-intensive scientific workflows on geo-distributed clouds. Computation and Language [cs.CL]. INSA de Rennes, 2017. English. ⟨NNT : 2017ISAR0012⟩. ⟨tel-01645434v1⟩

Metrics

Record views : 154

File downloads : 93