Efficient Big Data Processing on Large-Scale Shared Platforms: managing I/Os and Failure

Abstract : As of 2017, we live in a data-driven world where data-intensive applications are bringing fundamental improvements to our lives in many areas, such as business, science, health care, and security. This has boosted the growth of data volumes (i.e., the deluge of Big Data). To extract useful information from this huge amount of data, different data processing frameworks have emerged, such as MapReduce, Hadoop, and Spark. Traditionally, these frameworks run on large-scale platforms (i.e., HPC systems and clouds) to leverage their computation and storage power. These platforms are usually used concurrently by multiple users and multiple applications, with the goal of better resource utilization. Although sharing these platforms has benefits, it also raises several challenges, among which I/O and failure management are the major ones that can impact efficient data processing.

To this end, we first focus on I/O-related performance bottlenecks for Big Data applications on HPC systems. We start by characterizing the performance of Big Data applications on these systems. We identify I/O interference and latency as the major performance bottlenecks. Next, we zoom in on the I/O interference problem to further understand the root causes of this phenomenon. Then, we propose an I/O management scheme to mitigate the high latencies that Big Data applications may encounter on HPC systems. Moreover, we introduce interference models for Big Data and HPC applications based on the findings of our experimental study regarding the root causes of I/O interference. Finally, we leverage these models to minimize the impact of interference on the performance of Big Data and HPC applications.

Second, we focus on the impact of failures on the performance of Big Data applications by studying failure handling in shared MapReduce clusters. We introduce a failure-aware scheduler that enables fast failure recovery while optimizing data locality, thus improving application performance.
Cited literature : 151 references

https://tel.archives-ouvertes.fr/tel-01723850
Contributor : Abes Star
Submitted on : Monday, March 5, 2018 - 6:06:07 PM
Last modification on : Thursday, February 27, 2020 - 1:09:34 AM
Document(s) archived on : Wednesday, June 6, 2018 - 4:48:22 PM

File

THESE_YILDIZ_Orcun.pdf
Version validated by the jury (STAR)

Identifiers

  • HAL Id : tel-01723850, version 1

Citation

Orcun Yildiz. Efficient Big Data Processing on Large-Scale Shared Platforms: managing I/Os and Failure. Performance [cs.PF]. École normale supérieure de Rennes, 2017. English. ⟨NNT : 2017ENSR0009⟩. ⟨tel-01723850⟩

Metrics

  • Record views : 1678
  • File downloads : 416