
Efficient Big Data Processing on Large-Scale Shared Platforms: managing I/Os and Failure

Abstract : As of 2017, we live in a data-driven world where data-intensive applications are bringing fundamental improvements to our lives in many areas such as business, science, health care and security. This has fueled rapid growth in data volumes (the so-called deluge of Big Data). To extract useful information from this huge amount of data, several data processing frameworks have emerged, such as MapReduce, Hadoop and Spark. Traditionally, these frameworks run on large-scale platforms (i.e., HPC systems and clouds) to leverage their computation and storage power. These platforms are usually shared concurrently by multiple users and multiple applications in order to improve resource utilization. Although sharing brings benefits, it also raises several challenges, among which I/O management and failure management are the major ones affecting efficient data processing.

To this end, we first focus on I/O-related performance bottlenecks for Big Data applications on HPC systems. We start by characterizing the performance of Big Data applications on these systems and identify I/O interference and latency as the major performance bottlenecks. Next, we zoom in on the I/O interference problem to better understand its root causes. Then, we propose an I/O management scheme to mitigate the high latencies that Big Data applications may encounter on HPC systems. Moreover, we introduce interference models for Big Data and HPC applications, built on the findings of our experimental study of the root causes of I/O interference. Finally, we leverage these models to minimize the impact of interference on the performance of Big Data and HPC applications.

Second, we focus on the impact of failures on the performance of Big Data applications by studying failure handling in shared MapReduce clusters. We introduce a failure-aware scheduler that enables fast failure recovery while optimizing data locality, thus improving application performance.
Document type : Theses

Cited literature : 151 references
Contributor : ABES STAR
Submitted on : Monday, March 5, 2018 - 6:06:07 PM
Last modification on : Thursday, September 1, 2022 - 4:06:19 AM
Long-term archiving on : Wednesday, June 6, 2018 - 4:48:22 PM


Version validated by the jury (STAR)


  • HAL Id : tel-01723850, version 2


Orcun Yildiz. Efficient Big Data Processing on Large-Scale Shared Platforms: managing I/Os and Failure. Performance [cs.PF]. École normale supérieure de Rennes, 2017. English. ⟨NNT : 2017ENSR0009⟩. ⟨tel-01723850v2⟩


