Efficient Big Data Processing on Large-Scale Shared Platforms: Managing I/Os and Failures

Orcun Yildiz 1
1 KerData - Scalable Storage for Clouds and Beyond
Inria Rennes – Bretagne Atlantique, IRISA-D1 - SYSTÈMES LARGE ÉCHELLE
Abstract : As of 2017, we live in a data-driven world where data-intensive applications are bringing fundamental improvements to many areas of our lives, such as business, science, health care and security. This has accelerated the growth of data volumes (i.e., the deluge of Big Data). To extract useful information from this huge amount of data, data processing frameworks such as MapReduce, Hadoop, and Spark have emerged. Traditionally, these frameworks run on large-scale platforms (i.e., HPC systems and clouds) to leverage their computation and storage power. These platforms are usually shared concurrently by multiple users and applications to improve resource utilization. Although sharing has benefits, it also raises several challenges, among which I/O and failure management are the major ones that can impact efficient data processing. To this end, we first focus on I/O-related performance bottlenecks for Big Data applications on HPC systems. We start by characterizing the performance of Big Data applications on these systems, and we identify I/O interference and latency as the major performance bottlenecks. Next, we zoom in on the I/O interference problem to further understand its root causes. Then, we propose an I/O management scheme to mitigate the high latencies that Big Data applications may encounter on HPC systems. Moreover, we introduce interference models for Big Data and HPC applications based on the findings of our experimental study of the root causes of I/O interference. Finally, we leverage these models to minimize the impact of interference on the performance of Big Data and HPC applications. Second, we focus on the impact of failures on the performance of Big Data applications by studying failure handling in shared MapReduce clusters. We introduce a failure-aware scheduler that enables fast failure recovery while optimizing data locality, thus improving application performance.
Cited literature: 151 references

https://tel.archives-ouvertes.fr/tel-01671413
Contributor : Orçun Yildiz
Submitted on : Friday, December 22, 2017 - 11:25:36 AM
Last modification on : Friday, September 13, 2019 - 9:51:33 AM

File

thesis.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : tel-01671413, version 1

Citation

Orcun Yildiz. Efficient Big Data Processing on Large-Scale Shared Platforms: Managing I/Os and Failures. Distributed, Parallel, and Cluster Computing [cs.DC]. ENS Rennes, 2017. English. ⟨tel-01671413⟩
