KerA: A Unified Ingestion and Storage System for Scalable Big Data Processing

Ovidiu-Cristian Marcu (1)
(1) KerData - Scalable Storage for Clouds and Beyond
Inria Rennes – Bretagne Atlantique, IRISA-D1 - Large-Scale Systems (SYSTÈMES LARGE ÉCHELLE)
Abstract: Big Data is now the new natural resource. Current state-of-the-art Big Data analytics architectures are built on top of a three-layer stack: data streams are first acquired by the ingestion layer (e.g., Kafka), then flow through the processing layer (e.g., Flink), which relies on the storage layer (e.g., HDFS) to store aggregated data or to archive streams for later processing. Unfortunately, despite the potential benefits of specialized layers (e.g., simpler implementation), moving large quantities of data through them is inefficient: instead, data should be acquired, processed and stored while minimizing the number of copies. This dissertation argues that a plausible path to alleviate these limitations is the careful design and implementation of a unified architecture for stream ingestion and storage, which can optimize the processing of Big Data applications. This approach minimizes data movement within the analytics architecture and ultimately leads to better resource utilization. We identify a set of requirements for a dedicated stream ingestion/storage engine. We explain the impact of different Big Data architectural choices on end-to-end performance. We propose a set of design principles for a scalable, unified architecture for data ingestion and storage. We implement and evaluate the KerA prototype, with the goal of efficiently handling diverse access patterns: low-latency access to streams and/or high-throughput access to unbounded streams and/or objects.

Cited literature: 133 references

https://tel.archives-ouvertes.fr/tel-01972280
Contributor: Ovidiu-Cristian Marcu
Submitted on: Monday, January 7, 2019 - 4:35:11 PM
Last modification on: Tuesday, February 25, 2020 - 8:08:10 AM
Long-term archiving on: Monday, April 8, 2019 - 5:40:27 PM

File: thesis.pdf (files produced by the author(s))

Licence: Copyright

Identifiers

  • HAL Id: tel-01972280, version 1

Citation

Ovidiu-Cristian Marcu. KerA: A Unified Ingestion and Storage System for Scalable Big Data Processing. Distributed, Parallel, and Cluster Computing [cs.DC]. INSA Rennes, 2018. English. ⟨tel-01972280⟩
