StreamCloud: An Elastic Parallel-Distributed Stream Processing Engine

Vincenzo Gulisano 1, 2
2 ZENITH - Scientific Data Management
LIRMM - Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier, CRISAM - Inria Sophia Antipolis - Méditerranée
Abstract : In recent years, applications in domains such as telecommunications, network security, and large-scale sensor networks have shown the limits of the traditional store-then-process paradigm. In this context, Stream Processing Engines emerged as a candidate solution for applications that demand high processing capacity with low processing latency guarantees. With Stream Processing Engines, data streams are not persisted but rather processed on the fly, producing results continuously. Current Stream Processing Engines, either centralized or distributed, do not scale with the input load due to single-node bottlenecks. Moreover, they are based on static configurations that lead to either under- or over-provisioning. This Ph.D. thesis discusses StreamCloud, an elastic parallel-distributed stream processing engine that enables the processing of large data stream volumes. StreamCloud minimizes the distribution and parallelization overhead by introducing novel techniques that split queries into parallel subqueries and allocate them to independent sets of nodes. Moreover, StreamCloud's elastic and dynamic load balancing protocols enable effective adjustment of resources depending on the incoming load. Together with the parallelization and elasticity techniques, StreamCloud defines a novel fault tolerance protocol that introduces minimal overhead while providing fast recovery. StreamCloud has been fully implemented and evaluated using several real-world applications, such as fraud detection and network analysis. The evaluation, conducted on a cluster with more than 300 cores, demonstrates the high scalability, the elasticity, and the fault tolerance effectiveness of StreamCloud.

https://tel.archives-ouvertes.fr/tel-00768281
Contributor : Patrick Valduriez
Submitted on : Friday, December 21, 2012 - 10:30:35 AM
Last modification on : Thursday, May 24, 2018 - 3:59:21 PM
Long-term archiving on : Friday, March 22, 2013 - 3:46:34 AM

Identifiers

  • HAL Id : tel-00768281, version 1

Citation

Vincenzo Gulisano. StreamCloud: An Elastic Parallel-Distributed Stream Processing Engine. Distributed, Parallel, and Cluster Computing [cs.DC]. Universidad Politécnica de Madrid, 2012. English. ⟨tel-00768281⟩
