Improving Performance Predictability in Cloud Data Stores

Abstract : Today, users of interactive services such as e-commerce and web search have increasingly high expectations of the performance and responsiveness of these services. Indeed, studies have shown that a slow service, even for short periods of time, directly impacts revenue. Enforcing predictable performance has thus been a priority for major service providers over the last decade. But avoiding latency variability in distributed storage systems is challenging, since end-user requests go through hundreds of servers and a performance hiccup at any of these servers may inflate the observed latency. Even in well-provisioned systems, factors such as contention on shared resources or unbalanced load between servers affect request latencies, in particular the tail (95th and 99th percentiles) of their distribution.

The goal of this thesis is to develop mechanisms for reducing latencies and achieving performance predictability in cloud data stores. One effective countermeasure for reducing tail latency in cloud data stores is to provide efficient replica selection algorithms. In replica selection, a request attempting to access a given piece of data (also called a value) identified by a unique key is directed to the presumably best replica. However, under heterogeneous workloads, these algorithms lead to increased latencies for requests with short execution times that get scheduled behind requests with long execution times. We propose Héron, a replica selection algorithm that supports workloads with heterogeneous request execution times. We evaluate Héron in a cluster of machines using a synthetic dataset inspired by the Facebook dataset as well as two real datasets from Flickr and WikiMedia. Our results show that Héron outperforms state-of-the-art algorithms by reducing both median and tail latency by up to 41%.

In the second contribution of the thesis, we focus on multiget workloads to reduce latency in cloud data stores. The challenge is to estimate the bottleneck operations and schedule them on uncoordinated backend servers with minimal overhead. To reach this objective, we present TailX, a task-aware multiget scheduling algorithm that reduces tail latencies under heterogeneous workloads. We implement TailX in Cassandra, a widely used key-value store. The result is improved overall performance of cloud data stores for a wide variety of heterogeneous workloads.
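
The problem the abstract describes for replica selection, short requests queued behind long ones, can be illustrated with a minimal sketch. This is not Héron or any algorithm from the thesis. It is a toy comparison, and the `Replica` class, the helper names, and the millisecond estimates are all hypothetical. It contrasts a selector that only counts queued requests with one that also accounts for their estimated execution time.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    # Hypothetical bookkeeping for one replica server.
    name: str
    pending_requests: int = 0      # number of queued requests
    pending_work_ms: float = 0.0   # sum of estimated execution times


def enqueue(replica, estimated_ms):
    """Record a request with an (assumed known) execution-time estimate."""
    replica.pending_requests += 1
    replica.pending_work_ms += estimated_ms


def pick_by_queue_length(replicas):
    """Baseline: choose the replica with the fewest queued requests."""
    return min(replicas, key=lambda r: r.pending_requests)


def pick_by_pending_work(replicas):
    """Size-aware variant: choose the replica with the least queued work."""
    return min(replicas, key=lambda r: r.pending_work_ms)


replicas = [Replica("r1"), Replica("r2")]
enqueue(replicas[0], 200.0)  # one large request on r1
enqueue(replicas[1], 1.0)    # two small requests on r2
enqueue(replicas[1], 1.0)

by_length = pick_by_queue_length(replicas)  # picks r1: shorter queue, more work
by_work = pick_by_pending_work(replicas)    # picks r2: longer queue, less work
```

In this toy setup the queue-length baseline sends a new short request behind the single large one, while the size-aware variant avoids it. A real system would also have to estimate execution times and cope with stale load information, which this sketch ignores.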
Document type :
Theses

Cited literature [67 references]

https://tel.archives-ouvertes.fr/tel-02301338
Contributor : Abes Star
Submitted on : Monday, September 30, 2019 - 1:21:07 PM
Last modification on : Friday, October 25, 2019 - 1:25:24 AM

File

JAIMAN_2019_archivage.pdf
Version validated by the jury (STAR)

Identifiers

  • HAL Id : tel-02301338, version 1

Citation

Vikas Jaiman. Improving Performance Predictability in Cloud Data Stores. Machine Learning [cs.LG]. Université Grenoble Alpes, 2019. English. ⟨NNT : 2019GREAM016⟩. ⟨tel-02301338⟩
