We start this section by presenting our evaluation setup.

Intel Xeon X5570 CPUs (4 cores per CPU), 24 GB of RAM and a 465 GB HDD. The machines run the Debian 8 GNU/Linux operating system.

We used the industry-standard Yahoo! Cloud Serving Benchmark (YCSB) [32] to generate datasets and run our workloads. As YCSB only generates datasets with a single value size for each given client, we modified its source code to allow the generation of mixed-size datasets. Specifically, for mixed-size workloads, we kept the proportion of large values to small values the same. For generating client workloads

Moreover, this represents approximately 41 GB of data per node. We kept the replication factor at 3, which means that each value is available on 3 servers. Each measurement involves 1 million or 10 million requests and is repeated 5 times. Each multiget request comprises several operations with different value sizes. In all the generated workloads, the access pattern of stored values (whether small or large) follows a Zipfian distribution (with a Zipfian parameter of 0.99).
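The workload shape described above can be illustrated with a minimal sketch. It is not YCSB's generator nor our modification of it. It only assumes a Zipfian popularity law with parameter 0.99 over key ranks and a fixed share of large values. The function names, the value sizes (1 KB and 1 MB), the 10% large-value share and the number of keys per multiget are hypothetical choices made for illustration.

```python
import bisect
import itertools
import random


def zipfian_cdf(num_keys, skew=0.99):
    """Cumulative probabilities for ranks 1..num_keys, P(rank) proportional to 1/rank**skew."""
    weights = [1.0 / (rank ** skew) for rank in range(1, num_keys + 1)]
    total = sum(weights)
    cdf = list(itertools.accumulate(w / total for w in weights))
    cdf[-1] = 1.0  # guard against floating-point drift in the last bucket
    return cdf


def sample_key(cdf, rng):
    """Draw a key index (0 is the most popular key) by inverse-CDF lookup."""
    return bisect.bisect_left(cdf, rng.random())


def make_value_sizes(num_keys, large_fraction, small_size, large_size, rng):
    """Assign a size to every key while keeping the share of large values fixed."""
    num_large = round(num_keys * large_fraction)
    sizes = [large_size] * num_large + [small_size] * (num_keys - num_large)
    rng.shuffle(sizes)  # large values are spread over popular and unpopular keys
    return sizes


def make_multiget(cdf, sizes, keys_per_request, rng):
    """Build one multiget as a list of (key index, value size) operations."""
    keys = [sample_key(cdf, rng) for _ in range(keys_per_request)]
    return [(key, sizes[key]) for key in keys]


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so repeated runs see the same workload
    cdf = zipfian_cdf(num_keys=10_000, skew=0.99)
    # Hypothetical sizes: 1 KB small values, 1 MB large values, 10% large.
    sizes = make_value_sizes(10_000, 0.10, 1_024, 1_048_576, rng)
    print(make_multiget(cdf, sizes, keys_per_request=5, rng=rng))
```

Because sizes are shuffled independently of popularity rank, small and large values follow the same Zipfian access pattern, which matches the statement above that the distribution applies to stored values whether small or large.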

L. Suresh, M. Canini, S. Schmid, and A. Feldmann, C3: Cutting tail latency in cloud data stores via adaptive replica selection, NSDI, 2015.

P. Delgado, F. Dinu, A. Kermarrec, and W. Zwaenepoel, Hawk: Hybrid datacenter scheduling, USENIX ATC, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01183857

E. Schurman and J. Brutlag, Performance related changes and their user impact, Velocity: web performance and operations conference, 2009.

J. Brutlag, Speed matters for google web search, 2009.

J. Dean and L. A. Barroso, The tail at scale, Communications of the ACM, 2013.

G. Linden, Make data useful

V. Jalaparti, P. Bodik, S. Kandula, I. Menache, M. Rybalkin et al., Speeding up distributed request-response workflows, SIGCOMM, 2013.

Y. Xu, Z. Musgrave, B. Noble, and M. Bailey, Bobtail: Avoiding long tails in the cloud, NSDI, 2013.

D. S. Berger, B. Berg, T. Zhu, S. Sen, and M. Harchol-Balter, RobinHood: Tail latency aware caching - dynamic reallocation from cache-rich to cache-poor, OSDI, 2018.

K. Ousterhout, P. Wendell, M. Zaharia, and I. Stoica, Sparrow: Distributed, low latency scheduling, SOSP, 2013.

T. Zhu, A. Tumanov, M. A. Kozuch, M. Harchol-Balter, and G. R. Ganger, PriorityMeister: Tail latency QoS for shared networked storage, 2014.

C. Stewart, A. Chakrabarti, and R. Griffith, Zoolander: Efficiently meeting very strict, low-latency SLOs, ICAC, 2013.

A. D. Ferguson, P. Bodik, S. Kandula, E. Boutin, and R. Fonseca, Jockey: Guaranteed job latency in data parallel clusters, 2012.

M. E. Haque, Y. H. Eom, Y. He, S. Elnikety, R. Bianchini et al., Few-to-many: Incremental parallelism for reducing tail latency in interactive services, ASPLOS, 2015.

M. Jeon, S. Kim, S. Hwang, Y. He, S. Elnikety et al., Predictive parallelization: Taming tail latencies in web search, SIGIR, 2014.

W. Reda, M. Canini, L. Suresh, D. Kostić, and S. Braithwaite, Rein: Taming tail latency in key-value stores via multiget scheduling, 2017.

A. Lakshman and P. Malik, Cassandra: A decentralized structured storage system, SIGOPS Oper. Syst. Rev, 2010.

B. Atikoglu, Y. Xu, E. Frachtenberg, S. Jiang, and M. Paleczny, Workload analysis of a large-scale key-value store, SIGMETRICS, 2012.

F. R. Dogar, T. Karagiannis, H. Ballani, and A. Rowstron, Decentralized task-aware scheduling for data center networks, SIGCOMM, 2014.

B. Williams, Dynamic snitching in Cassandra: past, present, and future, 2012.

Cloud Computing Survey, 2018.

Partitioners

MongoDB

OpenStack Swift

Apache Accumulo

Riak Load Balancing and Proxy Configuration

Wikimedia downloads

M. J. Huiskes and M. S. Lew, The MIR Flickr retrieval evaluation, 2008.

M. Ould-Khaoua, G. Min, and N. Thomas, Performance analysis and evaluation of parallel, cluster, and grid computing systems comparing job allocation schemes where service demand is unknown, Journal of Computer and System Sciences, 2008.

V. Jaiman, S. B. Mokhtar, V. Quéma, L. Y. Chen, and E. Rivière, Héron: Taming tail latencies in key-value stores under heterogeneous workloads, SRDS, 2018.

D. Balouek, A. Amarie, G. Charrier, F. Desprez, E. Jeannot et al., Adding virtualization capabilities to the Grid'5000 testbed, Cloud Computing and Services Science, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00946971

B. F. Cooper, A. Silberstein, E. Tam, R. Ramakrishnan, and R. Sears, Benchmarking cloud serving systems with YCSB, 2010.

R. Nishtala, H. Fugal, S. Grimm, M. Kwiatkowski, H. Lee et al., Scaling Memcache at Facebook, NSDI, 2013.

P. Delgado, D. Didona, F. Dinu, and W. Zwaenepoel, Job-aware scheduling in Eagle: Divide and stick to your probes, 2016.

J. Lenstra, A. R. Kan, and P. Brucker, Complexity of machine scheduling problems, Studies in Integer Programming, 1977.

B. H. Bloom, Space/time trade-offs in hash coding with allowable errors, Communications of the ACM, 1970.

F. Bonomi, M. Mitzenmacher, R. Panigrahy, S. Singh, and G. Varghese, An improved construction for counting bloom filters, European Symposium on Algorithms, 2009.

P. Pandey, M. A. Bender, R. Johnson, and R. Patro, A general-purpose counting filter: Making every bit count, SIGMOD, 2017.

F. Hao, M. Kodialam, and T. V. Lakshman, Incremental bloom filters, INFOCOM, 2008.

S. Ha, I. Rhee, and L. Xu, CUBIC: A new TCP-friendly high-speed TCP variant, SIGOPS Oper. Syst. Rev, 2008.

K. Bogdanov, M. Peón-Quirós, G. Q. Maguire Jr., and D. Kostić, The nearest replica can be farther than you think, 2015.

D. Shue, M. J. Freedman, and A. Shaikh, Performance isolation and fairness for multi-tenant cloud storage, OSDI, 2012.

Z. Wu, C. Yu, and H. V. Madhyastha, CosTLO: Cost-effective redundancy for lower latency variance on cloud storage services, NSDI, 2015.

C. R. Lumb, R. Golding, and G. R. Ganger, D-SPTF: Decentralized request distribution in brick-based storage systems, ASPLOS, 2004.

G. Ananthanarayanan, S. Kandula, A. Greenberg, I. Stoica, Y. Lu et al., Reining in the outliers in map-reduce clusters using Mantri, OSDI, 2010.

J. Li, N. K. Sharma, D. R. Ports, and S. D. Gribble, Tales of the tail: Hardware, OS, and application-level sources of tail latency, 2014.

A. Gulati, I. Ahmad, and C. A. Waldspurger, PARDA: Proportional allocation of resources for distributed storage access, FAST, 2009.

A. Gulati, A. Merchant, and P. J. Varman, mClock: Handling throughput variability for hypervisor IO scheduling, OSDI, 2010.

A. Wang, S. Venkataraman, S. Alspaugh, R. Katz, and I. Stoica, Cake: Enabling high-level SLOs on shared storage systems, 2012.

H. Lim, D. Han, D. G. Andersen, and M. Kaminsky, MICA: A holistic approach to fast in-memory key-value storage, NSDI, 2014.

B. Fan, D. G. Andersen, and M. Kaminsky, MemC3: Compact and concurrent memcache with dumber caching and smarter hashing, NSDI, 2013.

G. Ananthanarayanan, A. Ghodsi, A. Warfield, D. Borthakur, S. Kandula et al., PACMan: Coordinated memory caching for parallel jobs, NSDI, 2012.

M. Chowdhury, Y. Zhong, and I. Stoica, Efficient coflow scheduling with Varys, SIGCOMM, 2014.

R. Motwani, S. Phillips, and E. Torng, Non-clairvoyant scheduling, Proceedings of the Fourth Annual ACM-SIAM Symposium on Discrete Algorithms, ser. SODA, 1993.

M. Chowdhury, M. Zaharia, J. Ma, M. I. Jordan, and I. Stoica, Managing data transfers in computer clusters with Orchestra, SIGCOMM, 2011.

M. Chowdhury and I. Stoica, Efficient coflow scheduling without prior knowledge, SIGCOMM, 2015.

M. Alizadeh, S. Yang, M. Sharif, S. Katti, N. McKeown et al., pFabric: Minimal near-optimal datacenter transport, 2013.

A. Vulimiri, P. B. Godfrey, R. Mittal, J. Sherry, S. Ratnasamy et al., Low latency via redundancy, 2013.

M. Mitzenmacher, The power of two choices in randomized load balancing, IEEE Transactions on Parallel and Distributed Systems, 2001.

M. Schwarzkopf, A. Konwinski, M. Abd-El-Malek, and J. Wilkes, Omega: Flexible, scalable schedulers for large compute clusters, EuroSys, 2013.

M. Jeon, Y. He, H. Kim, S. Elnikety, S. Rixner et al., TPC: Target-driven parallelism combining prediction and correction to reduce tail latency in interactive services, ASPLOS, 2016.

M. Harchol-Balter, B. Schroeder, N. Bansal, and M. Agrawal, Size-based scheduling to improve web performance, 2003.