F. Ahmad, S. Lee, and M. Thottethodi, PUMA: Purdue MapReduce benchmarks suite, Purdue University, Tech. Rep, 2012.

H. Amur, J. Cipar, and V. Gupta, Robust and flexible power-proportional storage, Proceedings of the 1st ACM symposium on Cloud computing, SoCC '10, pp.217-228, 2010.
DOI : 10.1145/1807128.1807164

URL : http://www.pdl.cs.cmu.edu/PDL-FTP/Storage/rabbit.pdf

G. Ananthanarayanan, A. Ghodsi, and S. Shenker, Effective straggler mitigation: attack of the clones, USENIX Symposium on Networked Systems Design and Implementation (NSDI '13), pp.185-198, 2013.

G. Ananthanarayanan, M. C. Hung, and X. Ren, GRASS: trimming stragglers in approximation analytics, USENIX Symposium on Networked Systems Design and Implementation (NSDI '14), pp.289-302, 2014.

G. Ananthanarayanan, S. Kandula, and A. Greenberg, Reining in the outliers in MapReduce clusters using Mantri, USENIX Symposium on Operating Systems Design and Implementation (OSDI '10), pp.1-16, 2010.

D. P. Anderson, J. Cobb, and E. Korpela, SETI@home: an experiment in public-resource computing, Communications of the ACM, vol.45, issue.11, pp.56-61, 2002.
DOI : 10.1145/581571.581573

T. W. Anderson, An introduction to multivariate statistical analysis, 1958.

C. Anglano, J. Brevik, and M. Canonico, Fault-aware scheduling for Bag-of-Tasks applications on Desktop Grids, 2006 7th IEEE/ACM International Conference on Grid Computing, pp.56-63, 2006.
DOI : 10.1109/ICGRID.2006.310998

C. Anglano and M. Canonico, Scheduling algorithms for multiple Bag-of-Task applications on Desktop Grids: A knowledge-free approach, 2008 IEEE International Symposium on Parallel and Distributed Processing, pp.1-8, 2008.
DOI : 10.1109/IPDPS.2008.4536445

M. Armbrust, A. Fox, and R. Griffith, A view of cloud computing, Communications of the ACM, vol.53, issue.4, pp.50-58, 2010.
DOI : 10.1145/1721654.1721672

M. J. Atallah, R. Cole, and M. T. Goodrich, Cascading Divide-and-Conquer: A Technique for Designing Parallel Algorithms, SIAM Journal on Computing, vol.18, issue.3, pp.499-532, 1989.
DOI : 10.1137/0218035

G. Aupy, Y. Robert, and F. Vivien, Checkpointing algorithms and fault prediction, Journal of Parallel and Distributed Computing, vol.74, issue.2, pp.2048-2064, 2014.
DOI : 10.1016/j.jpdc.2013.10.010

URL : https://hal.archives-ouvertes.fr/hal-00788313

A. T. Bates, Technology, e-learning and distance education, Routledge, 2005.
DOI : 10.4324/9780203463772

F. Bonomi, R. Milito, and J. Zhu, Fog computing and its role in the internet of things, Proceedings of the first edition of the MCC workshop on Mobile cloud computing, MCC '12, pp.13-16, 2012.
DOI : 10.1145/2342509.2342513

T. Bray, J. Paoli, and C. M. Sperberg-McQueen, Extensible markup language (XML), World Wide Web Journal (WWW), vol.2, issue.4, pp.27-66, 1997.

D. J. Brown and C. Reams, Toward energy-efficient computing, Communications of the ACM, vol.53, issue.3, pp.50-58, 2010.
DOI : 10.1145/1666420.1666438

P. Buneman, S. Davidson, and G. Hillebrand, A query language and optimization techniques for unstructured data, ACM SIGMOD International Conference on Management of Data (SIGMOD '96), pp.505-516, 1996.
DOI : 10.1145/235968.233368

URL : http://wwwipd.ira.uka.de/~ggh/papers/BDHS96.ps.gz

P. Carbone, A. Katsifodimos, and S. Ewen, Apache Flink: stream and batch processing in a single engine, Bulletin of the IEEE Computer Society Technical Committee on Data Engineering, vol.36, issue.4, pp.28-38, 2015.

M. Cardosa, A. Singh, and H. Pucha, Exploiting spatio-temporal tradeoffs for energy-aware MapReduce in the cloud, IEEE International Conference on Cloud Computing (CLOUD '11), pp.251-258, 2011.

H. Chen, R. H. Chiang, and V. C. Storey, Business intelligence and analytics: from big data to big impact, Management Information Systems Quarterly (MISQ), vol.36, issue.4, pp.1165-1188, 2012.

Q. Chen, C. Liu, and Z. Xiao, Improving MapReduce Performance Using Smart Speculative Execution Strategy, IEEE Transactions on Computers, vol.63, issue.4, pp.29-42, 2014.
DOI : 10.1109/TC.2013.15

Y. Chen, S. Alspaugh, and D. Borthakur, Energy efficiency for large-scale MapReduce workloads with significant interactive analysis, Proceedings of the 7th ACM european conference on Computer Systems, EuroSys '12, pp.43-56, 2012.
DOI : 10.1145/2168836.2168842

URL : http://www.eecs.berkeley.edu/~ychen2/professional/eurosys2012paper125-cameraReady.pdf

Y. Chen, S. Alspaugh, and R. Katz, Interactive analytical processing in big data systems, Proceedings of the VLDB Endowment (VLDB), pp.1802-1813, 2012.
DOI : 10.14778/2367502.2367519

Y. Chen, A. Ganapathi, and R. H. Katz, To compress or not to compress - compute vs. IO tradeoffs for mapreduce energy efficiency, Proceedings of the first ACM SIGCOMM workshop on Green networking, Green Networking '10, pp.23-28, 2010.
DOI : 10.1145/1851290.1851296

Y. Chen, L. Keys, and R. H. Katz, Towards energy efficient MapReduce, EECS Department, 2009.

D. Cheng, C. Jiang, and X. Zhou, Heterogeneity-Aware Workload Placement and Migration in Distributed Sustainable Datacenters, 2014 IEEE 28th International Parallel and Distributed Processing Symposium, pp.307-316, 2014.
DOI : 10.1109/IPDPS.2014.41

URL : http://www.cs.uccs.edu/~xzhou/publications/IPDPS2014.pdf

V. K. Chippa, D. Mohapatra, and A. Raghunathan, Scalable effort hardware design, Proceedings of the 47th Design Automation Conference, DAC '10, pp.555-560, 2010.
DOI : 10.1145/1837274.1837411

K. Choi, R. Soma, and M. Pedram, Fine-grained dynamic voltage and frequency scaling for precise energy and performance tradeoff based on the ratio of off-chip access to on-chip computation times, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), vol.24, issue.1, pp.18-28, 2005.

E. F. Codd, Relational database: a practical foundation for productivity, Communications of the ACM, vol.25, issue.2, pp.109-117, 1982.
DOI : 10.1145/358396.358400

URL : http://dl.acm.org/ft_gateway.cfm?id=358400&type=pdf

J. Dean and S. Ghemawat, MapReduce: simplified data processing on large clusters, Communications of the ACM, vol.51, issue.1, pp.107-113, 2008.
DOI : 10.1145/1327452.1327492

G. Decandia, D. Hastorun, and M. Jampani, Dynamo: Amazon's highly available key-value store, ACM SIGOPS Operating Systems Review, vol.41, issue.6, pp.205-220, 2007.
DOI : 10.1145/1323293.1294281

W. Deng, F. Liu, and H. Jin, Lifetime or energy: Consolidating servers with reliability control in virtualized cloud datacenters, 4th IEEE International Conference on Cloud Computing Technology and Science Proceedings, pp.18-25, 2012.
DOI : 10.1109/CloudCom.2012.6427550

M. D. Dikaiakos, D. Katsaros, and P. Mehra, Cloud Computing: Distributed Internet Computing for IT and Scientific Research, IEEE Internet Computing, vol.13, issue.5, pp.10-13, 2009.
DOI : 10.1109/MIC.2009.103

F. Dinu and T. E. Ng, Understanding the effects and implications of compute node related failures in Hadoop, Proceedings of the 21st international symposium on High-Performance Parallel and Distributed Computing, HPDC '12, pp.187-198, 2012.
DOI : 10.1145/2287076.2287108

D. Florescu and D. Kossmann, Rethinking cost and performance of database systems, ACM SIGMOD Record, vol.38, issue.1, pp.43-48, 2009.
DOI : 10.1145/1558334.1558339

A. Gainaru, F. Cappello, and W. Kramer, Taming of the Shrew: Modeling the Normal and Faulty Behaviour of Large-scale HPC Systems, 2012 IEEE 26th International Parallel and Distributed Processing Symposium, pp.1168-1179, 2012.
DOI : 10.1109/IPDPS.2012.107

M. Gamell, I. Rodero, and M. Parashar, Exploring power behaviors and trade-offs of in-situ data analytics, Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC '13, pp.1-12, 2013.
DOI : 10.1145/2503210.2503303

P. Garraghan, X. Ouyang, and P. Townend, Timely Long Tail Identification through Agent Based Monitoring and Analytics, 2015 IEEE 18th International Symposium on Real-Time Distributed Computing, pp.19-26, 2015.
DOI : 10.1109/ISORC.2015.39

URL : http://eprints.whiterose.ac.uk/88442/1/ISORC%20Camera%20Copy%20-%20Longtail%20identification.pdf

L. Gillam and M. Zakarya, Energy efficient computing, clusters, grids and clouds: a taxonomy and survey, Sustainable Computing: Informatics and Systems, pp.13-33, 2017.

I. Gog, M. Schwarzkopf, and A. Gleave, Firmament: fast, centralized cluster scheduling at scale, USENIX Symposium on Operating Systems Design and Implementation (OSDI '16), pp.99-115, 2016.

Í. Goiri, K. Le, and M. E. Haque, GreenSlot: scheduling energy consumption in green datacenters, Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, SC '11, pp.1-11, 2011.
DOI : 10.1145/2063384.2063411

Í. Goiri, K. Le, and T. D. Nguyen, GreenHadoop: leveraging green energy in data-processing frameworks, Proceedings of the 7th ACM european conference on Computer Systems, EuroSys '12, pp.57-70, 2012.
DOI : 10.1145/2168836.2168843

T. Gunarathne, T. Wu, and J. Qiu, MapReduce in the Clouds for Science, 2010 IEEE Second International Conference on Cloud Computing Technology and Science, pp.565-572, 2010.
DOI : 10.1109/CloudCom.2010.107

J. Hamilton, Cost of power in large-scale data centers, 2008. Available: http://perspectives.mvdirona.com

S. Hammoud, M. Li, and Y. Liu, MRSim: A discrete event based MapReduce simulator, 2010 Seventh International Conference on Fuzzy Systems and Knowledge Discovery, pp.2993-2997, 2010.
DOI : 10.1109/FSKD.2010.5569086

HDFS Architecture Guide, Available: https

A. Holmes, Hadoop in practice, 2012.

D. Huang, X. Shi, and S. Ibrahim, MR-scope, Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing, HPDC '10, pp.849-855, 2010.
DOI : 10.1145/1851476.1851598

S. Ibrahim, B. He, and H. Jin, Towards Pay-As-You-Consume Cloud Computing, 2011 IEEE International Conference on Services Computing, pp.370-377, 2011.
DOI : 10.1109/SCC.2011.38

S. Ibrahim, H. Jin, and L. Lu, Handling partitioning skew in MapReduce using LEEN, Peer-to-Peer Networking and Applications, pp.409-424, 2013.
DOI : 10.1007/s12083-013-0213-7

S. Ibrahim, H. Jin, and L. Lu, LEEN: Locality/Fairness-Aware Key Partitioning for MapReduce in the Cloud, 2010 IEEE Second International Conference on Cloud Computing Technology and Science, pp.17-24, 2010.
DOI : 10.1109/CloudCom.2010.25

S. Ibrahim, H. Jin, and L. Lu, Maestro: Replica-Aware Map Scheduling for MapReduce, 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012), pp.435-442, 2012.
DOI : 10.1109/CCGrid.2012.122

URL : https://hal.archives-ouvertes.fr/hal-00670813

S. Ibrahim, D. Moise, and H. Chihoub, Towards Efficient Power Management in MapReduce: Investigation of CPU-Frequencies Scaling on Power Efficiency in Hadoop, Workshop on Adaptive Resource Management and Scheduling for Cloud Computing (ARMS-CC '14), pp.147-164, 2014.
DOI : 10.1007/978-3-319-13464-2_11

URL : https://hal.archives-ouvertes.fr/hal-01077285

S. Ibrahim, T. Phan, and A. Carpen-amarie, Governing energy consumption in Hadoop through CPU frequency scaling: An analysis, Future Generation Computer Systems, vol.54, issue.C, 2016.
DOI : 10.1016/j.future.2015.01.005

URL : https://hal.archives-ouvertes.fr/hal-01166252

M. Isard, M. Budiu, and Y. Yu, Dryad: distributed data-parallel programs from sequential building blocks, ACM European Conference on Computer Systems (EuroSys '07), pp.59-72, 2007.

J. Dean, Large-scale distributed systems at Google: current systems and future directions, ACM SIGOPS International Workshop on Large Scale Distributed Systems and Middleware (LADIS '09), 2009.

Y. Jégou, S. Lantéri, and J. Leduc, Grid'5000: a large scale and highly reconfigurable experimental grid testbed, International Journal of High Performance Computing Applications (IJHPCA), vol.20, issue.4, pp.481-494, 2006.

H. Jin, S. Ibrahim, and T. Bell, Cloud Types and Services, Handbook of Cloud Computing, pp.335-355, 2010.
DOI : 10.1007/978-1-4419-6524-0_14

H. Jin, S. Ibrahim, and T. Bell, Tools and Technologies for Building Clouds, Cloud Computing, pp.3-20, 2010.
DOI : 10.1007/978-1-84996-241-4_1

H. Jin, S. Ibrahim, and L. Qi, The MapReduce Programming Model and Implementations, Cloud Computing: Principles and Paradigms, pp.373-390, 2011.
DOI : 10.1145/1131322.1131328

R. T. Kaushik and M. Bhandarkar, GreenHDFS: towards an energy-conserving, storage-efficient, hybrid Hadoop compute cluster, USENIX International Conference on Power Aware Computing and Systems (HotPower '10), pp.1-9, 2010.

J. Kim, J. Chou, and D. Rotem, Energy Proportionality and Performance in Data Parallel Computing Clusters, International Conference on Scientific and Statistical Database Management (SSDBM '11), pp.414-431, 2011.
DOI : 10.1007/BF01874392

URL : https://digital.library.unt.edu/ark:/67531/metadc840810/m2/1/high_res_d/1012478.pdf

W. Kolberg, P. D. Marcos, and J. C. Anjos, MRSG – A MapReduce simulator over SimGrid, Parallel Computing, vol.39, issue.4-5, pp.233-244, 2013.
DOI : 10.1016/j.parco.2013.02.001

URL : https://hal.archives-ouvertes.fr/hal-00931855

D. Kondo, A. A. Chien, and H. Casanova, Resource Management for Rapid Application Turnaround on Enterprise Desktop Grids, Proceedings of the ACM/IEEE SC2004 Conference, pp.17-30, 2004.
DOI : 10.1109/SC.2004.50

URL : http://www-csag.ucsd.edu/papers/sc04_kondo.pdf

R. Koo and S. Toueg, Checkpointing and rollback-recovery for distributed systems, IEEE Transactions on Software Engineering, issue.1, pp.23-31, 1987.
DOI : 10.1109/tse.1987.232562

URL : http://ecommons.cornell.edu/bitstream/1813/6546/1/85-706.pdf

M. Kryczka, R. Cuevas, and C. Guerrero, A first step towards user assisted online social networks, Proceedings of the 3rd Workshop on Social Network Systems, SNS '10, pp.1-6, 2010.
DOI : 10.1145/1852658.1852664

URL : http://www.cl.cam.ac.uk/%7Eey204/pubs/2010_SNS.pdf

D. Laney, 3D data management: controlling data Volume, Velocity and Variety, 2001. Available: https://blogs.gartner.com/doug-laney/files

W. Lang and J. M. Patel, Energy management for MapReduce clusters, Proceedings of the VLDB Endowment (VLDB), pp.129-139, 2010.
DOI : 10.14778/1920841.1920862

URL : http://www.comp.nus.edu.sg/%7Evldb2010/proceedings/files/papers/R11.pdf

P. Langley and H. A. Simon, Applications of machine learning and rule induction, Communications of the ACM, vol.38, issue.11, pp.54-64, 1995.
DOI : 10.1145/219717.219768

G. Lee, B. Chun, and R. H. Katz, Heterogeneity-aware resource allocation and scheduling in the cloud, USENIX Conference on Hot Topics in Cloud Computing (HotCloud '11), pp.1-5, 2011.

L. Lei, T. Wo, and C. Hu, CREST: Towards Fast Speculation of Straggler Tasks in MapReduce, 2011 IEEE 8th International Conference on e-Business Engineering, pp.311-316, 2011.
DOI : 10.1109/ICEBE.2011.37

J. Leverich and C. Kozyrakis, On the energy (in)efficiency of Hadoop clusters, ACM SIGOPS Operating Systems Review, vol.44, issue.1, pp.61-65, 2010.
DOI : 10.1145/1740390.1740405

Y. Li, Q. Yang, and S. Lai, A New Speculative Execution Algorithm Based on C4.5 Decision Tree for Hadoop, International Conference of Young Computer Scientists, Engineers and Educators (ICYCSEE '15), pp.2015-284
DOI : 10.1007/978-3-662-46248-5_35

L. Lu, H. Jin, and X. Shi, Assessing MapReduce for Internet Computing: A Comparison of Hadoop and BitDew-MapReduce, 2012 ACM/IEEE 13th International Conference on Grid Computing, pp.76-84, 2012.
DOI : 10.1109/Grid.2012.31

URL : https://hal.archives-ouvertes.fr/hal-00757070

J. Luo, J. Jin, and A. Song, Cloud computing: architecture and key technologies, Journal of China Institute of Communications, vol.32, issue.7, pp.3-21, 2011.

A. Malik, A. Malik, and K. Hiekkanen, Impact of privacy, trust and user activity on intentions to share Facebook photos, Journal of Information, Communication and Ethics in Society, vol.11, issue.2, pp.364-382, 2016.
DOI : 10.1016/j.chb.2014.12.012

MapReduce Tutorial, Available: https

L. Mashayekhy, M. M. Nejad, and D. Grosu, Energy-Aware Scheduling of MapReduce Jobs, 2014 IEEE International Congress on Big Data, pp.32-39, 2014.
DOI : 10.1109/BigData.Congress.2014.15

L. Mashayekhy, M. M. Nejad, and D. Grosu, Energy-Aware Scheduling of MapReduce Jobs for Big Data Applications, IEEE Transactions on Parallel and Distributed Systems, vol.26, issue.10, pp.2720-2733, 2015.
DOI : 10.1109/TPDS.2014.2358556

O. A. Mukhanov, Energy-Efficient Single Flux Quantum Technology, IEEE Transactions on Applied Superconductivity, vol.21, issue.3, pp.760-769, 2011.
DOI : 10.1109/TASC.2010.2096792

R. Nathuji, C. Isci, and E. Gorbatov, Exploiting Platform Heterogeneity for Power Efficient Data Centers, Fourth International Conference on Autonomic Computing (ICAC'07), pp.1-10, 2007.
DOI : 10.1109/ICAC.2007.16

M. Odersky, P. Altherr, and V. Cremet, An overview of the Scala programming language, École Polytechnique Fédérale de Lausanne, Tech. Rep, 2004. Available: https

A. Orgerie, M. D. Assuncao, and L. Lefevre, A survey on techniques for improving the energy efficiency of large-scale distributed systems, ACM Computing Surveys, vol.46, issue.4, pp.1-31, 2014.
DOI : 10.1109/SURV.2011.062410.00034

URL : https://hal.archives-ouvertes.fr/hal-00767582

X. Ouyang, P. Garraghan, and D. Mckee, Straggler Detection in Parallel Computing Systems through Dynamic Threshold Calculation, 2016 IEEE 30th International Conference on Advanced Information Networking and Applications (AINA), pp.414-421, 2016.
DOI : 10.1109/AINA.2016.84

URL : http://eprints.whiterose.ac.uk/100522/8/PID4055511_final.pdf

R. Patgiri and A. Ahmed, Big Data: The V's of the Game Changer Paradigm, 2016 IEEE 18th International Conference on High Performance Computing and Communications; IEEE 14th International Conference on Smart City; IEEE 2nd International Conference on Data Science and Systems (HPCC/SmartCity/DSS), pp.17-24, 2016.
DOI : 10.1109/HPCC-SmartCity-DSS.2016.0014

P. Pawluk, B. Simmons, and M. Smit, Introducing STRATOS: A Cloud Broker Service, 2012 IEEE Fifth International Conference on Cloud Computing, pp.891-898, 2012.
DOI : 10.1109/CLOUD.2012.24

URL : http://www.mikesmit.com/wp-content/papercite-data/pdf/cloud2012.pdf

T. Phan, S. Ibrahim, and G. Antoniu, On Understanding the Energy Impact of Speculative Execution in Hadoop, 2015 IEEE International Conference on Data Science and Data Intensive Systems, pp.2015-396
DOI : 10.1109/DSDIS.2015.45

URL : https://hal.archives-ouvertes.fr/hal-01238055

T. Phan, S. Ibrahim, and A. Zhou, Energy-Driven Straggler Mitigation in MapReduce, International European Conference on Parallel and Distributed Computing (Euro-Par '17), pp.2017-385
DOI : 10.1109/TCC.2015.2404807

URL : https://hal.archives-ouvertes.fr/hal-01560044

Powered by Hadoop, Available: http://wiki.apache.org/hadoop/PoweredBy

A. Qureshi, Power-demand routing in massive geo-distributed systems, 2010.

K. Ren, Y. Kwon, and M. Balazinska, Hadoop's adolescence: an analysis of Hadoop usage in scientific workloads, Proceedings of the VLDB Endowment (VLDB), pp.853-864, 2013.
DOI : 10.14778/2536206.2536213

S. I. Resnick, Heavy-tail phenomena: Probabilistic and statistical modeling, 2007.

P. Roy, R. Ray, and C. Wang, ASAC: automatic sensitivity analysis for approximate computing, SIGPLAN/SIGBED Conference on Languages, Compilers and Tools for Embedded Systems (LCTES '14), pp.95-104, 2014.

P. Russom, Big Data analytics, 2011. Available: https://vivomente

J. Schad, J. Dittrich, and J. Quiané-ruiz, Runtime measurements in the cloud, Proceedings of the VLDB Endowment (VLDB), pp.460-471, 2010.
DOI : 10.14778/1920841.1920902

M. C. Schatz, CloudBurst: highly sensitive read mapping with MapReduce, Bioinformatics, vol.25, issue.11, pp.1363-1369, 2009.
DOI : 10.1093/bioinformatics/btp236

URL : https://academic.oup.com/bioinformatics/article-pdf/25/11/1363/950981/btp236.pdf

L. Shen, T. Abdelzaher, and M. Yuan, TAPA: Temperature aware power allocation in data center with Map-Reduce, 2011 International Green Computing Conference and Workshops, pp.1-8, 2011.
DOI : 10.1109/IGCC.2011.6008602

M. Silberstein, A. Sharov, and D. Geiger, GridBot: execution of bags of tasks in multiple grids, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, SC '09, pp.11:1-11:12, 2009.
DOI : 10.1145/1654059.1654071

Z. D. Stephens, S. Y. Lee, and F. Faghri, Big Data: Astronomical or Genomical?, PLOS Biology, vol.13, issue.7, pp.1-11, 2015.
DOI : 10.1371/journal.pbio.1002195

URL : https://doi.org/10.1371/journal.pbio.1002195

F. Teng, L. Yu, and F. Magoulès, SimMapReduce: A Simulator for Modeling MapReduce Framework, 2011 Fifth FTRA International Conference on Multimedia and Ubiquitous Engineering, pp.277-282, 2011.
DOI : 10.1109/MUE.2011.56

URL : https://hal.archives-ouvertes.fr/hal-00803363

E. Thereska, A. Donnelly, and D. Narayanan, Sierra: practical power-proportionality for data center storage, Proceedings of the sixth conference on Computer systems, EuroSys '11, pp.169-182, 2011.
DOI : 10.1145/1966445.1966461

P. Thinakaran, J. R. Gunasekaran, and B. Sharma, Phoenix: A Constraint-Aware Scheduler for Heterogeneous Datacenters, 2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS), pp.2017-977
DOI : 10.1109/ICDCS.2017.262

A. Thusoo, Z. Shao, and S. Anthony, Data warehousing and analytics infrastructure at facebook, Proceedings of the 2010 international conference on Management of data, SIGMOD '10, pp.1013-1020, 2010.
DOI : 10.1145/1807167.1807278

A. Toshniwal, S. Taneja, and A. Shukla, Storm@twitter, Proceedings of the 2014 ACM SIGMOD international conference on Management of data, SIGMOD '14, pp.147-156, 2014.
DOI : 10.1145/2588555.2595641

N. Vasić, M. Barisits, and V. Salzgeber, Making cluster applications energy-aware, ACM Workshop on Automated Control for Datacenters and Clouds, pp.37-42, 2009.

V. K. Vavilapalli, A. C. Murthy, and C. Douglas, Apache Hadoop YARN: yet another resource negotiator, Proceedings of the 4th annual Symposium on Cloud Computing, SOCC '13, pp.1-16, 2013.
DOI : 10.1145/2523616.2523633

S. Venkataramani, A. Raghunathan, and J. Liu, Scalable-effort classifiers for energy-efficient machine learning, Proceedings of the 52nd Annual Design Automation Conference, DAC '15, pp.1-6, 2015.
DOI : 10.1145/1656274.1656278

R. L. Villars, C. W. Olofson, and M. Eastwood, Big Data: what it is and why you should care?, 2011.

G. Wang and T. S. Ng, The Impact of Virtualization on Network Performance of Amazon EC2 Data Center, 2010 Proceedings IEEE INFOCOM, pp.1-9, 2010.
DOI : 10.1109/INFCOM.2010.5461931

G. Wang, A. R. Butt, and P. Pandey, A simulation approach to evaluating design decisions in MapReduce setups, IEEE International Symposium on Modeling, Analysis & Simulation of Computer and Telecommunication Systems (MASCOTS '09), pp.1-11, 2009.

T. White, Hadoop: The definitive guide, O'Reilly Media, 2012.

K. Wiley, A. Connolly, and J. P. Gardner, Astronomy in the cloud: using MapReduce for image coaddition, Annual Conference on Astronomical Data Analysis Software and Systems (ADASS '11), pp.93-96, 2011.

T. Wirtz and R. Ge, Improving MapReduce energy efficiency for computation intensive workloads, 2011 International Green Computing Conference and Workshops, pp.1-8, 2011.
DOI : 10.1109/IGCC.2011.6008564

H. Wu, K. Li, and Z. Tang, A Heuristic Speculative Execution Strategy in Heterogeneous Distributed Environments, 2014 Sixth International Symposium on Parallel Architectures, Algorithms and Programming, pp.268-273, 2014.
DOI : 10.1109/PAAP.2014.29

X. Wu, X. Zhu, and G. Q. Wu, Data mining with Big Data, IEEE Transactions on Knowledge and Data Engineering (TKDE), vol.26, issue.1, pp.97-107, 2014.

H. Xu and W. C. Lau, Optimization for Speculative Execution in Big Data Processing Clusters, IEEE Transactions on Parallel and Distributed Systems, vol.28, issue.2, pp.530-545, 2017.
DOI : 10.1109/TPDS.2016.2564962

H. Xu and W. C. Lau, Optimization for speculative execution of multiple jobs in a MapReduce-like cluster, 2014. Available: https

H. Xu and W. C. Lau, Task-Cloning Algorithms in a MapReduce Cluster with Competitive Performance Bounds, 2015 IEEE 35th International Conference on Distributed Computing Systems, pp.339-348, 2015.
DOI : 10.1109/ICDCS.2015.42

H. Xu and W. C. Lau, Resource optimization for speculative execution in a MapReduce cluster, IEEE International Conference on Network Protocols (ICNP '13), pp.1-3, 2013.

H. Xu and W. C. Lau, Speculative Execution for a Single Job in a MapReduce-Like System, 2014 IEEE 7th International Conference on Cloud Computing, pp.586-593, 2014.
DOI : 10.1109/CLOUD.2014.84

Y. Xu and S. Mao, A survey of mobile cloud computing for rich media applications, IEEE Wireless Communications, vol.20, issue.3, pp.46-53, 2013.
DOI : 10.1109/MWC.2013.6549282

O. Yildiz, S. Ibrahim, and T. A. Phuong, Chronos: Failure-aware scheduling in shared Hadoop clusters, 2015 IEEE International Conference on Big Data (Big Data), pp.313-318, 2015.
DOI : 10.1109/BigData.2015.7363770

URL : https://hal.archives-ouvertes.fr/hal-01203001

O. Yildiz, M. Dorier, and S. Ibrahim, On the Root Causes of Cross-Application I/O Interference in HPC Storage Systems, 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp.750-759, 2016.
DOI : 10.1109/IPDPS.2016.50

URL : https://hal.archives-ouvertes.fr/hal-01270630

O. Yildiz, S. Ibrahim, and G. Antoniu, Enabling fast failure recovery in shared Hadoop clusters: Towards failure-aware scheduling, Future Generation Computer Systems, vol.74, pp.208-219, 2017.
DOI : 10.1016/j.future.2016.02.015

URL : https://hal.archives-ouvertes.fr/hal-01338336

M. Zaharia, D. Borthakur, and J. Sen Sarma, Delay scheduling: a simple technique for achieving locality and fairness in cluster scheduling, Proceedings of the 5th European conference on Computer systems, EuroSys '10, pp.265-278, 2010.
DOI : 10.1145/1755913.1755940

M. Zaharia, M. Chowdhury, and T. Das, Resilient Distributed Datasets: a fault-tolerant abstraction for in-memory cluster computing, USENIX Conference on Networked Systems Design and Implementation (NSDI '12), pp.15-28, 2012.
DOI : 10.1145/2886107.2886110

M. Zaharia, M. Chowdhury, and M. J. Franklin, Spark: cluster computing with working sets, USENIX Conference on Hot Topics in Cloud Computing (HotCloud '10), pp.1-7, 2010.

M. Zaharia, A. Konwinski, and A. D. Joseph, Improving MapReduce performance in heterogeneous environments, USENIX Symposium on Operating Systems Design and Implementation (OSDI '08), USENIX Association, pp.29-42, 2008.

A. C. Zhou, B. He, and S. Ibrahim, A taxonomy and survey of scientific computing in the cloud, in Big Data: Principles and Paradigms, pp.431-455, 2016.

M. Zwolenski and L. Weatherill, The Digital Universe Rich Data and the Increasing Value of the Internet of Things, Australian Journal of Telecommunications and the Digital Economy, vol.2, issue.3, pp.1-9, 2014.
DOI : 10.7790/ajtde.v2n3.47

At such a scale, resource heterogeneity is inevitable. It appears at different levels of the system. At the hardware level, several generations of hardware coexist in cloud infrastructures, so users have no control over the hardware they are allocated. At the application level, the hardware is physically shared among different users; as a result, the resources allocated to an application are not guaranteed to deliver constant performance over the application's lifetime. This heterogeneity, in turn, leads to marked performance variability [131]. Moreover, large-scale infrastructures consist of thousands of machines that collectively consume an enormous amount of energy, resulting in a huge operational cost [46]. For example, the annual electricity consumption of Google's datacenters exceeds 1,120 GWh, which corresponds to an electricity bill of $67M [92]. In the future, performance variability and energy consumption will remain major concerns in the design and operation of Big Data processing systems [74, 97]. The scale of the underlying infrastructures must grow to cope with the relentless increase in data size, and this growing scale will increase not only performance variability but also energy consumption. As an indication, the energy needed to operate Big Data processing systems is expected to reach the equivalent of the output of an average nuclear power plant [82].

In the context of Big Data, a computation typically consists of a very large number of elementary tasks, and its performance is determined by the completion of its last task. Because of high performance variability, task execution times can vary considerably within the same computation: even if most tasks finish close to the mean execution time, some may deviate from it substantially. In practice, it is not uncommon to observe tasks whose execution times are up to eight times longer than the mean. This phenomenon is known as a heavy-tail distribution [94], and it has a negative impact on the performance of the computation [131]. In the Big Data domain, these harmful tasks are called stragglers. A large body of work is devoted to reducing how often stragglers occur; however, performance variability still causes unexpected stragglers to appear. In practice, these stragglers have been shown to have a major impact on performance [131]. Consequently, preventing stragglers is a crucial objective for improving the performance of large-scale Big Data processing systems.
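To make the "last task decides" argument concrete, the following is a minimal Python sketch under illustrative assumptions: the task durations are hypothetical, and the straggler rule (a task is flagged when it runs longer than a hypothetical `factor` times the mean duration) is a simple stand-in chosen for this example, not the detection method of any of the cited works. It shows how a single slow task sets the completion time of the whole computation.

```python
from statistics import mean


def job_completion_time(task_times):
    """A computation finishes only when its last (slowest) task finishes."""
    return max(task_times)


def find_stragglers(task_times, factor=1.5):
    """Return the indices of tasks whose duration exceeds `factor` times the mean.

    `factor` is a hypothetical illustration parameter, not a value from the text.
    Note that the mean here includes the stragglers themselves.
    """
    avg = mean(task_times)
    return [i for i, t in enumerate(task_times) if t > factor * avg]


if __name__ == "__main__":
    # Hypothetical durations in seconds: nine tasks near 10 s, one straggler at 80 s.
    times = [10, 9, 11, 10, 10, 9, 11, 10, 10, 80]
    print(job_completion_time(times))       # 80
    print(job_completion_time(times[:-1]))  # 11 (same job without the straggler)
    print(find_stragglers(times))           # [9]
```

With these hypothetical durations, the computation finishes at 80 s instead of 11 s, and only the last task is flagged. Because the baseline mean includes the straggler, a heavy tail inflates the threshold; a median-based baseline would be more robust in that respect.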