PUMA: Purdue MapReduce benchmarks suite, Purdue University, Tech. Rep, 2012. ,
Robust and flexible power-proportional storage, Proceedings of the 1st ACM symposium on Cloud computing, SoCC '10, pp.217-228, 2010. ,
DOI : 10.1145/1807128.1807164
URL : http://www.pdl.cs.cmu.edu/PDL-FTP/Storage/rabbit.pdf
Effective straggler mitigation: attack of the clones, USENIX Symposium on Networked Systems Design and Implementation (NSDI '13), pp.185-198, 2013. ,
GRASS: trimming stragglers in approximation analytics, USENIX Symposium on Networked Systems Design and Implementation (NSDI '14), pp.289-302, 2014. ,
Reining in the outliers in MapReduce clusters using Mantri, USENIX Symposium on Operating Systems Design and Implementation (OSDI '10), pp.1-16, 2010. ,
SETI@home: an experiment in public-resource computing, Communications of the ACM, vol.45, issue.11, pp.56-61, 2002. ,
DOI : 10.1145/581571.581573
An introduction to multivariate statistical analysis, 1958. ,
Fault-aware scheduling for Bag-of-Tasks applications on Desktop Grids, 2006 7th IEEE/ACM International Conference on Grid Computing, pp.56-63, 2006. ,
DOI : 10.1109/ICGRID.2006.310998
Scheduling algorithms for multiple Bag-of-Task applications on Desktop Grids: A knowledge-free approach, 2008 IEEE International Symposium on Parallel and Distributed Processing, pp.1-8, 2008. ,
DOI : 10.1109/IPDPS.2008.4536445
A view of cloud computing, Communications of the ACM, vol.53, issue.4, pp.50-58, 2010. ,
DOI : 10.1145/1721654.1721672
Cascading Divide-and-Conquer: A Technique for Designing Parallel Algorithms, SIAM Journal on Computing, vol.18, issue.3, pp.499-532, 1989. ,
DOI : 10.1137/0218035
Checkpointing algorithms and fault prediction, Journal of Parallel and Distributed Computing, vol.74, issue.2, pp.2048-2064, 2014. ,
DOI : 10.1016/j.jpdc.2013.10.010
URL : https://hal.archives-ouvertes.fr/hal-00788313
Technology, e-learning and distance education. Routledge, 2005. ,
DOI : 10.4324/9780203463772
Fog computing and its role in the internet of things, Proceedings of the first edition of the MCC workshop on Mobile cloud computing, MCC '12, pp.13-16, 2012. ,
DOI : 10.1145/2342509.2342513
Extensible markup language (XML), World Wide Web Journal (WWW), vol.2, issue.4, pp.27-66, 1997. ,
Toward energy-efficient computing, Communications of the ACM, vol.53, issue.3, pp.50-58, 2010. ,
DOI : 10.1145/1666420.1666438
A query language and optimization techniques for unstructured data, ACM SIGMOD International Conference on Management of Data (SIGMOD '96), pp.505-516, 1996. ,
DOI : 10.1145/235968.233368
URL : http://wwwipd.ira.uka.de/~ggh/papers/BDHS96.ps.gz
Apache Flink: stream and batch processing in a single engine, Bulletin of the IEEE Computer Society Technical Committee on Data Engineering, vol.36, issue.4, pp.28-38, 2015. ,
Exploiting spatio-temporal tradeoffs for energy-aware MapReduce in the cloud, IEEE International Conference on Cloud Computing (CLOUD '11), pp.251-258, 2011. ,
Business intelligence and analytics: from big data to big impact, Management Information Systems Quarterly (MISQ), vol.36, issue.4, pp.1165-1188, 2012. ,
Improving MapReduce Performance Using Smart Speculative Execution Strategy, IEEE Transactions on Computers, vol.63, issue.4, pp.29-42, 2014. ,
DOI : 10.1109/TC.2013.15
Energy efficiency for large-scale MapReduce workloads with significant interactive analysis, Proceedings of the 7th ACM european conference on Computer Systems, EuroSys '12, pp.43-56, 2012. ,
DOI : 10.1145/2168836.2168842
URL : http://www.eecs.berkeley.edu/~ychen2/professional/eurosys2012paper125-cameraReady.pdf
Interactive analytical processing in big data systems, Proceedings of the VLDB Endowment (VLDB), pp.1802-1813, 2012. ,
DOI : 10.14778/2367502.2367519
To compress or not to compress - compute vs. IO tradeoffs for mapreduce energy efficiency, Proceedings of the first ACM SIGCOMM workshop on Green networking, Green Networking '10, pp.23-28, 2010. ,
DOI : 10.1145/1851290.1851296
Towards energy efficient MapReduce, EECS Department, 2009. ,
Heterogeneity-Aware Workload Placement and Migration in Distributed Sustainable Datacenters, 2014 IEEE 28th International Parallel and Distributed Processing Symposium, pp.307-316, 2014. ,
DOI : 10.1109/IPDPS.2014.41
URL : http://www.cs.uccs.edu/~xzhou/publications/IPDPS2014.pdf
Scalable effort hardware design, Proceedings of the 47th Design Automation Conference, DAC '10, pp.555-560, 2010. ,
DOI : 10.1145/1837274.1837411
Fine-grained dynamic voltage and frequency scaling for precise energy and performance tradeoff based on the ratio of off-chip access to on-chip computation times, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), vol.24, issue.1, pp.18-28, 2005. ,
Relational database: a practical foundation for productivity, Communications of the ACM, vol.25, issue.2, pp.109-117, 1982. ,
DOI : 10.1145/358396.358400
URL : http://dl.acm.org/ft_gateway.cfm?id=358400&type=pdf
MapReduce, Communications of the ACM, vol.51, issue.1, pp.107-113, 2008. ,
DOI : 10.14293/S2199-1006.1.SOR-UNCAT.AUNHT8.v1.RBZFIB
Dynamo, ACM SIGOPS Operating Systems Review, vol.41, issue.6, pp.205-220, 2007. ,
DOI : 10.1145/1323293.1294281
Lifetime or energy: Consolidating servers with reliability control in virtualized cloud datacenters, 4th IEEE International Conference on Cloud Computing Technology and Science Proceedings, pp.18-25, 2012. ,
DOI : 10.1109/CloudCom.2012.6427550
Cloud Computing: Distributed Internet Computing for IT and Scientific Research, IEEE Internet Computing, vol.13, issue.5, pp.10-13, 2009. ,
DOI : 10.1109/MIC.2009.103
Understanding the effects and implications of compute node related failures in hadoop, Proceedings of the 21st international symposium on High-Performance Parallel and Distributed Computing, HPDC '12, pp.187-198, 2012. ,
DOI : 10.1145/2287076.2287108
Rethinking cost and performance of database systems, ACM SIGMOD Record, vol.38, issue.1, pp.43-48, 2009. ,
DOI : 10.1145/1558334.1558339
Taming of the Shrew: Modeling the Normal and Faulty Behaviour of Large-scale HPC Systems, 2012 IEEE 26th International Parallel and Distributed Processing Symposium, pp.1168-1179, 2012. ,
DOI : 10.1109/IPDPS.2012.107
Exploring power behaviors and trade-offs of in-situ data analytics, Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC '13, pp.1-12, 2013. ,
DOI : 10.1145/2503210.2503303
Timely Long Tail Identification through Agent Based Monitoring and Analytics, 2015 IEEE 18th International Symposium on Real-Time Distributed Computing, pp.19-26, 2015. ,
DOI : 10.1109/ISORC.2015.39
URL : http://eprints.whiterose.ac.uk/88442/1/ISORC%20Camera%20Copy%20-%20Longtail%20identification.pdf
Energy efficient computing, clusters, grids and clouds: a taxonomy and survey, Sustainable Computing: Informatics and Systems, pp.13-33, 2017. ,
Firmament: fast, centralized cluster scheduling at scale, USENIX Symposium on Operating Systems Design and Implementation (OSDI '16), pp.99-115, 2016. ,
GreenSlot, Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, SC '11, pp.1-11, 2011. ,
DOI : 10.1145/2063384.2063411
GreenHadoop, Proceedings of the 7th ACM european conference on Computer Systems, EuroSys '12, pp.57-70, 2012. ,
DOI : 10.1145/2168836.2168843
MapReduce in the Clouds for Science, 2010 IEEE Second International Conference on Cloud Computing Technology and Science, pp.565-572, 2010. ,
DOI : 10.1109/CloudCom.2010.107
Cost of power in large-scale data centers. Available: http://perspectives.mvdirona.com, 2008. ,
MRSim: A discrete event based MapReduce simulator, 2010 Seventh International Conference on Fuzzy Systems and Knowledge Discovery, pp.2993-2997, 2010. ,
DOI : 10.1109/FSKD.2010.5569086
Available: https ,
Hadoop in practice, 2012. ,
MR-scope, Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing, HPDC '10, pp.849-855, 2010. ,
DOI : 10.1145/1851476.1851598
Towards Pay-As-You-Consume Cloud Computing, 2011 IEEE International Conference on Services Computing, pp.370-377, 2011. ,
DOI : 10.1109/SCC.2011.38
Handling partitioning skew in MapReduce using LEEN, Peer-to-Peer Networking and Applications, pp.409-424, 2013. ,
DOI : 10.1007/s12083-013-0213-7
LEEN: Locality/Fairness-Aware Key Partitioning for MapReduce in the Cloud, 2010 IEEE Second International Conference on Cloud Computing Technology and Science, pp.17-24, 2010. ,
DOI : 10.1109/CloudCom.2010.25
Maestro: Replica-Aware Map Scheduling for MapReduce, 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012), pp.435-442, 2012. ,
DOI : 10.1109/CCGrid.2012.122
URL : https://hal.archives-ouvertes.fr/hal-00670813
Towards Efficient Power Management in MapReduce: Investigation of CPU-Frequencies Scaling on Power Efficiency in Hadoop, Workshop on Adaptive Resource Management and Scheduling for Cloud Computing (ARMS-CC '14), pp.147-164, 2014. ,
DOI : 10.1007/978-3-319-13464-2_11
URL : https://hal.archives-ouvertes.fr/hal-01077285
Governing energy consumption in Hadoop through CPU frequency scaling: An analysis, Future Generation Computer Systems, vol.54, issue.C, 2016. ,
DOI : 10.1016/j.future.2015.01.005
URL : https://hal.archives-ouvertes.fr/hal-01166252
Dryad: distributed data-parallel programs from sequential building blocks, ACM European Conference on Computer Systems (EuroSys '07), pp.59-72, 2007. ,
Large-scale distributed systems at Google: current systems and future directions, ACM SIGOPS International Workshop on Large Scale Distributed Systems and Middleware (LADIS '09), 2009. ,
Grid'5000: a large scale and highly reconfigurable experimental grid testbed, International Journal of High Performance Computing Applications (IJHPCA), vol.20, issue.4, pp.481-494, 2006. ,
Cloud Types and Services, Handbook of Cloud Computing, pp.335-355, 2010. ,
DOI : 10.1007/978-1-4419-6524-0_14
Tools and Technologies for Building Clouds, Cloud Computing, pp.3-20, 2010. ,
DOI : 10.1007/978-1-84996-241-4_1
The MapReduce Programming Model and Implementations, Cloud Computing: Principles and Paradigms, pp.373-390, 2011. ,
DOI : 10.1145/1131322.1131328
GreenHDFS: towards an energy-conserving, storage-efficient, hybrid Hadoop compute cluster, USENIX International Conference on Power Aware Computing and Systems (HotPower '10), pp.1-9, 2010. ,
Energy Proportionality and Performance in Data Parallel Computing Clusters, International Conference on Scientific and Statistical Database Management (SSDBM '11), pp.414-431, 2011. ,
DOI : 10.1007/BF01874392
URL : https://digital.library.unt.edu/ark:/67531/metadc840810/m2/1/high_res_d/1012478.pdf
MRSG – A MapReduce simulator over SimGrid, Parallel Computing, vol.39, issue.4-5, pp.233-244, 2013. ,
DOI : 10.1016/j.parco.2013.02.001
URL : https://hal.archives-ouvertes.fr/hal-00931855
Resource Management for Rapid Application Turnaround on Enterprise Desktop Grids, Proceedings of the ACM/IEEE SC2004 Conference, pp.17-30, 2004. ,
DOI : 10.1109/SC.2004.50
URL : http://www-csag.ucsd.edu/papers/sc04_kondo.pdf
Checkpointing and rollback-recovery for distributed systems, IEEE Transactions on Software Engineering, issue.1, pp.23-31, 1987. ,
DOI : 10.1109/tse.1987.232562
URL : http://ecommons.cornell.edu/bitstream/1813/6546/1/85-706.pdf
A first step towards user assisted online social networks, Proceedings of the 3rd Workshop on Social Network Systems, SNS '10, pp.1-6, 2010. ,
DOI : 10.1145/1852658.1852664
URL : http://www.cl.cam.ac.uk/%7Eey204/pubs/2010_SNS.pdf
3D data management: controlling data Volume, Velocity and Variety. Available: https://blogs.gartner.com/doug-laney/files, 2001. ,
Energy management for MapReduce clusters, Proceedings of the VLDB Endowment (VLDB), pp.129-139, 2010. ,
DOI : 10.14778/1920841.1920862
URL : http://www.comp.nus.edu.sg/%7Evldb2010/proceedings/files/papers/R11.pdf
Applications of machine learning and rule induction, Communications of the ACM, vol.38, issue.11, pp.54-64, 1995. ,
DOI : 10.1145/219717.219768
Heterogeneity-aware resource allocation and scheduling in the cloud, USENIX Conference on Hot Topics in Cloud Computing (HotCloud '11), pp.1-5, 2011. ,
CREST: Towards Fast Speculation of Straggler Tasks in MapReduce, 2011 IEEE 8th International Conference on e-Business Engineering, pp.311-316, 2011. ,
DOI : 10.1109/ICEBE.2011.37
On the energy (in)efficiency of Hadoop clusters, ACM SIGOPS Operating Systems Review, vol.44, issue.1, pp.61-65, 2010. ,
DOI : 10.1145/1740390.1740405
A New Speculative Execution Algorithm Based on C4.5 Decision Tree for Hadoop, International Conference of Young Computer Scientists, Engineers and Educators (ICYCSEE '15), pp.284 ff., 2015. ,
DOI : 10.1007/978-3-662-46248-5_35
Assessing MapReduce for Internet Computing: A Comparison of Hadoop and BitDew-MapReduce, 2012 ACM/IEEE 13th International Conference on Grid Computing, pp.76-84, 2012. ,
DOI : 10.1109/Grid.2012.31
URL : https://hal.archives-ouvertes.fr/hal-00757070
Cloud computing: architecture and key technologies, Journal of China Institute of Communications, vol.32, issue.7, pp.3-21, 2011. ,
Impact of privacy, trust and user activity on intentions to share Facebook photos, Journal of Information, Communication and Ethics in Society, vol.11, issue.2, pp.364-382, 2016. ,
DOI : 10.1016/j.chb.2014.12.012
Available: https ,
Energy-Aware Scheduling of MapReduce Jobs, 2014 IEEE International Congress on Big Data, pp.32-39, 2014. ,
DOI : 10.1109/BigData.Congress.2014.15
Energy-Aware Scheduling of MapReduce Jobs for Big Data Applications, IEEE Transactions on Parallel and Distributed Systems, vol.26, issue.10, pp.2720-2733, 2015. ,
DOI : 10.1109/TPDS.2014.2358556
Energy-Efficient Single Flux Quantum Technology, IEEE Transactions on Applied Superconductivity, vol.21, issue.3, pp.760-769, 2011. ,
DOI : 10.1109/TASC.2010.2096792
Exploiting Platform Heterogeneity for Power Efficient Data Centers, Fourth International Conference on Autonomic Computing (ICAC'07), pp.1-10, 2007. ,
DOI : 10.1109/ICAC.2007.16
An overview of the Scala programming language. Available: https, École Polytechnique Fédérale de Lausanne, Tech. Rep, 2004. ,
A survey on techniques for improving the energy efficiency of large-scale distributed systems, ACM Computing Surveys, vol.46, issue.4, pp.1-31, 2014. ,
DOI : 10.1109/SURV.2011.062410.00034
URL : https://hal.archives-ouvertes.fr/hal-00767582
Straggler Detection in Parallel Computing Systems through Dynamic Threshold Calculation, 2016 IEEE 30th International Conference on Advanced Information Networking and Applications (AINA), pp.414-421, 2016. ,
DOI : 10.1109/AINA.2016.84
URL : http://eprints.whiterose.ac.uk/100522/8/PID4055511_final.pdf
Big Data: The V's of the Game Changer Paradigm, 2016 IEEE 18th International Conference on High Performance Computing and Communications; IEEE 14th International Conference on Smart City; IEEE 2nd International Conference on Data Science and Systems (HPCC/SmartCity/DSS), pp.17-24, 2016. ,
DOI : 10.1109/HPCC-SmartCity-DSS.2016.0014
Introducing STRATOS: A Cloud Broker Service, 2012 IEEE Fifth International Conference on Cloud Computing, pp.891-898, 2012. ,
DOI : 10.1109/CLOUD.2012.24
URL : http://www.mikesmit.com/wp-content/papercite-data/pdf/cloud2012.pdf
On Understanding the Energy Impact of Speculative Execution in Hadoop, 2015 IEEE International Conference on Data Science and Data Intensive Systems, pp.396 ff., 2015. ,
DOI : 10.1109/DSDIS.2015.45
URL : https://hal.archives-ouvertes.fr/hal-01238055
Energy-Driven Straggler Mitigation in MapReduce, International European Conference on Parallel and Distributed Computing (Euro-Par '17), pp.385 ff., 2017. ,
DOI : 10.1109/TCC.2015.2404807
URL : https://hal.archives-ouvertes.fr/hal-01560044
Available: http://wiki.apache.org/hadoop/PoweredBy ,
Power-demand routing in massive geo-distributed systems, 2010. ,
Hadoop's adolescence, Proceedings of the VLDB Endowment (VLDB), pp.853-864, 2013. ,
DOI : 10.14778/2536206.2536213
Heavy-tail phenomena: Probabilistic and statistical modeling, 2007. ,
ASAC: automatic sensitivity analysis for approximate computing, SIGPLAN/SIGBED Conference on Languages, Compilers and Tools for Embedded Systems (LCTES '14), pp.95-104, 2014. ,
Big Data analytics. Available: https://vivomente, 2011. ,
Runtime measurements in the cloud, Proceedings of the VLDB Endowment (VLDB), pp.460-471, 2010. ,
DOI : 10.14778/1920841.1920902
CloudBurst: highly sensitive read mapping with MapReduce, Bioinformatics, vol.25, issue.11, pp.1363-1369, 2009. ,
DOI : 10.1038/nature06884
URL : https://academic.oup.com/bioinformatics/article-pdf/25/11/1363/950981/btp236.pdf
TAPA: Temperature aware power allocation in data center with Map-Reduce, 2011 International Green Computing Conference and Workshops, pp.1-8, 2011. ,
DOI : 10.1109/IGCC.2011.6008602
GridBot, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, SC '09, pp.1-1112, 2009. ,
DOI : 10.1145/1654059.1654071
Big Data: Astronomical or Genomical?, PLOS Biology, vol.28, issue.7, pp.1-11, 2015. ,
DOI : 10.1371/journal.pbio.1002195.s006
URL : https://doi.org/10.1371/journal.pbio.1002195
SimMapReduce: A Simulator for Modeling MapReduce Framework, 2011 Fifth FTRA International Conference on Multimedia and Ubiquitous Engineering, pp.277-282, 2011. ,
DOI : 10.1109/MUE.2011.56
URL : https://hal.archives-ouvertes.fr/hal-00803363
Sierra, Proceedings of the sixth conference on Computer systems, EuroSys '11, pp.169-182, 2011. ,
DOI : 10.1145/1966445.1966461
Phoenix: A Constraint-Aware Scheduler for Heterogeneous Datacenters, 2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS), pp.977 ff., 2017. ,
DOI : 10.1109/ICDCS.2017.262
Data warehousing and analytics infrastructure at facebook, Proceedings of the 2010 international conference on Management of data, SIGMOD '10, pp.1013-1020, 2010. ,
DOI : 10.1145/1807167.1807278
Storm@twitter, Proceedings of the 2014 ACM SIGMOD international conference on Management of data, SIGMOD '14, pp.147-156, 2014. ,
DOI : 10.1145/2588555.2595641
Making cluster applications energy-aware, ACM Workshop on Automated Control for Datacenters and Clouds, pp.37-42, 2009. ,
Apache Hadoop YARN, Proceedings of the 4th annual Symposium on Cloud Computing, SOCC '13, pp.1-16, 2013. ,
DOI : 10.1145/2523616.2523633
Scalable-effort classifiers for energy-efficient machine learning, Proceedings of the 52nd Annual Design Automation Conference, DAC '15, pp.1-6, 2015. ,
DOI : 10.1145/1656274.1656278
Big Data: what it is and why you should care?, 2011. ,
The Impact of Virtualization on Network Performance of Amazon EC2 Data Center, 2010 Proceedings IEEE INFOCOM, pp.1-9, 2010. ,
DOI : 10.1109/INFCOM.2010.5461931
A simulation approach to evaluating design decisions in MapReduce setups, IEEE International Symposium on Modeling, pp.1-11, 2009. ,
Hadoop: The definitive guide. O'Reilly Media, 2012. ,
Astronomy in the cloud: using MapReduce for image coaddition, Annual Conference on Astronomical Data Analysis Software and Systems (ADASS '11), pp.93-96, 2011. ,
Improving MapReduce energy efficiency for computation intensive workloads, 2011 International Green Computing Conference and Workshops, pp.1-8, 2011. ,
DOI : 10.1109/IGCC.2011.6008564
A Heuristic Speculative Execution Strategy in Heterogeneous Distributed Environments, 2014 Sixth International Symposium on Parallel Architectures, Algorithms and Programming, pp.268-273, 2014. ,
DOI : 10.1109/PAAP.2014.29
Data mining with Big Data, IEEE Transactions on Knowledge and Data Engineering (TKDE), vol.26, issue.1, pp.97-107, 2014. ,
Optimization for Speculative Execution in Big Data Processing Clusters, IEEE Transactions on Parallel and Distributed Systems, vol.28, issue.2, pp.530-545, 2017. ,
DOI : 10.1109/TPDS.2016.2564962
Optimization for speculative execution of multiple jobs in a MapReduce-like cluster. Available: https, 2014. ,
Task-Cloning Algorithms in a MapReduce Cluster with Competitive Performance Bounds, 2015 IEEE 35th International Conference on Distributed Computing Systems, pp.339-348, 2015. ,
DOI : 10.1109/ICDCS.2015.42
Resource optimization for speculative execution in a MapReduce cluster, IEEE International Conference on Network Protocols (ICNP '13), pp.1-3, 2013. ,
Speculative Execution for a Single Job in a MapReduce-Like System, 2014 IEEE 7th International Conference on Cloud Computing, pp.586-593, 2014. ,
DOI : 10.1109/CLOUD.2014.84
A survey of mobile cloud computing for rich media applications, IEEE Wireless Communications, vol.20, issue.3, pp.46-53, 2013. ,
DOI : 10.1109/MWC.2013.6549282
Chronos: Failure-aware scheduling in shared Hadoop clusters, 2015 IEEE International Conference on Big Data (Big Data), pp.313-318, 2015. ,
DOI : 10.1109/BigData.2015.7363770
URL : https://hal.archives-ouvertes.fr/hal-01203001
On the Root Causes of Cross-Application I/O Interference in HPC Storage Systems, 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp.750-759, 2016. ,
DOI : 10.1109/IPDPS.2016.50
URL : https://hal.archives-ouvertes.fr/hal-01270630
Enabling fast failure recovery in shared Hadoop clusters: Towards failure-aware scheduling, Future Generation Computer Systems, vol.74, pp.208-219, 2017. ,
DOI : 10.1016/j.future.2016.02.015
URL : https://hal.archives-ouvertes.fr/hal-01338336
Delay scheduling, Proceedings of the 5th European conference on Computer systems, EuroSys '10, pp.265-278, 2010. ,
DOI : 10.1145/1755913.1755940
Resilient Distributed Datasets, USENIX Conference on Networked Systems Design and Implementation (NSDI '12), pp.15-28, 2012. ,
DOI : 10.1145/2886107.2886110
Spark: cluster computing with working sets, USENIX Conference on Hot Topics in Cloud Computing (HotCloud '10), pp.1-7, 2010. ,
Improving MapReduce performance in heterogeneous environments, USENIX Symposium on Operating Systems Design and Implementation (OSDI '08), USENIX Association, pp.29-42, 2008. ,
A taxonomy and survey of scientific computing in the cloud, in Big Data: Principles and Paradigms, pp.431-455, 2016. ,
The Digital Universe Rich Data and the Increasing Value of the Internet of Things, Australian Journal of Telecommunications and the Digital Economy, vol.2, issue.3, pp.1-9, 2014. ,
DOI : 10.7790/ajtde.v2n3.47
Resource heterogeneity is inevitable. It appears at different levels of the system. At the hardware level, several generations of hardware coexist in cloud infrastructures. Consequently, users have no control over the hardware allocated to them. At the application level, the hardware is physically shared among different users. As a result, the resources allocated to an application are not guaranteed to deliver constant performance throughout the application's lifetime. This heterogeneity, in turn, leads to evident performance variability [131]. Furthermore, large-scale infrastructures consist of thousands of machines that collectively consume an enormous amount of energy, which results in a huge operational cost [46]. For example, the annual electricity consumption of Google's datacenters exceeds 1,120 GWh, which corresponds to an electricity bill of $67 million [92]. In the future, performance variability and energy consumption will remain major concerns for the design and operation of Big Data processing systems [74, 97]. The scale of the underlying infrastructures must grow to cope with the relentless increase in data size. This growing scale will increase not only performance variability but also energy consumption. As an indication, the energy required to operate Big Data processing systems is expected to reach the equivalent of the output of an average nuclear power plant [82].

In the Big Data context, a computation typically consists of a very large number of elementary tasks. The performance of a computation is determined by the completion of its last task. Because of high performance variability, task execution times can vary significantly within the same computation. Even if the execution times of most tasks remain close to the mean execution time, some of them can deviate from it greatly. In practice, it is not uncommon to observe tasks with execution times up to eight times longer than the mean. This phenomenon is called a heavy-tail distribution [94]. It has a negative impact on the performance of the computation [131]. In the Big Data field, these harmful tasks are called stragglers. A large body of work is devoted to reducing the frequency at which stragglers occur. However, performance variability leads to the appearance of unexpected stragglers. In practice, these stragglers have been shown to have a major impact on performance [131]. Consequently, straggler prevention is a crucial objective for improving the performance of large Big Data processing systems.
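The observation that a computation finishes only when its last task finishes can be illustrated with a minimal Python sketch. The function names, the sample durations, and the 1.5x-of-mean threshold are assumptions chosen for illustration; they are not taken from the cited works, and production systems use more elaborate detection rules.

```python
from statistics import mean


def job_completion_time(task_durations):
    """A job ends only when its slowest task ends, so its duration is the maximum."""
    return max(task_durations)


def find_stragglers(task_durations, slowdown_threshold=1.5):
    """Return indices of tasks running longer than slowdown_threshold x the mean.

    The threshold value is a hypothetical choice for this sketch.
    """
    avg = mean(task_durations)
    return [i for i, d in enumerate(task_durations) if d > slowdown_threshold * avg]


# Hypothetical job: 19 tasks take 10 s each, one task takes 8x longer.
durations = [10.0] * 19 + [80.0]

stragglers = find_stragglers(durations)
with_straggler = job_completion_time(durations)
without_straggler = job_completion_time(
    [d for i, d in enumerate(durations) if i not in stragglers]
)

print(f"stragglers at indices: {stragglers}")
print(f"job time with straggler: {with_straggler} s")
print(f"job time if the straggler behaved normally: {without_straggler} s")
```

In this hypothetical job a single slow task out of twenty sets the completion time of the whole job, which is why the tail of the task-duration distribution matters more than its mean.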