Introduction

7.2 Medical system on cloud federation

7.3.1 Two phases of generating data storage configuration

7.4.1 Finding Pareto configuration set

Conclusion

This chapter details a hybrid data storage configuration for the medical system in a cloud environment and shows how to find a good hybrid data storage configuration that meets the required quality of the workload [97]. Section 7.2 gives an overview of the Medical Data Management System for a cloud federation and the background of our approach. Chapters 5 and 6 introduce two algorithms: one for estimating accurate cost values, and one for searching and optimizing Multi-Objective Optimization Problems in a cloud federation.

This chapter concludes our work on data management in cloud federations and its application in the medical domain. Section 9.2 summarizes our contributions. Large-scale data management is a vast domain that we have only partially addressed. Section 9

Summary and Conclusion. Medical data management in cloud federations raises Multi-Objective Optimization Problems (MOOPs) for query processing and data storage, driven by user preferences such as response time, monetary cost, and quality.

A. Regalado, Who Coined 'Cloud Computing'?, MIT Technology Review, p.31, 2011.

S. Agrawal, V. Narasayya, and B. Yang, Integrating vertical and horizontal partitioning into automated physical database design, Proceedings of the 2004 ACM SIGMOD International Conference on Management of Data (SIGMOD '04), p.359, 2004.

A. Ailamaki, DBMSs on a Modern Processor: Where Does Time Go?, Proceedings of the 25th International Conference on Very Large Data Bases, VLDB '99, 1999.

M. Akdere, Learning-based query performance modeling and prediction, International Conference on Data Engineering, pp.390-401, 2012.

Y. Al-Dhuraibi, Elasticity in Cloud Computing: State of the Art and Research Challenges, vol.11, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01529654

Amazon Web Services Website, 2018.

Apache Cassandra, 2018.

M. Armbrust, A View of Cloud Computing, Commun. ACM, vol.53, pp.50-58, 2010.

M. Armbrust, Above the Clouds: A Berkeley View of Cloud Computing, 2009.

M. Armbrust, Proceedings of the SIGMOD International Conference on Management of Data, pp.1383-1394, 2015.

L. S. Batista, Performance Assessment of Multiobjective Evolutionary Algorithms, p.7, 2012.

M. A. Bayir, I. H. Toroslu, and A. Cosar, Genetic Algorithm for the Multiple-Query Optimization Problem, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), vol.37, 2007.

A. Beloglazov and R. Buyya, Energy efficient allocation of virtual machines in cloud data centers, CCGrid 2010 - 10th IEEE/ACM International Conference on Cluster, Cloud, and Grid Computing, pp.577-578, 2010.

P. Bernstein, The Asilomar Report on Database Research, vol.27, 1998.

L. C. T. Bezerra, M. López-Ibáñez, and T. Stützle, An Empirical Assessment of the Properties of Inverted Generational Distance on Multi- and Many-Objective Optimization, Evolutionary Multi-Criterion Optimization, pp.31-45, 2017.

Bigtable, 2018.

B. H. Bloom, Space/Time Trade-offs in Hash Coding with Allowable Errors, vol.13, 1970.

P. A. Boncz, S. Manegold, and M. L. Kersten, Proceedings of the 25th International Conference on Very Large Data Bases, VLDB '99, 1999.

P. A. Boncz, M. Zukowski, and N. Nes, MonetDB/X100: Hyper-Pipelining Query Execution, CIDR 2005, Second Biennial Conference on Innovative Data Systems Research, pp.225-237, 2005.

E. A. Boytsov, Applying stochastic metaheuristics to the problem of data management in a multi-tenant database cluster, Automatic Control and Computer Sciences, 2014.

G. Brassard and P. Bratley, Algorithmics: Theory & Practice, 1988.

L. Breiman, Bagging predictors, Machine Learning, vol.24, pp.123-140, 1996.

F. Bugiotti, Conference on Innovative Data Systems Research (CIDR), 2015.

G. Candea, N. Polyzotis, and R. Vingralek, Predictable performance and high query concurrency for data analytics, VLDB Journal, vol.20, 2011.

V. Chankong and Y. Y. Haimes, Multiobjective decision making: theory and methodology, North-Holland series in system science and engineering, 1983.

S. Chaudhuri, An Overview of Query Optimization in Relational Systems, Proceedings of the Symposium on Principles of Database Systems (PODS), pp.34-43, 1998.

J. Chen, The MemSQL Query Optimizer: A Modern Optimizer for Real-time Analytics in a Distributed Database, vol.9, 2016.

P. Ciaccia and D. Martinenghi, Reconciling Skyline and Ranking Queries, Proc. VLDB Endow., vol.10, 2017.

. Cloud-federation, Computing, p.7, 2011.

C. A. Coello Coello and N. Cruz Cortés, Solving Multiobjective Optimization Problems Using an Artificial Immune System, Genetic Programming and Evolvable Machines, vol.6, pp.163-190, 2005.

C. A. Coello Coello, D. A. Van Veldhuizen, and G. B. Lamont, Evolutionary Algorithms for Solving Multi-Objective Problems (Genetic and Evolutionary Computation), 2002.

CouchDB, 2018.

DICOM, 2019.

S. Das, D. Agrawal, and A. El Abbadi, ElasTraS: An Elastic, Scalable, and Self-managing Transactional Database for the Cloud, ACM Trans. Database Syst., vol.38, issue.1, 2013.

D. J. DeWitt, Split query processing in Polybase, Proceedings of the SIGMOD International Conference on Management of Data, pp.1255-1266, 2013.

J. Dean and S. Ghemawat, MapReduce: Simplified Data Processing on Large Clusters, Commun. ACM, vol.51, pp.107-113, 2008.

K. Deb, Multi-objective optimization using evolutionary algorithms: an introduction, KanGAL Report, pp.1-24, 2011.

K. Deb and R. B. Agrawal, Simulated Binary Crossover for Continuous Search Space, vol.9, pp.1-34, 1994.

K. Deb and H. Jain, An Evolutionary Many-Objective Optimization Algorithm Using Reference-point Based Non-dominated Sorting Approach, Part I: Solving Problems with Box Constraints, p.18, 2013.

K. Deb, A fast and elitist multiobjective genetic algorithm: NSGA-II, IEEE Trans. Evol. Comput., vol.6, pp.182-197, 2002.

K. Deb, Scalable Test Problems for Evolutionary Multiobjective Optimization, Evolutionary Multiobjective Optimization. Theoretical Advances and Applications, pp.105-145, 2005.

K. Doka, IReS: Intelligent, Multi-Engine Resource Scheduler for Big Data Analytics Workflows, SIGMOD '15, 2015.

T. Dokeroglu, M. A. Bayir, and A. Cosar, Integer Linear Programming Solution for the Multiple Query Optimization Problem, Information Sciences and Systems, pp.978-981, 2014.

T. Dokeroglu, M. A. Bayir, and A. Cosar, Robust Heuristic Algorithms for Exploiting the Common Tasks of Relational Cloud Database Queries, Appl. Soft Comput., vol.30, 2015.

A. Elmore, A Demonstration of the BigDAWG Polystore System, Proc. VLDB Endow., vol.8, pp.1908-1911, 2015.

F. Färber, Data Management for Modern Business Applications, vol.40, pp.45-51, 2012.

F. Färber, The SAP HANA Database - An Architecture Overview, IEEE Data Eng. Bull., vol.35, pp.28-33, 2012.

H. M. Fard, A Multi-objective Approach for Workflow Scheduling in Heterogeneous Environments, 2012.

D. B. Fogel, Evolutionary Computation: Toward a New Philosophy of Machine Intelligence (IEEE Press Series on Computational Intelligence), 2006.

C. M. Fonseca and P. J. Fleming, An Overview of Evolutionary Algorithms in Multiobjective Optimization, Evolutionary Computation, vol.3, pp.1-16, 1995.

Global Inter-Cloud Technology Forum, Use Cases and Functional Requirements for Inter-Cloud Computing, p.44, 2010.

M. Franklin, A. Halevy, and D. Maier, From Databases to Dataspaces: A New Abstraction for Information Management, SIGMOD Rec., vol.34, 2005.

C. Fung, K. Karlapalem, and Q. Li, Cost-driven vertical class partitioning for methods in object oriented databases, The VLDB Journal, vol.12, pp.187-210, 2003.

E. Gallinucci and M. Golfarelli, SparkTune: tuning Spark SQL through query cost modeling, Advances in Database Technology - 22nd International Conference on Extending Database Technology, EDBT 2019, pp.546-549, 2019.

A. Ganapathi, Predicting Multiple Metrics for Queries: Better Decisions Enabled by Machine Learning, pp.592-603, 2009.

M. R. Garey and D. S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness, 1990.

V. Giannakouris, Distributed SQL query execution over multiple engine environments, 2016 IEEE International Conference on Big Data (Big Data), pp.452-461, 2016.

G. Giannikis, G. Alonso, and D. Kossmann, SharedDB: Killing One Thousand Queries With One Stone, 2012.

J. Giceva, Deployment of Query Plans on Multicores, vol.8, 2014.

C. Glaßer, Approximability and Hardness in Multi-objective Optimization, Programs, Proofs, Processes, pp.180-189, 2010.

F. Glover and M. Laguna, Tabu Search, 1997.

D. E. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning, 1st ed., 1989.

M. Grund, HYRISE: A Main Memory Hybrid Storage Engine, Proc. VLDB Endow., vol.4, pp.105-116, 2010.

HBase, 2018.

R. A. Hankins and J. M. Patel, Data Morphing: An Adaptive, Cache-Conscious Storage Technique, Proceedings of the 29th International Conference on Very Large Data Bases, vol.29, pp.417-428, 2003.

S. Harizopoulos, D. J. Abadi, and S. Madden, Performance Tradeoffs in Read-Optimized Databases, pp.487-498, 2006.

F. Helff and L. d'Orazio, Weighted Sum Model for Multi-Objective Query Optimization for Mobile-Cloud Database Environments, EDBT/ICDT Workshops, 2016.

F. S. Hillier and G. J. Lieberman, Introduction to Operations Research, 1986.

S. Huband, A review of multiobjective test problems and a scalable test problem toolkit, IEEE Transactions on Evolutionary Computation, vol.10, issue.5, 2006.

Hypertable, 2018.

H. Ishibuchi, H. Masuda, and Y. Nojima, Sensitivity of performance evaluation results by inverted generational distance to reference points, IEEE Congress on Evolutionary Computation, pp.1107-1114, 2016.

H. Jain and K. Deb, An evolutionary many-objective optimization algorithm using reference-point based nondominated sorting approach, Part II: Handling constraints and extending to an adaptive approach, IEEE Transactions on Evolutionary Computation, vol.18, pp.602-622, 2014.

M. Karpathiotakis, I. Alagiannis, and A. Ailamaki, Fast Queries over Heterogeneous Data Through Engine Customization, Proc. VLDB Endow., vol.9, 2016.

G. Keller, Statistics for Management and Economics, Cengage Learning, 2014.

S. A. Khan and S. Rehman, Iterative non-deterministic algorithms in on-shore wind farm design: A brief survey, Renewable and Sustainable Energy Reviews, vol.19, pp.370-384, 2013.

V. Khare, X. Yao, and K. Deb, Performance Scaling of Multi-objective Evolutionary Algorithms, Evolutionary Multi-Criterion Optimization, pp.376-390, 2003.

A. Khoshkbarforoushha, Flower: A Data Analytics Flow Elasticity Manager, pp.1893-1896, 2017.

S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi, Neurocomputing: Foundations of Research, 1988.

H. Kllapi, Schedule optimization for data processing flows on the cloud, Proceedings of the 2011 International Conference on Management of Data (SIGMOD '11), p.289, 2011.

K. Kloudas, Optimizing Data Parallel Jobs in Wide-Area Data Analytics, 2014.

J. Knowles and D. Corne, The Pareto archived evolution strategy: a new baseline algorithm for Pareto multiobjective optimisation, Congress on Evolutionary Computation-CEC99 (Cat. No. 99TH8406), vol.1, pp.98-105, 1999.

B. Kolev, CloudMdsQL: querying heterogeneous cloud data stores with a common language, Distributed and Parallel Databases, vol.34, pp.463-503, 2016.

B. Kolev, Proceedings of the SIGMOD International Conference on Management of Data, pp.2113-2116, 2016.

M. Köppen and K. Yoshida, Substitute Distance Assignments in NSGA-II for Handling Many-objective Optimization Problems, pp.727-741, 2006.

T. V. Kumar and K. Devi, Frequent queries identification for constructing materialized views, 2011 3rd International Conference on Electronics Computer Technology, vol.6, pp.177-181, 2011.

J. LeFevre, MISO: souping up big data query processing with a multistore system, Proceedings of the SIGMOD International Conference on Management of Data, pp.1591-1602, 2014.

J. Lee and S. Hwang, Toward efficient multidimensional subspace skyline computation, VLDB Journal, vol.23, 2014.

V. Leis, How Good Are Query Optimizers, Really?, Proc. VLDB Endow., vol.9, 2015.

LevelDB, 2018.

J. Li, J. F. Naughton, and R. Nehme, Resource bricolage and resource selection for parallel database systems, VLDB Journal, vol.26, pp.31-54, 2017.

H. Lim, Y. Han, and S. Babu, How to Fit when No One Size Fits, 2013.

J. Liu, Multi-objective scheduling of Scientific Workflows in multisite clouds, Future Generation Computer Systems, vol.63, pp.76-95, 2016.

B. Markines, Evaluating Similarity Measures for Emergent Semantics of Social Tagging, Proceedings of the 18th International Conference on World Wide Web, WWW '09, 2009.

Z. Michalewicz, How to Solve It: Modern Heuristics 2e, 2010.

MongoDB, 2018.

C.-D. Nguyen, Workload- and Data-based Automated Design for a Hybrid Row-Column Storage Model and Bloom Filter-Based Query Processing for Large-Scale DICOM Data Management, 2018.

A. Nandi and H. V. Jagadish, Guided Interaction: Rethinking the Query-Result Paradigm, vol.4, pp.1466-1469, 2011.

R. E. Neapolitan and K. Naimipour, Foundations of Algorithms, Lexington, MA, USA: D. C. Heath, 1996.

Neo4j, 2018.

C.-D. Nguyen, Storing and Querying DICOM Data with HYTORMO, Data Management and Analytics for Medicine and Healthcare, pp.43-61, 2017.

H. Nguyen, Elastic Distributed Resource Scaling for Infrastructure-as-a-Service, Proceedings of the 10th International Conference on Autonomic Computing (ICAC 13), 2013.

T. Nykiel, MRShare: sharing across multiple queries in MapReduce, 2010.

F. J. Ohlhorst, Big Data Analytics: Turning Big Data into Big Money, 1st ed., 2012.

Oracle White Paper, Performance Evaluation of Storage and Retrieval of DICOM Image Content in Oracle Database 11g Using HP Blade Servers and Intel Processors, 2010.

A. Osyczka, Multicriterion optimization in engineering with FORTRAN programs, 1984.

M. T. Özsu and P. Valduriez, Principles of distributed database systems, 2011.

S. Papadomanolakis and A. Ailamaki, AutoPart: automating schema design for large scientific databases using data partitioning, Proceedings of the 16th International Conference on Scientific and Statistical Database Management, pp.383-392, 2004.

Y. Papakonstantinou, Polystore Query Rewriting: The Challenges of Variety, 2016.

Y. Park, J. Min, and K. Shim, Processing of Probabilistic Skyline Queries Using MapReduce, Proc. VLDB Endow., vol.8, 2015.

J. Pearl, Heuristics: Intelligent Search Strategies for Computer Problem Solving, 1984.

F. Pedregosa, Scikit-learn: Machine Learning in Python, J. Mach. Learn. Res., vol.12, 2011.

S. Raschka, Python Machine Learning: Unlock Deeper Insights Into Machine Learning, Community experience distilled, 2015.

Riak, 2018.

P. J. Rousseeuw and A. M. Leroy, Robust regression and outlier detection, 1987.

S. Russell and P. Norvig, Artificial Intelligence: A Modern Approach, 3rd ed., 2009.

H. Schwefel, Evolution and Optimum Seeking: The Sixth Generation, 1993.

H. Seada, M. Abouhawwash, and K. Deb, Towards a Better Balance of Diversity and Convergence in NSGA-III: First Results, Evolutionary Multi-Criterion Optimization, pp.545-559, 2017.

J. Shi, Clash of the Titans: MapReduce vs. Spark for Large Scale Data Analytics, 2015.

S. Sidhanta, W. Golab, and S. Mukhopadhyay, OptEx: A Deadline-Aware Cost Optimization Model for Spark, IEEE/ACM, 2016.

T. T. Soong, Fundamentals of probability and statistics for engineers, 2004.

N. Srinivas and K. Deb, Muiltiobjective Optimization Using Nondominated Sorting in Genetic Algorithms, Evolutionary Computation, vol.2, pp.221-248, 1994.

M. Stonebraker and U. Çetintemel, "One Size Fits All": An Idea Whose Time Has Come and Gone, Proceedings of the 21st International Conference on Data Engineering, ICDE '05, pp.2-11, 2005.

M. Stonebraker, The End of an Architectural Era: (It's Time for a Complete Rewrite), Proceedings of the 33rd International Conference on Very Large Data Bases, VLDB '07, 2007.

M. Stonebraker, C-store: A Column-oriented DBMS, International Conference on Very Large Data Bases (VLDB '05), pp.553-564, 2005.

E. Strehl and J. Ghosh, Value-based Customer Grouping from Large Retail Data-sets, Proceedings of the SPIE Conference on Data Mining and Knowledge Discovery, 2000.

The Galactica Website, 2018.

H. The and . Website, , 2018.

The Hive Website, 2018.

M. The and . Website, , 2018.

The MemSQL Website, 2018.

The Oracle Website, 2018.

The PostgreSQL Website, 2018.

S. The and . Website, , 2018.

The Spark Website, 2018.

The SparkSQL Website, 2018.

T. The and . Website, , 2018.

The Weka Website, 2018.

A. Thusoo, Hive - A Warehousing Solution Over a Map-Reduce Framework, Proceedings of the Very Large Data Bases Endowment (PVLDB), vol.2, pp.1626-1629, 2009.

A. Thusoo, Hive - a petabyte scale data warehouse using Hadoop, Proceedings of the International Conference on Data Engineering (ICDE), pp.996-1005, 2010.

S. Tozer, T. Brecht, and A. Aboulnaga, Q-Cop: Avoiding bad query mixes to minimize client timeouts under heavy loads, International Conference on Data Engineering, pp.397-408, 2010.

I. Trummer and C. Koch, A Fast Randomized Algorithm for Multi-Objective Query Optimization, 2016.

I. Trummer and C. Koch, Approximation Schemes for Many-objective Query Optimization, Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, SIGMOD '14, 2014.

I. Trummer, C. Koch, and . Multi, , 2016.

I. Trummer and C. Koch, Multiple Query Optimization on the D-Wave 2X Adiabatic Quantum Computer, vol.9, 2016.

P. Upadhyaya, M. Balazinska, and D. Suciu, How to Price Shared Optimizations in the Cloud, Proceedings of the VLDB Endowment, vol.5, 2012.

D. Van Aken, Automatic Database Management System Tuning Through Large-scale Machine Learning, Proceedings of the 2017 ACM International Conference on Management of Data, SIGMOD '17, 2017.

D. A. Van Veldhuizen and G. B. Lamont, Evolutionary Computation and Convergence to a Pareto Front, Late Breaking Papers at the Genetic Programming 1998 Conference, pp.221-228, 1998.

J. Veiga, Performance evaluation of big data frameworks for large-scale data analytics, 2016 IEEE International Conference on Big Data (Big Data), pp.424-431, 2016.

D. A. Van Veldhuizen, Multiobjective Evolutionary Algorithms: Classifications, Analyses, and New Innovations, tech. rep., Evolutionary Computation, 1999.

A. Vicini, Multipoint transonic airfoil design by means of a multiobjective genetic algorithm, 35th Aerospace Sciences Meeting and Exhibit, p.82, 1997.

W. Wolfson and . Mapreduce, Simplified Data Processing on Large Clusters, Chemistry and Biology, vol.19, p.10745521, 2012.

W. Wu, Predicting query execution time: Are optimizer cost models really unusable?, IEEE 29th International Conference on Data Engineering (ICDE), 2013.

P. Xiong and Y. Chi, ActiveSLA: A Profit-Oriented Admission Control Framework for Database-as-a-Service Providers, 2nd ACM Symposium on Cloud Computing (SOCC '11), pp.1-14, 2011.

G. G. Yen and Z. He, Performance Metrics Ensemble for Multiobjective Evolutionary Algorithms, 2013.

W. Yu, Fast Algorithms for Pareto Optimal Group-based Skyline, Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, CIKM '17, 2017.

S. Zeuch, H. Pirk, and J. Freytag, Non-invasive progressive optimization for in-memory databases, Proceedings of the VLDB Endowment, vol.9, 2016.

Q. Zhang and H. Li, MOEA/D: A Multiobjective Evolutionary Algorithm Based on Decomposition, IEEE Transactions on Evolutionary Computation, vol.11, pp.712-731, 2007.

J. Zhu, Looking ahead makes query plans robust, Proceedings of the VLDB Endowment, vol.10, 2017.

E. Zitzler, Performance assessment of multiobjective optimizers: an analysis and review, IEEE Transactions on Evolutionary Computation, vol.7, pp.117-132, 2003.

E. Zitzler, M. Laumanns, and L. Thiele, SPEA2: Improving the strength Pareto evolutionary algorithm, TIK-report 103, 2001.

M. Abadi, TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems, 2016.