, Result Analysis using a Local Cluster

, Astroide-gui

M. Brahem, K. Zeitouni, and L. Yeh, ASTROIDE: A unified astronomical big data processing engine over spark, IEEE Transactions on Big Data, 2018.

M. Brahem, K. Zeitouni, and L. Yeh, Efficient astronomical query processing using spark, 26th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, SIGSPATIAL, 2018.

M. Brahem, K. Zeitouni, and L. Yeh, HX-MATCH: In-memory cross-matching algorithm for astronomical big data, Advances in Spatial and Temporal Databases, 15th International Symposium, 2017.

M. Brahem, K. Zeitouni, and L. Yeh, Large scale data management of astronomical surveys with astrospark, 10th Extremely Large Databases Conference, 2017.

K. Zeitouni, M. Brahem, and L. Yeh, Large Scale Data Management of Astronomical Surveys with AstroSpark, European Week of Astronomy and Space Science, 2017.

M. Brahem, K. Zeitouni, and L. Yeh, Astrospark: towards a distributed data server for big data in astronomy, Proceedings of the 3rd ACM SIGSPATIAL PhD Symposium, 2016.

M. Brahem, K. Zeitouni, and L. Yeh, Large scale data management of astronomical surveys with astrospark, Conference on Big Data from Space, 2017.

M. Brahem, Adaptative performance optimization for distributed big data server, Journées CNES Jeunes chercheurs, JC2, 2017.

M. Brahem, K. Zeitouni, and L. Yeh, Astrospark: towards a distributed data server for big data in astronomy, National Conference, Doctoral Session BDA, 2016.

M. Brahem, Astrospark: towards a distributed data server for big data in astronomy, Junior Conference on Data, p.132, 2016.

D. G. York, J. Adelman, J. E. Anderson, S. F. Anderson, J. Annis et al., The sloan digital sky survey: Technical summary, The Astronomical Journal, vol.120, issue.3, p.1579, 2000.

, LSST

, GAIA

National Research Council, New worlds, new horizons in astronomy and astrophysics, 2010.

G. B. Berriman and S. L. Groom, How will astronomy archives survive the data tsunami?, Communications of the ACM, vol.54, issue.12, pp.52-56, 2011.

A. S. Szalay, J. Gray, P. Kunszt, A. Thakar, and D. Slutz, Large Databases in Astronomy, pp.99-116, 2001.

, ADQL

M. Stonebraker, The case for shared nothing, IEEE Database Eng. Bull, vol.9, issue.1, pp.4-9, 1986.

M. Zaharia, M. Chowdhury, M. J. Franklin, S. Shenker, and I. Stoica, Spark: Cluster computing with working sets, HotCloud, vol.10, issue.10, p.95, 2010.

A. Eldawy and M. F. Mokbel, Spatialhadoop: A mapreduce framework for spatial data, Data Engineering (ICDE), 2015 IEEE 31st International Conference on, pp.1352-1363, 2015.

D. Xie, F. Li, B. Yao, G. Li, L. Zhou et al., Simba: Efficient in-memory spatial analytics, Proceedings of the 2016 International Conference on Management of Data, pp.1071-1085, 2016.

S. Nishimura, S. Das, D. Agrawal, and A. El Abbadi, MD-HBase: design and implementation of an elastic data infrastructure for cloud-scale location services, Distributed and Parallel Databases, vol.31, pp.289-319, 2013.

J. Yu, J. Wu, and M. Sarwat, Geospark: A cluster computing framework for processing large-scale spatial data, Proceedings of the 23rd SIGSPATIAL International Conference on Advances in Geographic Information Systems, p.70, 2015.

K. M. Gorski, E. Hivon, A. Banday, B. D. Wandelt, F. K. Hansen et al., HEALPix: A framework for high-resolution discretization and fast analysis of data distributed on the sphere, The Astrophysical Journal, vol.622, issue.2, p.759, 2005.

M. Armbrust, R. S. Xin, C. Lian, Y. Huai, D. Liu et al., Spark sql: Relational data processing in spark, Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, pp.1383-1394, 2015.

, Catalyst

J. Gantz and D. Reinsel, Extracting value from chaos, IDC iview, vol.1142, pp.1-12, 2011.

P. Zikopoulos and C. Eaton, Understanding big data: Analytics for enterprise class hadoop and streaming data, 2011.

D. Laney, 3d data management: Controlling data volume, velocity and variety, vol.6, p.1, 2001.

C. P. Chen and C. Zhang, Data-intensive applications, challenges, techniques and technologies: A survey on big data, Information Sciences, vol.275, pp.314-347, 2014.

M. Cooper and P. Mell, Tackling Big Data, 2012.

E. F. Codd, The relational model for database management: version 2, 1990.

M. Chen, S. Mao, and Y. Liu, Big data: A survey, Mobile networks and applications, vol.19, pp.171-209, 2014.

S. Mazumder, R. S. Bhadoria, and G. C. Deka, Distributed Computing in Big Data Analytics: Concepts, Technologies and Applications, 2017.

J. Dean and S. Ghemawat, Mapreduce: simplified data processing on large clusters, Communications of the ACM, vol.51, issue.1, p.134, 2008.

, Hadoop

K. Shvachko, H. Kuang, S. Radia, and R. Chansler, The hadoop distributed file system, Mass storage systems and technologies (MSST), 2010 IEEE 26th symposium on, pp.1-10, 2010.

, Amazon S3

T. White, Hadoop: The definitive guide, 2012.

, Hive

, Pig

S. Gilbert and N. Lynch, Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services, ACM SIGACT News, vol.33, issue.2, pp.51-59, 2002.

N. Leavitt, Will nosql databases live up to their promise?, Computer, vol.43, issue.2, 2010.

, SimpleDB

, MongoDB

, Neo4j

, Cassandra

M. Zaharia, M. Chowdhury, T. Das, A. Dave, J. Ma et al., Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing, Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation, pp.2-2, 2012.

P. Zecevic and M. Bonaci, Spark in Action, 2016.

H. Karau and R. Warren, High Performance Spark: Best Practices for Scaling and Optimizing Apache Spark, 2017.

R. S. Xin, J. Rosen, M. Zaharia, M. J. Franklin, S. Shenker et al., Shark: Sql and rich analytics at scale, Proceedings of the 2013 ACM SIGMOD International Conference on Management of data, pp.13-24, 2013.

M. Frampton, Mastering Apache Spark, 2015.

, Cost based optimizer in Apache Spark, 2017.

D. Shabalin, E. Burmako, and M. Odersky, Quasiquotes for scala, 2013.

A. Mesmoudi, M. Hacid, and F. Toumani, Benchmarking SQL on MapReduce systems using large astronomy databases, Distributed and Parallel Databases, vol.34, pp.347-378, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01221665

A. A. Goodman and C. G. Wong, Bringing the night sky closer: Discoveries in the data deluge, The Fourth Paradigm: Data-Intensive Scientific Discovery, pp.39-44, 2009.

T. Hey, S. Tansley, and K. M. Tolle, The fourth paradigm: data-intensive scientific discovery, vol.1, 2009.

G. Bell, T. Hey, and A. Szalay, Beyond the data deluge, Science, vol.323, issue.5919, pp.1297-1298, 2009.

J. Gray, D. Slutz, A. Szalay, A. Thakar, J. vandenBerg, P. Kunszt, and C. Stoughton, Data mining the SDSS SkyServer database, Tech. Rep., 2002.

, SkyServer

V. Singh, J. Gray, A. Thakar, A. S. Szalay, J. Raddick et al., Skyserver traffic report - the first five years, CoRR, 2006.

, SkyServer Traffic

F. Ochsenbein, P. Bauer, and J. Marcout, The VizieR database of astronomical catalogues, Astronomy and Astrophysics Supplement Series, vol.143, issue.1, pp.23-32, 2000.

S. Derriere, F. Ochsenbein, and D. Egret, On-line access to very large catalogues, Astronomical Data Analysis Software and Systems IX, vol.216, p.235, 2000.

S. Koposov and O. Bartunov, Q3C, Quad Tree Cube - the new sky-indexing concept for huge astronomical catalogues and its realization for main astronomical queries (cone search and Xmatch) in open source database PostgreSQL, Astronomical Data Analysis Software and Systems XV, vol.351, p.735, 2006.

M. A. Nieto-santisteban, A. R. Thakar, and A. S. Szalay, Cross-matching very large datasets, National Science and Technology Council (NSTC) NASA Conference, 2007.

T. Budavári, A. Szalay, J. Gray, W. O'Mullane, R. Williams et al., Open SkyQuery - VO compliant dynamic federation of astronomical archives, Astronomical Data Analysis Software and Systems (ADASS) XIII, vol.314, p.177, 2004.

M. Ivanova, N. Nes, R. Goncalves, and M. Kersten, Monetdb/sql meets skyserver: the challenges of a scientific database, Scientific and Statistical Database Management, 2007. SSDBM'07. 19th International Conference on, pp.13-13, 2007.

S. Idreos, F. Groffen, N. Nes, S. Manegold, S. Mullender et al., Monetdb: Two decades of research in column-oriented database, 2012.

J. Vanderplas, E. Soroush, K. S. Krughoff, M. Balazinska, and A. Connolly, Squeezing a Big Orange into Little Boxes: The AscotDB System for Parallel Processing of Data on a Sphere, IEEE Data Eng. Bull, vol.36, issue.4, pp.11-20, 2013.

, SciDB

D. Marcos, A. Connolly, K. Krughoff, I. Smith, and S. Wallace, Ascot: a collaborative platform for the virtual observatory, Astronomical Data Analysis Software and Systems XXI, vol.461, p.901, 2012.

D. L. Wang, S. M. Monkewitz, K. Lim, and J. Becla, Qserv: A distributed shared-nothing database for the LSST catalog, p.12, 2011.

A. Dorigo, P. Elmer, F. Furano, and A. Hanushevsky, XROOTD - a highly scalable architecture for data access, WSEAS Transactions on Computers, vol.1, issue.4, 2005.

J. Gray, M. A. Nieto-santisteban, and A. S. Szalay, The zones algorithm for finding points-near-a-point or cross-matching spatial datasets, CoRR, 2006.

F. Pineau, T. Boch, and S. Derriere, Efficient and scalable cross-matching of (very) large catalogs, Astronomical Data Analysis Software and Systems XX, vol.442, p.85, 2011.

Q. Zhao, J. Sun, C. Yu, C. Cui, L. Lv et al., A paralleled large-scale astronomical cross-matching function, International Conference on Algorithms and Architectures for Parallel Processing, pp.604-614, 2009.

, AstroLab Software

A. Aji, X. Sun, H. Vo, Q. Liu, R. Lee et al., Demonstration of hadoop-gis: a spatial data warehousing system over mapreduce, Proceedings of the 21st ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, pp.528-531, 2013.

A. Aji, F. Wang, H. Vo, R. Lee, Q. Liu et al., Hadoop gis: a high performance spatial data warehousing system over mapreduce, Proceedings of the VLDB Endowment, vol.6, pp.1009-1020, 2013.

H. Vo, A. Aji, and F. Wang, Sato: a spatial data partitioning framework for scalable query processing, Proceedings of the 22nd ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, pp.545-548, 2014.

N. Beckmann, H. Kriegel, R. Schneider, and B. Seeger, The R*-tree: an efficient and robust access method for points and rectangles, ACM SIGMOD Record, vol.19, pp.322-331, 1990.

T. Sellis, N. Roussopoulos, and C. Faloutsos, The R+-Tree: A Dynamic Index for Multi-Dimensional Objects, 1987.

A. Eldawy and M. F. Mokbel, Pigeon: A spatial mapreduce language, 2014 IEEE 30th International Conference on Data Engineering (ICDE), pp.1242-1245, 2014.

G. M. Morton, A computer oriented geodetic data base and a new technique in file sequencing, 1966.

M. Tang, Y. Yu, Q. M. Malluhi, M. Ouzzani, and W. G. Aref, Locationspark: a distributed in-memory data management system for big spatial data, Proceedings of the VLDB Endowment, vol.9, pp.1565-1568, 2016.

A. Eldawy and M. F. Mokbel, The Era of Big Spatial Data, Proc. VLDB Endow, vol.10, issue.12, pp.1992-1995, 2017.

V. Pandey, A. Kipf, T. Neumann, and A. Kemper, How good are modern spatial analytics systems?, Proceedings of the VLDB Endowment, vol.11, pp.1661-1673, 2018.

M. T. Özsu and P. Valduriez, Principles of distributed database systems, 2011.

J. Widom, H. Garcia-Molina, and J. D. Ullman, Database systems: the complete book, 2009.

S. K. Singh, Database systems: Concepts, design and applications. Pearson Education India, 2011.

G. Mantelet, ADQL library

T. Ibaraki and T. Kameda, On the optimal nesting order for computing n-relational joins, ACM Transactions on Database Systems (TODS), vol.9, issue.3, pp.482-502, 1984.

S. T. Shenoy and Z. M. Ozsoyoglu, Design and implementation of a semantic query optimizer, IEEE transactions on Knowledge and data Engineering, vol.1, issue.3, pp.344-361, 1989.

R. Elmasri and S. Navathe, Fundamentals of database systems, 2010.

M. Steinbrunn, G. Moerkotte, and A. Kemper, Heuristic and randomized optimization for the join ordering problem, The VLDB Journal-The International Journal on Very Large Data Bases, vol.6, issue.3, pp.191-208, 1997.

D. Kossmann, The state of the art in distributed query processing, ACM Computing Surveys (CSUR), vol.32, issue.4, pp.422-469, 2000.

S. Chaudhuri, An overview of query optimization in relational systems, Proceedings of the seventeenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems, pp.34-43, 1998.

E. Begoli, J. Camacho-rodríguez, J. Hyde, M. J. Mior, and D. Lemire, Apache calcite: A foundational framework for optimized query processing over heterogeneous data sources, Proceedings of the 2018 International Conference on Management of Data, pp.221-230, 2018.

, Calcite

M. Golfarelli and L. Baldacci, A cost model for spark sql, IEEE Transactions on Knowledge and Data Engineering, 2018.

J. J. Mwemezi and Y. Huang, Optimal facility location on spherical surfaces: algorithm and application, New York Science Journal, vol.4, issue.7, pp.21-28, 2011.

R. Plante, R. Williams, R. Hanisch, and A. Szalay, Simple cone search version 1.03, IVOA Recommendation, 2008.

, IVOA

M. Brahem, S. Lopes, L. Yeh, and K. Zeitouni, AstroSpark: towards a distributed data server for big data in astronomy, Proceedings of the 3rd ACM SIGSPATIAL PhD Symposium, 2016.

M. Brahem, K. Zeitouni, and L. Yeh, Astroide: A unified astronomical big data processing engine over spark, IEEE Transactions on Big Data, 2018.

M. Brahem, K. Zeitouni, and L. Yeh, Efficient astronomical query processing using spark, 26th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, 2018.

H. Herodotou, N. Borisov, and S. Babu, Query optimization techniques for partitioned tables, Proceedings of the 2011 ACM SIGMOD International Conference on Management of data, pp.49-60, 2011.

A. S. Szalay, J. Gray, G. Fekete, P. Z. Kunszt, P. Kukol et al., Indexing the sphere with the hierarchical triangular mesh, CoRR, 2005.

P. Kunszt, A. Szalay, I. Csabai, and A. Thakar, The indexing of the sdss science archive, Astronomical Data Analysis Software and Systems IX, vol.216, p.141, 2000.

R. W. Youngren and M. D. Petty, A multi-resolution HEALPix data structure for spherically mapped point data, Heliyon, vol.3, issue.6, p.332, 2017.

P. Fernique, M. Allen, T. Boch, A. Oberto, F. Pineau et al., Hierarchical progressive surveys - Multiresolution HEALPix data structures for astronomical images, catalogues, and 3-dimensional data cubes, Astronomy & Astrophysics, vol.578, p.114, 2015.

J. A. Orenstein, Spatial query processing in an object-oriented database system, ACM Sigmod Record, vol.15, pp.326-336, 1986.

W. O'Mullane, A. Banday, K. Gorski, P. Kunszt, and A. Szalay, Splitting the sky - HTM and HEALPix, pp.638-648, 2000.

, HEALPix software

M. A. Nieto-santisteban, A. R. Thakar, A. S. Szalay, and J. Gray, Large-scale query and xmatch, entering the parallel zone, Astronomical Data Analysis Software and Systems XV, vol.351, p.493, 2006.

J. L. Bentley, Multidimensional binary search trees used for associative searching, Communications of the ACM, vol.18, issue.9, pp.509-517, 1975.

D. Gao, Y. Zhang, and Y. Zhao, The Application of kd-tree in Astronomy, Astronomical Data Analysis Software and Systems XVII, Astronomical Society of the Pacific Conference Series, 2008.

A. Eldawy, L. Alarabi, and M. F. Mokbel, Spatial partitioning techniques in spatialhadoop, Proceedings of the VLDB Endowment, vol.8, pp.1602-1605, 2015.

H. Vo, A. Aji, and F. Wang, Sato: A spatial data partitioning framework for scalable query processing, Proceedings of the 22nd ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, SIGSPATIAL '14, 2014.

H. Singh and S. Bawa, A survey of traditional and mapreduce-based spatial query processing approaches, ACM SIGMOD Record, vol.46, issue.2, pp.18-29, 2017.

W. Wang, J. Yang, and R. Muntz, Sting: A statistical information grid approach to spatial data mining, VLDB, vol.97, pp.186-195, 1997.

M. F. Mokbel, W. G. Aref, and I. Kamel, Performance of multi-dimensional space-filling curves, Proceedings of the 10th ACM international symposium on Advances in geographic information systems, pp.149-154, 2002.

J. K. Lawder and P. J. King, Querying multi-dimensional data indexed using the hilbert space-filling curve, ACM Sigmod Record, vol.30, issue.1, pp.19-24, 2001.

B. Yao, F. Li, and P. Kumar, K nearest neighbor queries and knn-joins in large relational databases (almost) for free, Data engineering (ICDE), 2010 IEEE 26th international conference on, pp.4-15, 2010.

D. Hilbert, Über die stetige Abbildung einer Linie auf ein Flächenstück, Dritter Band: Analysis, Grundlagen der Mathematik, Physik, Verschiedenes, pp.1-2, 1935.

B. Moon, H. V. Jagadish, C. Faloutsos, and J. H. Saltz, Analysis of the clustering properties of the hilbert space-filling curve, IEEE Transactions on knowledge and data engineering, vol.13, issue.1, pp.124-141, 2001.

A. Guttman, R-trees: A dynamic index structure for spatial searching, vol.14, 1984.

T. Sellis, N. Roussopoulos, and C. Faloutsos, The r+-tree: A dynamic index for multi-dimensional objects, 1987.

, Aladin

S. T. Leutenegger, M. A. Lopez, and J. Edgington, STR: A simple and efficient algorithm for R-tree packing, Data Engineering, 1997. Proceedings. 13th international conference on, pp.497-506, 1997.

J. A. Orenstein and F. A. Manola, Probe spatial data modeling and query processing in an image database application, IEEE transactions on Software Engineering, vol.14, issue.5, pp.611-629, 1988.

R. Ramakrishnan and J. Gehrke, Database management systems, 2000.

M. Brahem, K. Zeitouni, and L. Yeh, HX-MATCH: In-Memory Cross-Matching Algorithm for Astronomical Big Data, International Symposium on Spatial and Temporal Databases, pp.411-415, 2017.

, Galactica

A. G. Brown, A. Vallenari, T. Prusti, J. De-bruijne, F. Mignard et al., Gaia Data Release 1 - Summary of the astrometric, photometric, and survey properties, Astronomy & Astrophysics, vol.595, p.2, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01407396

, IGSL

I. A. Hashem, I. Yaqoob, N. B. Anuar, S. Mokhtar, A. Gani et al., The rise of "big data" on cloud computing: Review and open research issues, Information Systems, vol.47, pp.98-115, 2015.

, OpenStack

M. Taylor, TOPCAT: Desktop exploration of tabular data for astronomy and beyond, Informatics, vol.4, p.18, 2017.

M. B. Taylor, STILTS - a package for command-line processing of tabular data, Astronomical Data Analysis Software and Systems XV, vol.351, p.666, 2006.

J. Yu, Z. Zhang, and M. Sarwat, Spatial data management in apache spark: the geospark perspective and beyond, GeoInformatica, p.142, 2018.

M. Tang, Y. Yu, W. Aref, A. Mahmood, Q. Malluhi et al., In-memory distributed spatial query processing and optimization, 2016.

, Grunion

J. Peloton, C. Arnault, and S. Plaszczynski, Fits data source for apache spark, Computing and Software for Big Science, vol.2, issue.1, p.7, 2018.
DOI : 10.1007/s41781-018-0014-z

URL : http://arxiv.org/pdf/1804.07501