Y. Demchenko, P. Grosso, C. de Laat, and P. Membrey, Addressing big data issues in scientific data infrastructure, Collaboration Technologies and Systems (CTS), 2013 International Conference on, pp.48-55, 2013.

M. Zaharia, M. Chowdhury, T. Das, A. Dave, J. Ma et al., Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing, Proceedings of the 9th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2012, pp.15-28, 2012.

M. Ester, H. Kriegel, J. Sander, and X. Xu, A density-based algorithm for discovering clusters in large spatial databases with noise, Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD-96), pp.226-231, 1996.

C. C. Aggarwal and C. K. Reddy, Data Clustering: Algorithms and Applications, 2014.

K. Udommanetanakit, T. Rakthanmanon, and K. Waiyamai, E-stream: Evolution-based technique for stream clustering, ADMA, pp.605-615, 2007.

X. Zhang, C. Furtlehner, and M. Sebag, Data streaming with affinity propagation, ECML/PKDD, pp.628-643, 2008.
DOI : 10.1007/978-3-540-87481-2_41

URL : https://hal.archives-ouvertes.fr/inria-00289679

N. Marz and J. Warren, Big Data: Principles and best practices of scalable realtime data systems, 2015.

J. Han, J. Pei, and M. Kamber, Data mining: concepts and techniques, 2011.

X. Zhang, Contributions to Large Scale Data Clustering and Streaming with Affinity Propagation. Application to Autonomic Grids, 2010.

J. R. Mashey, Big data and the next wave of infrastress problems, solutions, opportunities, 1998.

W. Fan and A. Bifet, Mining big data: current status, and forecast to the future, ACM SIGKDD Explorations Newsletter, vol.14, issue.2, pp.1-5, 2013.

D. Laney, 3D data management: Controlling data volume, velocity, and variety, 2001.

J. Gantz and D. Reinsel, Extracting value from chaos, IDC iview, vol.1142, pp.1-12, 2011.

S. Ghemawat, H. Gobioff, and S. Leung, The google file system, ACM SIGOPS operating systems review, vol.37, pp.29-43, 2003.
DOI : 10.1145/1165389.945450

M. Burrows, The chubby lock service for loosely-coupled distributed systems, Proceedings of the 7th symposium on Operating systems design and implementation, pp.335-350, 2006.

D. Borthakur, The hadoop distributed file system: Architecture and design, vol.11, p.21, 2007.

J. Dean and S. Ghemawat, Mapreduce: simplified data processing on large clusters, Communications of the ACM, vol.51, issue.1, pp.107-113, 2008.

M. Zaharia, M. Chowdhury, M. J. Franklin, S. Shenker, and I. Stoica, Spark: Cluster computing with working sets, Proceedings of the 2nd USENIX Conference on Hot Topics in Cloud Computing, HotCloud'10, pp.10-10, 2010.

X. Liu, N. Iftikhar, and X. Xie, Survey of real-time processing systems for big data, 18th International Database Engineering & Applications Symposium, pp.356-361, 2014.

M. Zaharia, T. Das, H. Li, S. Shenker, and I. Stoica, Discretized streams: An efficient and fault-tolerant model for stream processing on large clusters, Proceedings of the 4th USENIX Conference on Hot Topics in Cloud Computing, HotCloud'12, pp.10-10, 2012.

M. Zaharia, T. Das, H. Li, T. Hunter, S. Shenker et al., Discretized streams: fault-tolerant streaming computation at scale, ACM SIGOPS 24th Symposium on Operating Systems Principles, SOSP '13, pp.423-438, 2013.

M. Balazinska, H. Balakrishnan, S. Madden, and M. Stonebraker, Fault-tolerance in the borealis distributed stream processing system, ACM Trans. Database Syst, vol.33, issue.1, 2008.

J. Hwang, M. Balazinska, A. Rasin, M. Stonebraker, and S. B. Zdonik, High-availability algorithms for distributed stream processing, Proceedings of the 21st International Conference on Data Engineering, pp.779-790, 2005.

M. Ghesmoune, M. Lebbah, and H. Azzag, Micro-batching growing neural gas for clustering data streams using spark streaming, INNS Conference on Big Data, pp.158-166, 2015.
DOI : 10.1016/j.procs.2015.07.290

URL : https://doi.org/10.1016/j.procs.2015.07.290

X. Meng, J. K. Bradley, B. Yavuz, E. R. Sparks, S. Venkataraman et al., MLlib: Machine learning in Apache Spark, 2015.

C. C. Aggarwal, J. Han, J. Wang, and P. S. Yu, A framework for clustering evolving data streams, VLDB, pp.81-92, 2003.

M. R. Ackermann, M. Märtens, C. Raupach, K. Swierkot, C. Lammersen et al., StreamKM++: A clustering algorithm for data streams, ACM Journal of Experimental Algorithmics, vol.17, issue.1, 2012.
DOI : 10.1137/1.9781611972900.16

S. Schelter, S. Ewen, K. Tzoumas, and V. Markl, "All roads lead to Rome": optimistic recovery for distributed iterative data processing, 22nd ACM International Conference on Information and Knowledge Management, CIKM'13, pp.1919-1928, 2013.

A. Bifet, G. Holmes, B. Pfahringer, P. Kranen, H. Kremer et al., MOA: massive online analysis, a framework for stream classification and clustering, Proceedings of the First Workshop on Applications of Pattern Analysis, WAPA 2010, pp.44-50, 2010.

P. Kranen, I. Assent, C. Baldauf, and T. Seidl, The ClusTree: indexing micro-clusters for anytime stream mining, Knowledge and information systems, vol.29, issue.2, pp.249-272, 2011.

F. Cao, M. Ester, W. Qian, and A. Zhou, Density-based clustering over an evolving data stream with noise, SDM, pp.328-339, 2006.

Y. Chen and L. Tu, Density-based clustering for real-time stream data, Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp.133-142, 2007.

G. De Francisci Morales and A. Bifet, SAMOA: scalable advanced massive online analysis, Journal of Machine Learning Research, vol.16, pp.149-153, 2015.

A. K. Jain and R. C. Dubes, Algorithms for Clustering Data, 1988.

D. Arthur and S. Vassilvitskii, k-means++: the advantages of careful seeding, Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pp.1027-1035, 2007.

B. Bahmani, B. Moseley, A. Vattani, R. Kumar, and S. Vassilvitskii, Scalable k-means++, Proceedings of the VLDB Endowment, vol.5, pp.622-633, 2012.

S. Kaski, J. Kangas, and T. Kohonen, Bibliography of self-organizing map (som) papers: 1981-1997, Neural computing surveys, vol.1, p.171, 1998.

S. Haykin, Neural Networks: A Comprehensive Foundation, ISBN 0132733501, 1998.

T. Kohonen, M. R. Schroeder, and T. S. Huang, Self-Organizing Maps

T. Martinetz and K. Schulten, A "Neural-Gas" Network Learns Topologies, Artificial Neural Networks, vol.I, pp.397-402, 1991.

B. Fritzke, Unsupervised clustering with growing cell structures, Proceedings of the International Joint Conference on Neural Networks, pp.531-536, IEEE, 1991.

B. Fritzke, A growing neural gas network learns topologies, NIPS, pp.625-632, 1994.

O. Beyer and P. Cimiano, Online semi-supervised growing neural gas, Int. J. Neural Syst, vol.22, issue.5, 2012.

B. J. Frey and D. Dueck, Clustering by passing messages between data points, Science, vol.315, pp.972-976, 2007.

A. Amini, Y. W. Teh, and H. Saboohi, On density-based data streams clustering algorithms: A survey, J. Comput. Sci. Technol, vol.29, issue.1, pp.116-141, 2014.

P. Berkhin, A survey of clustering data mining techniques, Grouping multidimensional data, pp.25-71, 2006.

C. Fraley and A. E. Raftery, How many clusters? which clustering method? answers via model-based cluster analysis, The computer journal, vol.41, issue.8, pp.578-588, 1998.

A. P. Dempster, N. M. Laird, and D. B. Rubin, Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society. Series B (Methodological), pp.1-38, 1977.

G. Mclachlan and T. Krishnan, The EM algorithm and extensions, vol.382, 2007.

A. El Attar, A. Pigeau, and M. Gelgon, Robust estimation of a global gaussian mixture by decentralized aggregations of local models, Web Intelligence and Agent Systems, vol.11, issue.3, pp.245-262, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00794452

A. El Attar, Estimation robuste des modèles de mélange sur des données distribuées, PhD thesis, 2012.

R. Lämmel, Google's mapreduce programming model-revisited, Science of computer programming, vol.70, issue.1, pp.1-30, 2008.

E. Januzaj, H. Kriegel, and M. Pfeifle, Dbdc: Density based distributed clustering, Advances in Database Technology-EDBT 2004, pp.88-105, 2004.

T. Sarazin, H. Azzag, and M. Lebbah, SOM clustering using spark-mapreduce, 2014 IEEE International Parallel & Distributed Processing Symposium Workshops, pp.1727-1734, 2014.

W. Zhao, H. Ma, and Q. He, Parallel k-means clustering based on mapreduce, Cloud computing, pp.674-679, 2009.

K. Shvachko, H. Kuang, S. Radia, and R. Chansler, The hadoop distributed file system, Mass Storage Systems and Technologies (MSST), 2010 IEEE 26th Symposium on, pp.1-10, 2010.

Y. He, H. Tan, W. Luo, S. Feng, and J. Fan, MR-DBSCAN: a scalable MapReduce-based DBSCAN algorithm for heavily skewed data, Frontiers of Computer Science, vol.8, pp.83-99, 2014.

A. S. Shirkhorshidi, S. Aghabozorgi, T. Y. Wah, and T. Herawan, Big data clustering: a review, Computational Science and Its Applications - ICCSA 2014, pp.707-720, 2014.

A. S. Das, M. Datar, A. Garg, and S. Rajaram, Google news personalization: scalable online collaborative filtering, Proceedings of the 16th international conference on World Wide Web, pp.271-280, 2007.

H. Cui, J. Wei, and W. Dai, Parallel implementation of expectation-maximization for fast convergence

A. Basak, I. Brinster, and O. J. Mengshoel, MapReduce for Bayesian network parameter learning using the EM algorithm, Proc. of Big Learning: Algorithms, Systems and Tools, 2012.

A. Ene, S. Im, and B. Moseley, Fast clustering using mapreduce, Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining, pp.681-689, 2011.
DOI : 10.1145/2020408.2020515

URL : http://arxiv.org/pdf/1109.1579.pdf

C. C. Aggarwal, A survey of stream clustering algorithms, Data Clustering: Algorithms and Applications, pp.231-258, 2013.

H. Nguyen, Y. Woon, and W. K. Ng, A survey on data stream clustering and classification, Knowl. Inf. Syst, vol.45, issue.3, pp.535-569, 2015.

M. Khalilian and N. Mustapha, Data stream clustering: Challenges and issues, CoRR, abs/1006.5261, 2010.

Yogita and D. Toshniwal, Clustering techniques for streaming data - a survey, Advance Computing Conference (IACC), 2013 IEEE 3rd International, pp.951-956, 2013.
DOI : 10.1109/iadcc.2013.6514355

M. Mousavi, A. A. Bakar, and M. Vakilian, Data stream clustering algorithms: A review, International Journal of Advances in Soft Computing & Its Applications, vol.7, issue.3, 2015.

J. A. Silva, E. R. Faria, R. C. Barros, E. R. Hruschka et al., Data stream clustering: A survey, ACM Comput. Surv, vol.46, issue.1, p.13, 2013.

L. Golab and M. T. Özsu, Issues in data stream management, ACM Sigmod Record, vol.32, issue.2, pp.5-14, 2003.
DOI : 10.1145/776985.776986

URL : http://www.cs.virginia.edu/~son/cs851/stream/papers/p5-golab.pdf

M. M. Gaber, A. Zaslavsky, and S. Krishnaswamy, Mining data streams: a review, ACM Sigmod Record, vol.34, issue.2, pp.18-26, 2005.
DOI : 10.1145/1083784.1083789

Y. Zhu and D. Shasha, Statstream: Statistical monitoring of thousands of data streams in real time, Proceedings of the 28th international conference on Very Large Data Bases, pp.358-369, 2002.

A. Metwally, D. Agrawal, and A. El Abbadi, Duplicate detection in click streams, Proceedings of the 14th international conference on World Wide Web, pp.12-21, 2005.
DOI : 10.1145/1060745.1060753

C. C. Aggarwal, Data streams: models and algorithms, vol.31, 2007.

C. C. Aggarwal, A framework for diagnosing changes in evolving data streams, Proceedings of the 2003 ACM SIGMOD international conference on Management of data, pp.575-586, 2003.

D. Kifer, S. Ben-david, and J. Gehrke, Detecting change in data streams, Proceedings of the Thirtieth international conference on Very large data bases, vol.30, pp.180-191, 2004.

D. Donjerkovic, Y. E. Ioannidis, and R. Ramakrishnan, Dynamic histograms: Capturing evolving data sets, Proceedings of the international conference on data engineering, pp.86-86
DOI : 10.1109/icde.2000.839394

URL : http://www.cs.wisc.edu/~donjerko/hist.pdf

V. Ganti, J. Gehrke, and R. Ramakrishnan, Mining data streams under block evolution, ACM SIGKDD Explorations Newsletter, vol.3, issue.2, pp.1-10, 2002.
DOI : 10.1145/507515.507517

T. Zhang, R. Ramakrishnan, and M. Livny, BIRCH: An efficient data clustering method for very large databases, SIGMOD Conference, pp.103-114, 1996.

W. Meesuksabai, T. Kangkachit, and K. Waiyamai, Hue-stream: Evolution-based clustering technique for heterogeneous data streams with uncertainty, Advanced Data Mining and Applications - 7th International Conference, pp.27-40, 2011.

C. C. Aggarwal and P. S. Yu, A framework for clustering uncertain data streams, Proceedings of the 24th International Conference on Data Engineering, ICDE 2008, pp.150-159, 2008.

C. Yang and J. Zhou, Hclustream: A novel approach for clustering evolving heterogeneous data stream, Workshops Proceedings of the 6th IEEE International Conference on Data Mining (ICDM 2006), pp.682-688, 2006.

A. Guttman, R-trees: A dynamic index structure for spatial searching, SIGMOD'84, Proceedings of Annual Meeting, pp.47-57, 1984.

J. P. Patist, W. Kowalczyk, and E. Marchiori, Maintaining gaussian mixture models of data streams under block evolution, International Conference on Computational Science, pp.1071-1074, 2006.

A. Samé and H. El Assaad, A state-space approach to modeling functional time series application to rail supervision, 22nd European Signal Processing Conference, pp.1402-1406, 2014.

H. El Assaad, Dynamic classification and modeling of non-stationary temporal data, PhD thesis, 2014.

C. Isaksson, M. H. Dunham, and M. Hahsler, SOStream: Self organizing density-based clustering over data stream, MLDM, pp.264-278, 2012.
DOI : 10.1007/978-3-642-31537-4_21

C. Wang, J. Lai, D. Huang, and W. Zheng, SVStream: A support vector-based algorithm for clustering data streams, IEEE Trans. Knowl. Data Eng, vol.25, issue.6, pp.1410-1424, 2013.
DOI : 10.1109/tkde.2011.263

A. Ben-Hur, D. Horn, H. T. Siegelmann, and V. Vapnik, Support vector clustering, Journal of Machine Learning Research, vol.2, pp.125-137, 2001.

D. M. J. Tax and R. P. W. Duin, Support vector domain description, Pattern Recognition Letters, vol.20, pp.1191-1199, 1999.

I. J. Sledge and J. M. Keller, Growing neural gas for temporal clustering, 19th International Conference on Pattern Recognition (ICPR 2008), pp.1-4, 2008.

C. Mendes, M. Gattass, and H. Lopes, Fgng: A fast multi-dimensional growing neural gas implementation, Neurocomputing, vol.128, pp.328-340, 2014.

S. V. Mitsyn and G. A. Ososkov, The growing neural gas and clustering of large amounts of data, Optical Memory and Neural Networks, vol.20, issue.4, pp.260-270, 2011.

B. Fritzke, A self-organizing network that can follow non-stationary distributions, Artificial Neural Networks - ICANN '97, 7th International Conference, pp.613-618, 1997.

S. Marsland, J. Shapiro, and U. Nehmzow, A self-organising network that grows when required, Neural Networks, vol.15, issue.8-9, pp.1041-1058, 2002.

S. Marsland, U. Nehmzow, and J. Shapiro, On-line novelty detection for autonomous mobile robots, Robotics and Autonomous Systems, vol.51, issue.2, pp.191-206, 2005.

Y. Prudent and A. Ennaji, An incremental growing neural gas learns topologies, Neural Networks, 2005. IJCNN'05. Proceedings. 2005 IEEE International Joint Conference on, vol.2, pp.1211-1216, 2005.
DOI : 10.1109/ijcnn.2005.1556026

H. Hamza, Y. Belaïd, A. Belaïd, and B. B. Chaudhuri, Incremental classification of invoice documents, Pattern Recognition, 2008. ICPR 2008. 19th International Conference on, pp.1-4, 2008.
URL : https://hal.archives-ouvertes.fr/inria-00346942

J. Lamirel, Z. Boulila, M. Ghribi, and P. Cuxac, A new incremental growing neural gas algorithm based on clusters labeling maximization: application to clustering of heterogeneous textual data, Trends in Applied Intelligent Systems, pp.139-148, 2010.
URL : https://hal.archives-ouvertes.fr/inria-00535942

J. García-Rodríguez, A. Angelopoulou, J. M. García-Chamizo, A. Psarrou, S. Orts-Escolano et al., Autonomous growing neural gas for applications with time constraint: optimal parameter estimation, Neural Networks, vol.32, pp.196-208, 2012.

M. A. F. Pimentel, D. A. Clifton, L. Clifton, and L. Tarassenko, A review of novelty detection, Signal Processing, vol.99, pp.215-249, 2014.

M. Bouguelia, Y. Belaïd, and A. Belaïd, An adaptive incremental clustering method based on the growing neural gas algorithm, ICPRAM, pp.42-49, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00794354

S. Guha, A. Meyerson, N. Mishra, R. Motwani, and L. O'Callaghan, Clustering data streams: Theory and practice, IEEE Transactions on Knowledge and Data Engineering, vol.15, issue.3, pp.515-528, 2003.
DOI : 10.1109/tkde.2003.1198387

X. Zhu, Stream data mining repository (web site), 2010.

K. Bache and M. Lichman, UCI machine learning repository, 2013.

A. Strehl and J. Ghosh, Cluster ensembles - a knowledge reuse framework for combining multiple partitions, Journal of Machine Learning Research, vol.3, pp.583-617, 2002.

M. Bolanos, J. Forrest, and M. Hahsler, stream: Infrastructure for Data Stream Mining, 2014.

A. Zhou, F. Cao, W. Qian, and C. Jin, Tracking clusters in evolving data streams over sliding windows, Knowl. Inf. Syst, vol.15, issue.2, pp.181-214, 2008.

M. Ghesmoune, H. Azzag, and M. Lebbah, G-stream: Growing neural gas over data stream, Neural Information Processing - 21st International Conference, pp.207-214, 2014.

M. Ghesmoune, M. Lebbah, and H. Azzag, Clustering over data streams based on growing neural gas, Advances in Knowledge Discovery and Data Mining - 19th Pacific-Asia Conference, PAKDD 2015, pp.134-145, 2015.

S. Benbernou, X. Huang, and M. Ouziri, Fusion of big RDF data: A semantic entity resolution and query rewriting-based inference approach, Web Information Systems Engineering - WISE 2015 - 16th International Conference, pp.300-307, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01377590

T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2009.

T. Therneau, B. Atkinson, and B. Ripley, rpart: Recursive Partitioning and Regression Trees, 2015.

H. Azzag, G. Venturini, A. Oliver, and C. Guinot, A hierarchical ant based clustering algorithm and its use in three real-world applications, European Journal of Operational Research, vol.179, issue.3, pp.906-922, 2007.
URL : https://hal.archives-ouvertes.fr/hal-01020927

N. Doan, H. Azzag, and M. Lebbah, Growing self-organizing trees for knowledge discovery from data, The 2012 International Joint Conference on Neural Networks (IJCNN), pp.1-8, 2012.

N. Doan, H. Azzag, M. Lebbah, and G. Santini, Self-organizing trees for visualizing protein dataset, The 2013 International Joint Conference on Neural Networks, IJCNN 2013, pp.1-8, 2013.

D. Auber, Tulip : A huge graph visualisation framework, Graph Drawing Softwares, Mathematics and Visualization, pp.105-126, 2003.
URL : https://hal.archives-ouvertes.fr/hal-00307626

C. Ordonez, Clustering binary data streams with k-means, Proceedings of the 8th ACM SIGMOD workshop on Research issues in data mining and knowledge discovery, pp.12-19, 2003.

B. Babcock, M. Datar, R. Motwani, and L. O'Callaghan, Maintaining variance and k-medians over data stream windows, Proceedings of the Twenty-Second ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, pp.234-243, 2003.

M. Charikar, L. O'Callaghan, and R. Panigrahy, Better streaming algorithms for clustering problems, Proceedings of the Thirty-fifth Annual ACM Symposium on Theory of Computing, pp.30-39, 2003.
DOI : 10.1145/780542.780548

M. Lebbah, Carte topologique pour données qualitatives: application à la reconnaissance automatique de la densité du trafic routier, Master's thesis, 2003.

G. Govaert, Data Analysis, 2009.
URL : https://hal.archives-ouvertes.fr/hal-00447855

G. Krempl, I. Žliobaitė, D. Brzeziński, E. Hüllermeier, M. Last et al., Open challenges for data stream mining research, ACM SIGKDD Explorations Newsletter, vol.16, pp.1-10, 2014.
DOI : 10.1145/2674026.2674028

J. Gama, A survey on learning from data streams: current and future trends, Progress in AI, vol.1, issue.1, pp.45-55, 2012.

J. Gama, Knowledge Discovery from Data Streams. Chapman and Hall / CRC Data Mining and Knowledge Discovery Series, 2010.
DOI : 10.1201/ebk1439826119

G. Cormode, M. N. Garofalakis, P. J. Haas, and C. Jermaine, Synopses for massive data: Samples, histograms, wavelets, sketches, Foundations and Trends in Databases, vol.4, pp.1-294, 2012.
DOI : 10.1561/1900000004

A. Forestiero, C. Pizzuti, and G. Spezzano, A single pass algorithm for clustering evolving data streams based on swarm intelligence, Data Min. Knowl. Discov, vol.26, issue.1, pp.1-26, 2013.

W. M. Rand, Objective criteria for the evaluation of clustering methods, Journal of the American Statistical Association, vol.66, issue.336, pp.846-850, 1971.