X. Wang, J. Ah-pine, and J. Darmont, Shcoclust, a scalable similarity-based hierarchical co-clustering method and its application to textual collections, IEEE International Conference on Fuzzy Systems (FUZZ-IEEE 17), 2017.
URL : https://hal.archives-ouvertes.fr/hal-01504986

J. Ah-pine and X. Wang, Similarity based hierarchical clustering with an application to text collections, International Symposium on Intelligent Data Analysis, pp.320-331, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01437124

X. Wang, J. Ah-pine, and J. Darmont, A new test of cluster hypothesis using a scalable similarity-based agglomerative hierarchical clustering framework, Rencontres Jeunes Chercheurs en Recherche d'Information, vol.17, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01504961

J. Ah-pine and X. Wang, Classification ascendante hiérarchique à noyaux et pistes pour un meilleur passage à l'échelle, Journées de Statistique de la SFDS, 2015.

J. Ah-pine and X. Wang, Classification ascendante hiérarchique à noyaux et une application aux données textuelles, EGC, 2017.

J. Dean and S. Ghemawat, Mapreduce: simplified data processing on large clusters, Communications of the ACM, vol.51, issue.1, pp.107-113, 2008.

G. Gan, C. Ma, and J. Wu, Data clustering: theory, algorithms, and applications. SIAM, 2007.

Y. Jeon and S. Yoon, Multi-threaded hierarchical clustering by parallel nearest-neighbor chaining, IEEE Transactions on Parallel and Distributed Systems, vol.26, issue.9, pp.2534-2548, 2015.

M. M. Deza and E. Deza, Encyclopedia of distances, Encyclopedia of Distances, pp.1-583, 2009.

D. Müllner, Modern hierarchical, agglomerative clustering algorithms, 2011.

T. Zhang, R. Ramakrishnan, and M. Livny, Birch: an efficient data clustering method for very large databases, ACM Sigmod Record, vol.25, pp.103-114, 1996.

I. S. Dhillon, Co-clustering documents and words using bipartite spectral graph partitioning, Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining, pp.269-274, 2001.

C. C. Aggarwal and C. Zhai, A survey of text clustering algorithms, Mining text data, pp.77-128, 2012.

D. R. Cutting, D. R. Karger, J. O. Pedersen, and J. W. Tukey, Scatter/gather: A cluster-based approach to browsing large document collections, Proceedings of the 15th annual international ACM SIGIR conference on Research and development in information retrieval, pp.318-329, 1992.

F. Beil, M. Ester, and X. Xu, Frequent term-based text clustering, Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, pp.436-442, 2002.

O. Zamir and O. Etzioni, Web document clustering: A feasibility demonstration, Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval, pp.46-54, 1998.

T. Hofmann, Probabilistic latent semantic indexing, Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval, pp.50-57, 1999.

D. M. Blei, A. Y. Ng, and M. I. Jordan, Latent dirichlet allocation, Journal of machine Learning research, vol.3, pp.993-1022, 2003.

C. C. Aggarwal, S. C. Gates, and P. S. Yu, On using partial supervision for text categorization, IEEE Transactions on Knowledge and data Engineering, vol.16, issue.2, pp.245-255, 2004.

A. Blum and T. Mitchell, Learning to classify text from labeled and unlabeled documents, Conference on Computational Learning Theory, 1998.

X. Ji and W. Xu, Document clustering with prior knowledge, Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval, pp.405-412, 2006.

R. Bekkerman, M. Bilenko, and J. Langford, Scaling up machine learning: Parallel and distributed approaches, 2011.

M. J. Zaki and C. Ho, Large-scale parallel data mining, 2000.

M. Zaharia, M. Chowdhury, T. Das, A. Dave, J. Ma et al., Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing, Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation, pp.2-2, 2012.

G. Ingersoll, Introducing apache mahout. Scalable, commercial-friendly machine learning for building intelligent applications, 2009.

W. Gropp, Tutorial on mpi: The message-passing interface, 2009.

A. S. Tanenbaum and M. Van-steen, Distributed systems, 2007.

G. N. Lance and W. T. Williams, A general theory of classificatory sorting strategies: II. Clustering systems, The computer journal, vol.10, issue.3, pp.271-277, 1967.

F. Murtagh and P. Contreras, Algorithms for hierarchical clustering: an overview, Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, vol.2, pp.86-97, 2012.

R. Sibson, Slink: an optimally efficient algorithm for the single-link cluster method, The computer journal, vol.16, issue.1, pp.30-34, 1973.

F. J. Rohlf, Hierarchical clustering using minimum spanning tree, 1973.

D. Defays, An efficient algorithm for a complete link method, The Computer Journal, vol.20, issue.4, pp.364-366, 1977.

C. De-rham, La classification hiérarchique ascendante selon la méthode des voisins réciproques. Les cahiers de l'analyse des données, vol.5, pp.135-144, 1980.

J. Juan, Programme de classification hiérarchique par l'algorithme de la recherche en chaîne des voisins réciproques. Les cahiers de l'analyse des données, vol.7, pp.219-225, 1982.

F. Murtagh, A survey of recent advances in hierarchical clustering algorithms, The Computer Journal, vol.26, issue.4, pp.354-359, 1983.

M. Bruynooghe, Méthodes nouvelles en classification automatique de données taxinomiques nombreuses, vol.2, pp.24-42, 1977.

A. Michael-rex, Office of the Assistant for Study Support, 1972.

R. J. López-sastre, D. Oñoro-rubio, P. Gil-jiménez, and S. Maldonado-bascón, Fast reciprocal nearest neighbors clustering, Signal Processing, vol.92, issue.1, pp.270-275, 2012.

B. Leibe, K. Mikolajczyk, and B. Schiele, Efficient clustering and matching for object class recognition, BMVC, pp.789-798, 2006.

S. Guha, R. Rastogi, and K. Shim, Cure: an efficient clustering algorithm for large databases, ACM SIGMOD Record, vol.27, pp.73-84, 1998.

Y. Loewenstein, E. Portugaly, M. Fromer, and M. Linial, Efficient algorithms for accurate hierarchical clustering of huge datasets: tackling the entire protein space, Bioinformatics, vol.24, issue.13, pp.41-49, 2008.

Y. Sun, Y. Cai, L. Liu, F. Yu, L. Michael et al., Esprit: estimating species richness using large collections of 16s rrna pyrosequences, Nucleic acids research, vol.37, issue.10, pp.76-76, 2009.

T. Nguyen, B. Schmidt, and C. Kwoh, Sparsehc: a memory-efficient online hierarchical clustering algorithm, Procedia Computer Science, vol.29, pp.8-19, 2014.

S. Shalom, M. Dash, and M. Tue, An approach for fast hierarchical agglomerative clustering using graphics processors with cuda, Advances in Knowledge Discovery and Data Mining, pp.35-42, 2010.

Z. Du and F. Lin, A novel parallelization approach for hierarchical clustering, Parallel Computing, vol.31, issue.5, pp.523-527, 2005.

W. Hendrix, M. Patwary, A. Agrawal, W. Liao, and A. Choudhary, Parallel hierarchical clustering on shared memory platforms, High Performance Computing (HiPC), 2012 19th International Conference on, pp.1-9, 2012.

S. Wang and H. Dutta, Parable: A parallel random-partition based hierarchical clustering algorithm for the mapreduce framework, 2011.

C. Jin, Z. Chen, W. Hendrix, A. Agrawal, and A. Choudhary, Incremental, distributed single-linkage hierarchical clustering algorithm using mapreduce, Proceedings of the Symposium on High Performance Computing, pp.83-92, 2015.

C. Jin, R. Liu, Z. Chen, W. Hendrix, A. Agrawal, and A. Choudhary, A scalable hierarchical clustering algorithm using spark, Big Data Computing Service and Applications (BigDataService), pp.418-426, 2015.

G. Govaert and M. Nadif, Co-clustering, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00933301

I. S. Dhillon, S. Mallela, and D. S. Modha, Information-theoretic co-clustering, Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, pp.89-98, 2003.

S. C. Madeira and A. L. Oliveira, Biclustering algorithms for biological data analysis: a survey, IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB), vol.1, issue.1, pp.24-45, 2004.

A. Tanay, R. Sharan, and R. Shamir, Biclustering algorithms: A survey. Handbook of computational molecular biology, vol.9, pp.122-124, 2005.

G. Xu, Y. Zong, P. Dolog, and Y. Zhang, Co-clustering analysis of weblogs using bipartite spectral projection approach. Knowledge-Based and Intelligent Information and Engineering Systems, pp.398-407, 2010.

T. George and S. Merugu, A scalable collaborative filtering framework based on co-clustering, Data Mining, Fifth IEEE international conference on, p.4, 2005.

A. Banerjee, I. Dhillon, J. Ghosh, S. Merugu, and D. Modha, A generalized maximum entropy approach to bregman co-clustering and matrix approximation, Journal of Machine Learning Research, vol.8, pp.1919-1986, 2007.

G. Govaert and M. Nadif, Clustering with block mixture models, Pattern Recognition, vol.36, pp.463-473, 2003.

G. Govaert and M. Nadif, An em algorithm for the block mixture model, IEEE Transactions on Pattern Analysis and machine intelligence, vol.27, issue.4, pp.643-647, 2005.

G. Govaert and M. Nadif, Fuzzy clustering to estimate the parameters of block mixture models, Soft Computing-A Fusion of Foundations, Methodologies and Applications, vol.10, pp.415-422, 2006.

G. Govaert and M. Nadif, Clustering of contingency table and mixture model, European Journal of Operational Research, vol.183, issue.3, pp.1055-1066, 2007.

G. Govaert and M. Nadif, Block clustering with bernoulli mixture models: Comparison of different approaches, Computational Statistics & Data Analysis, vol.52, issue.6, pp.3233-3245, 2008.

G. Govaert and M. Nadif, Latent block model for contingency table, Communications in Statistics - Theory and Methods, vol.39, issue.3, pp.416-425, 2010.
URL : https://hal.archives-ouvertes.fr/hal-00447792

G. Celeux, D. Chauveau, and J. Diebolt, Stochastic versions of the em algorithm: an experimental study in the mixture case, Journal of Statistical Computation and Simulation, vol.55, issue.4, pp.287-314, 1996.
URL : https://hal.archives-ouvertes.fr/hal-00693519

M. Charrad, Y. Lechevallier, M. B. Ahmed, and G. Saporta, On the number of clusters in block clustering algorithms, FLAIRS Conference, 2010.
URL : https://hal.archives-ouvertes.fr/hal-01125839

U. Von-luxburg, A tutorial on spectral clustering, Statistics and computing, vol.17, issue.4, pp.395-416, 2007.

L. Hagen and A. B. Kahng, New spectral methods for ratio cut partitioning and clustering, IEEE transactions on computer-aided design of integrated circuits and systems, vol.11, issue.9, pp.1074-1085, 1992.

J. Shi and J. Malik, Normalized cuts and image segmentation, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.22, issue.8, pp.888-905, 2000.

D. Wagner and F. Wagner, Between min cut and graph bisection, Mathematical Foundations of Computer Science, pp.744-750, 1993.

H. Zha, X. He, C. Ding, H. Simon, and M. Gu, Bipartite graph partitioning and data clustering, Proceedings of the tenth international conference on Information and knowledge management, pp.25-32, 2001.

M. Rege, M. Dong, and F. Fotouhi, Bipartite isoperimetric graph partitioning for data co-clustering, Data Mining and Knowledge Discovery, vol.16, issue.3, pp.276-312, 2008.

B. Mohar, The laplacian spectrum of graphs, Graph theory, combinatorics, and applications, vol.2, p.12, 1991.

B. Mohar, Some applications of laplace eigenvalues of graphs, Graph symmetry, pp.225-275, 1997.

F. R. K. Chung, Spectral graph theory, vol.92, 1997.

A. Y. Ng, M. I. Jordan, and Y. Weiss, On spectral clustering: Analysis and an algorithm, Advances in neural information processing systems, pp.849-856, 2002.

M. Holmes, A. Gray, and C. Isbell, Fast svd for large-scale matrices, Workshop on Efficient Machine Learning at NIPS, vol.58, pp.249-252, 2007.

Z. Bai, J. Demmel, J. Dongarra, A. Ruhe, and H. Van-der-vorst, Templates for the solution of algebraic eigenvalue problems: a practical guide, SIAM, 2000.

S. Guattery and G. L. Miller, On the quality of spectral separators, SIAM Journal on Matrix Analysis and Applications, vol.19, issue.3, pp.701-719, 1998.

D. D. Lee and H. S. Seung, Algorithms for non-negative matrix factorization, Advances in neural information processing systems, pp.556-562, 2001.

D. D. Lee and H. S. Seung, Learning the parts of objects by non-negative matrix factorization, Nature, vol.401, issue.6755, p.788, 1999.

T. Li and C. H. Q. Ding, Nonnegative matrix factorizations for clustering: A survey, 2013.

C. Ding, T. Li, W. Peng, and H. Park, Orthogonal nonnegative matrix t-factorizations for clustering, Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, pp.126-135, 2006.

H. Wang, F. Nie, H. Huang, and F. Makedon, Fast nonnegative matrix tri-factorization for large-scale data co-clustering, IJCAI Proceedings - International Joint Conference on Artificial Intelligence, vol.22, p.1553, 2011.

C. Ding, X. He, and H. Simon, On the equivalence of nonnegative matrix factorization and spectral clustering, Proceedings of the 2005 SIAM International Conference on Data Mining, pp.606-610, 2005.

S. Papadimitriou and J. Sun, Disco: Distributed co-clustering with mapreduce: A case study towards petabyte-scale end-to-end mining, Data Mining, 2008. ICDM'08. Eighth IEEE International Conference on, pp.512-521, 2008.

T. Sarazin, M. Lebbah, and H. Azzag, Biclustering using spark-mapreduce, BigData Conference, pp.58-60, 2014.

S. Su, X. Cheng, L. Gao, and J. Yin, Co-clusterd: a distributed framework for data co-clustering with sequential updates, 2013 IEEE 13th International Conference on Data Mining, pp.1193-1198, 2013.

Y. Zhang, Q. Gao, L. Gao, and C. Wang, imapreduce: A distributed computing framework for iterative computation, Journal of Grid Computing, vol.10, issue.1, pp.47-68, 2012.

L. Tamine and L. Soulier, Collaborative information retrieval: Concepts, models and evaluation, Advances in Information Retrieval -38th European Conference on IR Research, pp.885-888, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01296314

L. Tamine and L. Soulier, Collaborative information retrieval: Frameworks, theoretical models, and emerging topics, Proceedings of the 2016 ACM on International Conference on the Theory of Information Retrieval, pp.7-8, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01348916

A. Kulkarni and J. Callan, Selective search: Efficient and effective search of large textual collections, ACM Transactions on Information Systems (TOIS), vol.33, issue.4, p.17, 2015.

Y. Kim, J. Callan, S. Culpepper, and A. Moffat, Efficient distributed selective search, Information Retrieval Journal, vol.20, issue.3, pp.221-252, 2017.

G. Katz, A. Shtock, O. Kurland, B. Shapira, and L. Rokach, Wikipedia-based query performance prediction, Proceedings of the 37th international ACM SIGIR conference on Research & development in information retrieval, pp.1235-1238, 2014.

H. Raviv, O. Kurland, and D. Carmel, Query performance prediction for entity retrieval, Proceedings of the 37th international ACM SIGIR conference on Research & development in information retrieval, pp.1099-1102, 2014.

N. Jardine and C. J. Van-rijsbergen, The use of hierarchic clustering in information retrieval. Information storage and retrieval, vol.7, pp.217-240, 1971.

E. M. Voorhees, The cluster hypothesis revisited, Proceedings of the 8th annual international ACM SIGIR conference on Research and development in information retrieval, pp.188-196, 1985.

B. Croft, A model of cluster searching based on classification, Information systems, vol.5, issue.3, pp.189-195, 1980.

A. El-hamdouchi and P. Willett, Techniques for the measurement of clustering tendency in document retrieval systems, Journal of Information Science, vol.13, issue.6, pp.361-365, 1987.

C. Van-rijsbergen and W. Croft, Document clustering: An evaluation of some experiments with the cranfield 1400 collection, Information Processing & Management, vol.11, issue.5, pp.171-182, 1975.

O. Kurland, The cluster hypothesis in information retrieval, European Conference on Information Retrieval, pp.823-826, 2014.

A. Griffiths, L. A. Robinson, and P. Willett, Hierarchic agglomerative clustering methods for automatic document classification, Journal of Documentation, vol.40, issue.3, pp.175-205, 1984.

P. Willett, Recent trends in hierarchic document clustering: a critical review, Information Processing & Management, vol.24, issue.5, pp.577-597, 1988.

A. Griffiths, C. Luckhurst, and P. Willett, Using interdocument similarity information in document retrieval systems. Readings in Information Retrieval, pp.365-373, 1997.

M. D. Smucker and J. Allan, A new measure of the cluster hypothesis, Conference on the Theory of Information Retrieval, pp.281-288, 2009.

A. Tombros, The effectiveness of query-based hierarchic clustering of documents for information retrieval, 2002.

F. Raiber and O. Kurland, The correlation between cluster hypothesis tests and the effectiveness of cluster-based retrieval, Proceedings of the 37th international ACM SIGIR conference on Research & development in information retrieval, pp.1155-1158, 2014.

F. Raiber and O. Kurland, Exploring the cluster hypothesis, and cluster-based retrieval, over the web, Proceedings of the 21st ACM international conference on Information and knowledge management, pp.2507-2510, 2012.

C. Zhai, Statistical language models for information retrieval, Synthesis Lectures on Human Language Technologies, vol.1, issue.1, pp.1-141, 2008.

M. Bruynooghe, Classification ascendante hiérarchique des grands ensembles de données : un algorithme rapide fondé sur la construction des voisinages réductibles. Cahiers de l'analyse des données, vol.3, pp.7-33, 1978.

N. Cristianini and J. Shawe-taylor, An introduction to support vector machines and other kernel-based learning methods, 2000.

T. Joachims, Text categorization with support vector machines: Learning with many relevant features, European conference on machine learning, pp.137-142, 1998.

K. Lang, NewsWeeder: learning to filter netnews, Proceedings of the Twelfth International Conference on Machine Learning, pp.331-339, 1995.

L. Hubert and P. Arabie, Comparing partitions, Journal of classification, vol.2, issue.1, pp.193-218, 1985.

A. F. Mcdaid, D. Greene, and N. Hurley, Normalized mutual information to evaluate overlapping community finding algorithms, 2011.

A. El-hamdouchi and P. Willett, Comparison of hierarchic agglomerative clustering methods for document retrieval, The Computer Journal, vol.32, issue.3, pp.220-227, 1989.

M. F. Porter, An algorithm for suffix stripping, Program, vol.14, pp.130-137, 1980.

A. Rajaraman and J. D. Ullman, Finding similar items. Mining of Massive Datasets, vol.77, pp.73-80, 2010.

J. Wang, H. T. Shen, J. Song, and J. Ji, Hashing for similarity search: A survey, 2014.

S. Ewen, K. Tzoumas, M. Kaufmann, and V. Markl, Spinning fast iterative data flows, Proceedings of the VLDB Endowment, vol.5, pp.1268-1279, 2012.