K. Abdalgader and A. Skabar, Short-Text Similarity Measurement Using Word Sense Disambiguation and Synonym Expansion, Lecture Notes in Computer Science, vol.2, issue.2, pp.435-444, 2010.
DOI : 10.1162/coli.2006.32.1.13
URL : http://arrow.latrobe.edu.au:8080/http:/www.springerlink.com/content/31541v00731x2755/fulltext.pdf

J. Allwood, Multimodal corpora, Corpus Linguistics. An International Handbook, pp.207-225, 2008.
URL : https://hal.archives-ouvertes.fr/hprints-00511882

E. Amigó, J. Gonzalo, J. Artiles, and F. Verdejo, A comparison of extrinsic clustering evaluation metrics based on formal constraints, Information Retrieval, vol.12, issue.4, pp.461-486, 2009.
DOI : 10.1007/s10791-008-9066-8

R. Artstein and M. Poesio, Inter-Coder Agreement for Computational Linguistics, Computational Linguistics, vol.34, issue.4, pp.555-596, 2008.
DOI : 10.1162/coli.07-034-R2

R. Baeza-Yates and B. Ribeiro-Neto, Modern Information Retrieval, 1999.

D. Bär, T. Zesch, and I. Gurevych, A reflective view on text similarity, Proceedings of Recent Advances in Natural Language Processing, pp.515-520, 2011.

A. Barrón-Cedeño, A. Eiselt, and P. Rosso, Monolingual text similarity measures: A comparison of models over Wikipedia articles revisions, Proceedings of the 7th International Conference on Natural Language Processing, pp.29-38, 2009.

A. Barrón-Cedeño, M. Potthast, P. Rosso, B. Stein, and A. Eiselt, Corpus and Evaluation Measures for Automatic Plagiarism Detection, Proceedings of the Seventh Conference on International Language Resources and Evaluation, 2010.

R. Barzilay and N. Elhadad, Sentence alignment for monolingual comparable corpora, Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing, pp.25-32, 2003.
DOI : 10.3115/1119355.1119359

R. Barzilay and K. R. McKeown, Extracting paraphrases from a parallel corpus, Proceedings of the 39th Annual Meeting on Association for Computational Linguistics, ACL '01, pp.50-57, 2001.
DOI : 10.3115/1073012.1073020

R. Barzilay, K. McKeown, and M. Elhadad, Information fusion in the context of multi-document summarization, Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics, 1999.
DOI : 10.3115/1034678.1034760

J. Becker and D. Kuropka, Topic-based vector space model, Proceedings of the 6th International Conference on Business Information Systems, pp.7-12, 2003.

D. Beeferman, A. L. Berger, and J. D. Lafferty, Statistical models for text segmentation, Machine Learning, pp.177-210, 1999.

S. Bernardini, Monolingual comparable corpora and parallel corpora in the search for features of translated language, SYNAPS: A Journal of Professional Communication, 2011.

M. W. Berry, S. T. Dumais, and G. W. O'Brien, Using Linear Algebra for Intelligent Information Retrieval, SIAM Review, vol.37, issue.4, pp.573-595, 1995.
DOI : 10.1137/1037127

J. Bezdek, R. Ehrlich, and W. Full, FCM: The fuzzy c-means clustering algorithm, Computers & Geosciences, vol.10, issue.2-3, pp.191-203, 1984.
DOI : 10.1016/0098-3004(84)90020-7

D. M. Blei and P. J. Moreno, Topic segmentation with an aspect hidden Markov model, pp.343-348, 2001.

D. M. Blei, A. Y. Ng, and M. I. Jordan, Latent Dirichlet allocation, Journal of Machine Learning Research, vol.3, pp.993-1022, 2003.

B. Li and E. Gaussier, Improving corpus comparability for bilingual lexicon extraction from comparable corpora, Proceedings of the 23rd International Conference on Computational Linguistics, pp.644-652, 2010.
URL : https://hal.archives-ouvertes.fr/hal-00953833

G. Boleda, S. Bott, R. Meza, C. Castillo, T. Badia et al., CUCWeb, Proceedings of the 2nd International Workshop on Web as Corpus, WAC '06, pp.19-28, 2006.
DOI : 10.3115/1628297.1628301

J. Broglio, J. P. Callan, W. B. Croft, and D. W. Nachbar, Document retrieval and routing using the INQUERY system, Proceedings of the Third Text Retrieval Conference, pp.29-38, 1994.

C. Buckley, The importance of proper weighting methods, Workshop on Human Language Technology, pp.349-352, 1993.

M. Caillet, J. Pessiot, M. Amini, and P. Gallinari, Unsupervised learning with term clustering for thematic segmentation of texts, RIAO, pp.648-657, 2004.

M. F. Caropreso, S. Matwin, and F. Sebastiani, A learner-independent evaluation of the usefulness of statistical phrases for automated text categorization, Text Databases and Document Management: Theory and Practice, 2001.

Y. Chiao and P. Zweigenbaum, Looking for candidate translational equivalents in specialized, comparable corpora, Proceedings of the 19th International Conference on Computational Linguistics, 2002.
DOI : 10.3115/1071884.1071904

R. L. Cilibrasi and P. M. B. Vitányi, The Google similarity distance, IEEE Transactions on Knowledge and Data Engineering, pp.370-383, 2007.

F. Y. Y. Choi, Advances in domain independent linear text segmentation, ANLP, pp.26-33, 2000.

F. Y. Y. Choi, P. Wiemer-Hastings, and J. Moore, Latent semantic analysis for text segmentation, Proceedings of Empirical Methods in Natural Language Processing, pp.109-117, 2001.

V. Claveau, Vectorisation, Okapi et calcul de similarité pour le TAL : pour oublier enfin le TF-IDF, Proceedings of JEP-TALN-RECITAL, pp.85-98.

W. W. Cohen, Learning trees and rules with set-valued features, pp.709-716, 1996.

M. Collins and Y. Singer, Unsupervised models for named entity classification, Proceedings of the Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora, pp.100-110, 1999.

R. Dale, Classical approaches to natural language processing, Handbook of Natural Language Processing, 2010.

S. Deerwester, S. T. Dumais, G. W. Furnas, T. K. Landauer, and R. Harshman, Indexing by latent semantic analysis, Journal of the American Society for Information Science, vol.41, issue.6, pp.391-407, 1990.
DOI : 10.1002/(SICI)1097-4571(199009)41:6<391::AID-ASI1>3.0.CO;2-9

B. Dolan, C. Quirk, and C. Brockett, Unsupervised construction of large paraphrase corpora, Proceedings of the 20th International Conference on Computational Linguistics, COLING '04, pp.350-356, 2004.
DOI : 10.3115/1220355.1220406

D. Dubin, The most influential paper Gerard Salton never wrote, Library Trends, 2004.

S. Dumais, J. Platt, M. Sahami, and D. Heckerman, Inductive learning algorithms and representations for text categorization, Proceedings of the Seventh International Conference on Information and Knowledge Management, CIKM '98, pp.148-155, 1998.
DOI : 10.1145/288627.288651

S. T. Dumais, Improving the retrieval of information from external sources, Behavior Research Methods, Instruments, and Computers, pp.229-236, 1991.
DOI : 10.3758/BF03203370

O. Ferret, Finding document topics for improving topic segmentation, 2007.

W. B. Frakes and R. A. Baeza-Yates, Information Retrieval: Data Structures & Algorithms, 1992.

N. Fuhr and C. Buckley, A probabilistic learning approach for document indexing, ACM Transactions on Information Systems, vol.9, issue.3, pp.223-248, 1991.
DOI : 10.1145/125187.125189

B. C. M. Fung, K. Wang, and M. Ester, Hierarchical document clustering using frequent itemsets, Proceedings of SIAM International Conference on Data Mining, 2003.

P. Fung, A Statistical View on Bilingual Lexicon Extraction: From Parallel Corpora to Non-Parallel Corpora, AMTA, pp.1-17, 1998.
DOI : 10.1007/3-540-49478-2_1

P. Fung and K. W. Church, K-vec, Proceedings of the 15th Conference on Computational Linguistics, pp.1096-1102, 1994.
DOI : 10.3115/991250.991328

R. Gaizauskas, J. Foster, Y. Wilks, J. Arundel, P. Clough et al., The meter corpus: A corpus for analysing journalistic text reuse, pp.214-223, 2001.

L. Galavotti, Experiments on the Use of Feature Selection and Negative Evidence in Automated Text Categorization, Proceedings of ECDL-00, 4th European Conference on Research and Advanced Technology for Digital Libraries, pp.59-68, 2000.
DOI : 10.1007/3-540-45268-0_6

W. A. Gale and K. W. Church, A program for aligning sentences in bilingual corpora, pp.1-8, 1991.

M. Galley, K. McKeown, E. Fosler-Lussier, and H. Jing, Discourse segmentation of multi-party conversation, Proceedings of the 41st Annual Meeting on Association for Computational Linguistics, ACL '03, pp.562-569, 2003.
DOI : 10.3115/1075096.1075167

P. Gärdenfors, Conceptual spaces, Kognitionswissenschaft, vol.4, issue.4, 2000.
DOI : 10.1007/s001970050015

G. Paltoglou and M. Thelwall, A study of information retrieval weighting schemes for sentiment analysis, Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, 2010.

N. Goodman, Seven strictures on similarity. Bobbs-Merrill, 1991.

S. Granger, Comparable and translation corpora in cross-linguistic research. design, analysis and applications, Journal of Shanghai Jiaotong University, 2010.

B. J. Grosz and C. L. Sidner, Attention, intentions, and the structure of discourse, Computational Linguistics, vol.12, pp.175-204, 1986.

C. Guinaudeau, G. Gravier, and P. Sébillot, Enhancing lexical cohesion measure with confidence measures, semantic relations and language model interpolation for multimedia spoken content topic segmentation, Journal of Computer Speech and Language, pp.90-104, 2012.
DOI : 10.1016/j.csl.2011.06.002
URL : https://hal.archives-ouvertes.fr/hal-00645705

W. Guo and M. Diab, Modeling sentences in the latent space, Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, pp.864-872, 2012.

M. A. K. Halliday and R. Hasan, Cohesion in English, Longman Group Limited, pp.14-47, 1976.

M. A. K. Halliday and R. Hasan, Cohesion in English, 1976.

H. W. Kuhn, The Hungarian method for the assignment problem, Naval Research Logistics Quarterly, vol.2, issue.1-2, pp.83-97, 1955.
DOI : 10.1002/nav.3800020109

V. Hatzivassiloglou, J. L. Klavans, and E. Eskin, Detecting text similarity over short passages: Exploring linguistic feature combinations via machine learning, pp.13-20, 1999.

V. Hatzivassiloglou, J. L. Klavans, M. L. Holcombe, R. Barzilay, M.-Y. Kan et al., SimFinder: A flexible clustering tool for summarization, Proceedings of the North American Chapter of the Association for Computational Linguistics: Workshop on Automatic Summarization, pp.41-49, 2001.

M. A. Hearst, TextTiling: Segmenting text into multi-paragraph subtopic passages, Computational Linguistics, vol.23, pp.33-64, 1997.

S. Hewavitharana and S. Vogel, Enhancing a statistical machine translation system by using automatically extracted parallel corpus from comparable sources, Proceedings of the LREC 2008 Workshop on Comparable Corpora, 2008.

M. Hoey, Patterns of Lexis in Text, 1991.

T. Hofmann, Probabilistic latent semantic analysis, UAI, pp.289-296, 1999.

T. Honkela, A. Hyvärinen, and J. J. Väyrynen, WordICA: emergence of linguistic representations for words by independent component analysis, Natural Language Engineering, vol.16, issue.3, pp.277-308, 2010.
DOI : 10.1037/0033-295X.114.1.1

A. Huang, Similarity measures for text document clustering, New Zealand Computer Science Research Student Conference, pp.49-56, 2008.

L. Hubert and P. Arabie, Comparing partitions, Journal of Classification, vol.2, issue.1, pp.193-218, 1985.
DOI : 10.1007/BF01908075

J. Hutchins, Machine translation: a concise history, 2007.

A. Hyvärinen, Survey on independent component analysis, 1999.

A. Islam and D. Inkpen, Semantic text similarity using corpus-based word similarity and string similarity, ACM Transactions on Knowledge Discovery from Data, vol.2, issue.2, pp.55-60, 2008.
DOI : 10.1145/1376815.1376819

A. Islam, D. Inkpen, and I. Kiringa, Applications of corpus-based semantic similarity and word segmentation to database schema matching, The VLDB Journal: The International Journal on Very Large Data Bases, pp.1293-1320, 2008.

B. Jagan, T. Geetha, and R. Parthasarathi, Two-stage bootstrapping for anaphora resolution, Proceedings of Conference on Computational Linguistics: Posters, pp.507-516, 2012.

H. Ji, Mining name translations from comparable corpora by creating bilingual information networks, Proceedings of the 2nd Workshop on Building and Using Comparable Corpora from Parallel to Non-parallel Corpora, BUCC '09, pp.34-37, 2009.
DOI : 10.3115/1690339.1690349

J. J. Jiang and D. W. Conrath, Semantic similarity based on corpus statistics and lexical taxonomy, Proceedings of the International Conference on Research in Computational Linguistics, pp.19-33, 1997.

I. T. Jolliffe, Principal Component Analysis, 2002.
DOI : 10.1007/978-1-4757-1904-8

K. Spärck Jones, A statistical interpretation of term specificity and its application in retrieval, Journal of Documentation, vol.28, issue.1, pp.11-21, 1972.
DOI : 10.1108/eb026526

L. Kaufman and P. J. Rousseeuw, Finding Groups in Data: An Introduction to Cluster Analysis, 1990.
DOI : 10.1002/9780470316801

S. Kaufmann, Second-Order Cohesion, Computational Intelligence, vol.16, issue.4, pp.511-524, 2000.
DOI : 10.1111/0824-7935.00124

M. Kay, Text-translation alignment, pp.121-142, 1991.

A. Kilgarriff and G. Grefenstette, Introduction to the Special Issue on the Web as Corpus, Association for Computational Linguistics, 2003.
DOI : 10.1038/21987

S. Koeva, I. Stoyanova, R. Dekova, B. Rizov, and A. Genov, Bulgarian x-language parallel corpus, pp.23-25

T. Korenius, J. Laurikkala, and M. Juhola, On principal component analysis, cosine and Euclidean measures in information retrieval, Information Sciences, vol.177, issue.22, pp.4893-4905, 2007.
DOI : 10.1016/j.ins.2007.05.027

S. Lamprier, T. Amghar, B. Levrat, and F. Saubion, Using an evolving thematic clustering in a text segmentation process, 2008.

T. K. Landauer and S. T. Dumais, A solution to Plato's problem: The latent semantic analysis theory of acquisition, induction, and representation of knowledge, Psychological Review, pp.211-240, 1997.

T. K. Landauer, P. W. Foltz, and D. Laham, Introduction to latent semantic analysis, Discourse Processes, pp.33-34, 1998.

C. Leacock and M. Chodorow, Combining local context and WordNet sense similarity for word sense identification, WordNet: An Electronic Lexical Database, pp.265-284, 1998.

M. Lesk, Automatic sense disambiguation using machine readable dictionaries, Proceedings of the 5th Annual International Conference on Systems Documentation, SIGDOC '86, pp.24-26, 1986.
DOI : 10.1145/318723.318728

D. D. Lewis, Evaluating and optimizing autonomous text classification systems, Proceedings of SIGIR-95, 18th ACM International Conference on Research and Development in Information Retrieval, 1995.

J. Lewis, S. Ossowski, J. Hicks, M. Errami, and H. R. Garner, Text similarity: an alternative way to search MEDLINE, Bioinformatics, vol.22, issue.18, pp.2298-304, 2006.
DOI : 10.1093/bioinformatics/btl388

D. Lin, An information-theoretic definition of similarity, Proceedings of the 15th International Conference on Machine Learning, pp.296-304, 1998.

A. Lopez, Statistical machine translation, ACM Computing Surveys, vol.40, issue.3, 2008.
DOI : 10.1145/1380584.1380586

M. Hatmi, C. Jacquin, E. Morin, and S. Meignier, Incorporating named entity recognition into the speech transcription process, Interspeech, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00843211

A. L. Maas, R. E. Daly, P. T. Pham, D. Huang, A. Y. Ng et al., Learning word vectors for sentiment analysis, Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp.142-150, 2011.

P. Makagonov, M. Alexandrov, and A. Gelbukh, Clustering Abstracts Instead of Full Texts, Proceedings of the 7th International Conference on Text, Speech, Dialog (TSD), Lecture notes in Artificial Intelligence, pp.129-135, 2004.
DOI : 10.1007/978-3-540-30120-2_17

C. D. Manning and H. Schütze, Foundations of Statistical Natural Language Processing, pp.42-52, 1999.

C. D. Manning, P. Raghavan, and H. Schütze, Introduction to Information Retrieval, pp.28-29, 2008.
DOI : 10.1017/CBO9780511809071

E. Marsi and E. Krahmer, Annotating a parallel monolingual treebank with semantic similarity relations, The Sixth International Workshop on Treebanks and Linguistic Theories, 2007.

M. T. Maybury, Discourse cues for broadcast news segmentation, Conference on Computational Linguistics-Association for Computational Linguistics, pp.819-822, 1998.

M. Meila, Comparing clusterings: an information based distance, Journal of Multivariate Analysis, vol.98, issue.5, pp.873-895, 2007.
DOI : 10.1016/j.jmva.2006.11.013

I. Mel'čuk, Meaning-Text Models: A Recent Trend in Soviet Linguistics, Annual Review of Anthropology, vol.10, issue.1, pp.27-62, 1981.
DOI : 10.1146/annurev.an.10.100181.000331

R. Mihalcea and C. Corley, Corpus-based and knowledge-based measures of text semantic similarity, AAAI'06, pp.775-780, 2006.

A. Mikheev, The LTG part of speech tagger, pp.50-57, 1997.

J. Milicevic, A short guide to the meaning-text linguistic theory, Journal of Koralex, pp.187-233, 2006.

H. Mochizuki, T. Honda, and M. Okumura, Text Segmentation with Multiple Surface Linguistic Cues, Proceedings of Conference on Computational Linguistics-Association for Computational Linguistics, pp.881-885, 1998.
DOI : 10.5715/jnlp.6.3_43

D. S. Munteanu, A. Fraser, and D. Marcu, Improved machine translation performance via parallel sentence extraction from comparable corpora, Proceedings of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp.265-272, 2004.

P. Nakov, A. Popova, and P. Mateev, Weight functions impact on LSA performance, EuroConference on Recent Advances in Natural Language Processing, pp.187-193, 2001.

R. Nelken and S. M. Shieber, Towards robust context-sensitive sentence alignment for monolingual corpora, European Chapter of the Association for Computational Linguistics, pp.39-56, 2006.

A. Y. Ng, M. I. Jordan, and Y. Weiss, On spectral clustering: Analysis and an algorithm, Advances in Neural Information Processing Systems, pp.849-856, 2001.

H. T. Ng, W. B. Goh, and K. L. Low, Feature selection, perceptron learning, and a usability case study for text categorization, Proceedings of SIGIR-97, 20th ACM International Conference on Research and Development in Information Retrieval, pp.67-73, 1997.

R. Nock and F. Nielsen, On weighting clustering, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.28, issue.8, 2006.
DOI : 10.1109/TPAMI.2006.168

R. J. Passonneau and D. J. Litman, Intention-based segmentation, Proceedings of the 31st Annual Meeting on Association for Computational Linguistics, pp.148-155, 1993.
DOI : 10.3115/981574.981594

J. Pearson, Terms in Context -Studies in Corpus Linguistics, John Benjamins, 1998.

J. Pino and M. Eskenazi, An application of latent semantic analysis to word sense discrimination for words with related and unrelated meanings, Proceedings of the Fourth Workshop on Innovative Use of NLP for Building Educational Applications, EdAppsNLP '09, pp.43-46, 2009.
DOI : 10.3115/1609843.1609849

D. Pinto and P. Rosso, KnCr: A short-text narrow-domain subcorpus of Medline, TLH 2006. Advances in Computer Science, pp.266-269, 2006.

D. Pinto, H. Jiménez-salazar, and P. Rosso, Clustering Abstracts of Scientific Texts Using the Transition Point Technique, International Conference on Intelligent Text Processing and Computational Linguistics, pp.536-546, 2006.
DOI : 10.1007/11671299_55

D. Pinto, J. Benedí, and P. Rosso, Clustering narrow-domain short texts by using the Kullback-Leibler distance, International Conference on Intelligent Text Processing and Computational Linguistics, pp.26-64, 2007.

J. R. Quinlan, C4.5: Programs for Machine Learning, 1993.

W. M. Rand, Objective criteria for the evaluation of clustering methods, Journal of the American Statistical Association, vol.66, issue.336, pp.846-850, 1971.

M. Recasens, M.-C. de Marneffe, and C. Potts, The life and death of discourse entities: Identifying singleton mentions, Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp.627-633, 2013.

R. Reichart and A. Rappoport, The NVI clustering evaluation measure, Proceedings of the Thirteenth Conference on Computational Natural Language Learning, CoNLL '09, pp.165-173, 2009.
DOI : 10.3115/1596374.1596401

P. Resnik, Using information content to evaluate semantic similarity, Proceedings of the 14th International Joint Conference on Artificial Intelligence, pp.448-453, 1995.

J. C. Reynar, An automatic method of finding topic boundaries, Association for Computational Linguistics, pp.331-333, 1994.

J. C. Reynar, Topic segmentation: Algorithms and applications, 1998.

S. E. Robertson, S. Walker, S. Jones, M. M. Hancock-Beaulieu, and M. Gatford, Okapi at TREC-3, pp.109-126, 1996.

A. Rosenberg and J. Hirschberg, V-Measure: A conditional entropy-based external cluster evaluation measure, Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pp.410-420, 2007.

G. Salton, Mathematics and information retrieval, Journal of Documentation, vol.35, issue.1, pp.1-29, 1979.
DOI : 10.1108/eb026671

G. Salton and C. Buckley, Term-weighting approaches in automatic text retrieval, Information Processing and Management, pp.513-523, 1988.
DOI : 10.1016/0306-4573(88)90021-0

G. Salton and M. J. McGill, Introduction to Modern Information Retrieval, 1986.

G. Salton, A. Wong, and C. S. Yang, A vector space model for automatic indexing, Communications of the ACM, vol.18, issue.11, pp.613-620, 1975.
DOI : 10.1145/361219.361220

I. Santos, C. Laorden, B. Sanz, and P. G. Bringas, Enhanced Topic-based Vector Space Model for semantics-aware spam filtering, Expert Systems with Applications, pp.437-444.
DOI : 10.1016/j.eswa.2011.07.034

H. Schütze, D. A. Hull, and J. O. Pedersen, A comparison of classifiers and document representations for the routing problem, Proceedings of the 18th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '95, pp.229-237, 1995.
DOI : 10.1145/215206.215365

F. Sebastiani, Machine learning in automated text categorization, ACM Computing Surveys, vol.34, issue.1, pp.1-47, 2002.
DOI : 10.1145/505282.505283

J. Sinclair, Eagles preliminary recommendations on corpus typology, 1996.

A. Singhal, C. Buckley, and M. Mitra, Pivoted document length normalization, Proceedings of the 19th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '96, pp.21-29, 1996.
DOI : 10.1145/243199.243206

H. Späth, Cluster Dissection and Analysis: Theory, Fortran Programs, Examples, 1985.

N. Stokes, J. Carthy, and A. F. Smeaton, Segmenting broadcast news streams using lexical chains, Proceedings of 1st Starting AI Researchers Symposium, pp.145-154, 2002.

G. Tsatsaronis and V. Panagiotopoulou, A generalized vector space model for text retrieval based on semantic relatedness, Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics: Student Research Workshop, EACL '09, pp.70-78, 2009.
DOI : 10.3115/1609179.1609188

P. D. Turney, Mining the web for synonyms: PMI-IR versus LSA on TOEFL, Proceedings of the Twelfth European Conference on Machine Learning, pp.491-502, 2001.

A. Tversky, Features of similarity, Psychological Review, pp.327-352, 1977.

M. Utiyama and H. Isahara, A statistical model for domain-independent text segmentation, Proceedings of the Annual Meeting of the Association for Computational Linguistics, 2001.

J. Véronis and P. Langlais, Evaluation of parallel text alignment systems, pp.369-388, 2000.
DOI : 10.1007/978-94-017-2535-4_19

O. Vikas, A. K. Meshram, G. Meena, and A. Gupta, Multiple document summarization using principal component analysis incorporating semantic vector space model, Association for Computational Linguistics and Chinese Language Processing, pp.141-156, 2008.

U. von Luxburg, A tutorial on spectral clustering, Statistics and Computing, vol.17, pp.395-416, 2007.

W. Wang, K. Thadani, and K. R. McKeown, Identifying event descriptions using co-training with online news summaries, Proceedings of the 5th International Joint Conference on Natural Language Processing, 2011.

D. Widdows, Geometry and meaning, Center for the Study of Language and Information, 2004.

E. Wiener, J. O. Pedersen, and A. S. Weigend, A neural network approach to topic spotting, Proceedings of SDAIR-95, 4th Annual Symposium on Document Analysis and Information Retrieval, 1995.

F. Wild, C. Stahl, G. Stermsek, and G. Neumann, Parameters driving effectiveness of automated essay scoring with LSA, Proceedings of the 9th CAA, 2005.

S. K. Wong, W. Ziarko, V. V. Raghavan, and P. C. Wong, On modeling of information retrieval concepts in vector spaces, ACM Transactions on Database Systems, vol.12, issue.2, pp.299-321, 1987.
DOI : 10.1145/22952.22957

Z. Wu and M. Palmer, Verbs semantics and lexical selection, Proceedings of the 32nd Annual Meeting on Association for Computational Linguistics, pp.133-138, 1994.
DOI : 10.3115/981732.981751

Y. Yaari, Segmentation of expository texts by hierarchical agglomerative clustering, CoRR, 1997.

J. P. Yamron, I. Carp, L. Gillick, S. Lowe, and P. van Mulbregt, A hidden Markov model approach to text segmentation and event tracking, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98, pp.333-336, 1998.
DOI : 10.1109/ICASSP.1998.674435

N. Ye, J. Zhu, H. Wang, M. Y. Ma, and B. Zhang, An improved model of dotplotting for text segmentation, Journal of Chinese Language and Computing, pp.27-40, 2006.

G. K. Zipf, Human Behaviour and the Principle of Least Effort: an Introduction to Human Ecology, 1949.