S. Abney, Corpus-Based Methods in Language and Speech, chapter Part-of-Speech Tagging and Partial Parsing, pp.118-136, 1996.

G. Adda, J. Mariani, P. Paroubek, M. Rajman, and J. Lecomte, Métrique et premiers résultats de l'évaluation GRACE des étiqueteurs morphosyntaxiques pour le français, Actes de la 6ème conférence sur le Traitement Automatique des Langues Naturelles (TALN), 1999.

M. Adda-Decker, De la reconnaissance automatique de la parole à l'analyse linguistique de corpus oraux, Actes des 26èmes Journées d'Études sur la Parole (JEP), 2006.

Y. Akita, Y. Nemoto, and T. Kawahara, PLSA-Based Topic Detection in Meetings for Adaptation of Lexicon and Language Model, Proc. of the 10th European Conference on Speech Communication and Technology (Eurospeech), 2007.

P. Alain and O. Boëffard, Algorithme de recherche d'un rang de prédiction. Application à l'évaluation de modèles de langage, Actes des 26èmes Journées d'Études sur la Parole (JEP), 2006.

J. Allan, J. Carbonell, G. Doddington, J. Yamron, and Y. Yang, Topic Detection and Tracking Pilot Study Final Report, Proc. of the DARPA Broadcast News Transcription and Understanding Workshop, 1998.

A. Allauzen, Modélisation linguistique pour l'indexation automatique de documents audiovisuels, PhD thesis, 2003.

A. Allauzen and J. Gauvain, Adaptation automatique du modèle de langage d'un système de transcription de journaux parlés, Traitement Automatique des Langues (TAL), vol.44, issue.1, pp.11-31, 2003.

A. Allauzen and J. Gauvain, Open Vocabulary ASR for Audiovisual Document Indexation, Proceedings (ICASSP '05), IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005.
DOI : 10.1109/ICASSP.2005.1415288

J. Allen, Synthesis of speech from unrestricted text, Proc. of the IEEE, pp.433-442, 1976.
DOI : 10.1109/PROC.1976.10152

J. B. Allen, How do Humans Process and Recognize Speech?, IEEE Transactions on Speech and Audio Processing, vol.2, issue.4, pp.567-577, 1994.

J. Antoine and D. Genthial, Méthodes hybrides issues du TALN et du TAL parlé : état des lieux et perspectives, Actes de la 6ème conférence sur le Traitement Automatique des Langues Naturelles (TALN), 1999.

J. Antoine and J. Goulian, Word Order Variations and Spoken Man-Machine Dialogue in French: a Corpus Analysis on the ATIS Domain, Proc. of Corpus Linguistics, 2001.

S. Armstrong, P. Bouillon, and G. Robert, Tools for Part-of-Speech Tagging, 1995.

A. Asadi, R. Schwartz, and J. Makhoul, Automatic Detection of New words in a Large Vocabulary Continuous Speech Recognition System, Proc. of the IEEE International Conference on Acoustics, Speech, and Signal Processing, 1990.

M. Bacchiani and B. Roark, Unsupervised language model adaptation, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '03), 2003.
DOI : 10.1109/ICASSP.2003.1198758

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.330.4450

M. Banko and E. Brill, Scaling to very very large corpora for natural language disambiguation, Proceedings of the 39th Annual Meeting on Association for Computational Linguistics, ACL '01, 2001.
DOI : 10.3115/1073012.1073017

URL : http://acl.ldc.upenn.edu/P/P01/P01-1005.pdf

L. E. Baum, T. Petrie, G. Soules, and N. Weiss, A Maximization Technique Occurring in the Statistical Analysis of Probabilistic Functions of Markov Chains, The Annals of Mathematical Statistics, vol.41, issue.1, pp.164-171, 1970.
DOI : 10.1214/aoms/1177697196

F. Béchet, A. Nasr, T. Spriet, and R. De Mori, Modèles de langage à portée variable : application au traitement des homophones, Actes de la 6ème conférence sur le Traitement Automatique des Langues Naturelles (TALN), 1999.

D. Beeferman, A. Berger, and J. Lafferty, Statistical Models for Text Segmentation, Machine Learning, vol.34, issue.1-3, pp.177-210, 1999.

J. R. Bellegarda, A Multispan Language Modeling Framework for Large Vocabulary Speech Recognition, IEEE Transactions on Speech and Audio Processing, vol.6, issue.5, pp.456-467, 1998.

J. R. Bellegarda, Large vocabulary speech recognition with multispan statistical language models, IEEE Transactions on Speech and Audio Processing, vol.8, issue.1, pp.76-84, 2000.
DOI : 10.1109/89.817455

J. R. Bellegarda, Statistical language model adaptation: review and perspectives, Speech Communication, vol.42, issue.1, pp.93-108, 2004.
DOI : 10.1016/j.specom.2003.08.002

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.91.4893

C. Benzitoun, E. Campione, J. Deulofeu, S. Henry, F. Sabio et al., L'analyse syntaxique de l'oral : problèmes et méthode, 2004.

A. Berger and R. Miller, Just-in-time language modelling, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181), 1998.
DOI : 10.1109/ICASSP.1998.675362

A. Berger and H. Printz, Recognition Performance of a Large-Scale Dependency-Grammar Language Model, Proc. of the 5th International Conference on Spoken Language Processing (ICSLP), 1998.

Y. Bestgen, Improving Text Segmentation Using Latent Semantic Analysis: A Reanalysis of Choi, Wiemer-Hastings, and Moore (2001), Computational Linguistics, vol.32, issue.1, pp.5-12, 2006.

B. Bigi, R. De Mori, M. El-Bèze, and T. Spriet, Detecting Topic Shift Using a Cache Memory, Proc. of the 5th International Conference on Spoken Language Processing (ICSLP), 1998.

B. Bigi, R. De Mori, and T. Spriet, Reconnaissance thématique à partir de textes dictés et adaptation dynamique de modèles de langage thématiques, Actes des 23èmes Journées d'Études sur la Parole (JEP), 2000.

B. Bigi, Y. Huang, and R. De Mori, Vocabulary and Language Model Adaptation Using Information Retrieval, Proc. of the 8th International Conference on Spoken Language Processing (ICSLP), 2004.
URL : https://hal.archives-ouvertes.fr/hal-01392515

F. Bimbot, M. El-Bèze, and M. Jardino, An alternative scheme for perplexity estimation, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing, 1997.
DOI : 10.1109/ICASSP.1997.596230

URL : https://hal.archives-ouvertes.fr/inria-00100687

F. Bimbot, R. Pieraccini, E. Levin, and B. Atal, Variable-length sequence modeling: multigrams, IEEE Signal Processing Letters, vol.2, issue.6, pp.111-113, 1995.
DOI : 10.1109/97.388911

C. Blanche-Benveniste, Le français parlé : études grammaticales, 1990.

D. M. Blei and P. J. Moreno, Topic segmentation with an aspect hidden Markov model, Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval, SIGIR '01, 2001.
DOI : 10.1145/383952.384021

D. M. Blei, A. Y. Ng, and M. I. Jordan, Latent Dirichlet Allocation, Journal of Machine Learning Research, vol.3, pp.993-1022, 2003.

J. Bouaud, B. Habert, A. Nazarenko, and P. Zweigenbaum, Regroupements issus de dépendances syntaxiques en corpus : catégorisation et confrontation à deux modélisations conceptuelles, Actes des journées francophones d'ingénierie des connaissances (IC), 1997.

N. Boufaden, G. Lapalme, and Y. Bengio, Segmentation en thèmes de conversations téléphoniques : traitement en amont pour l'extraction d'information, Actes de la 9ème conférence sur le Traitement Automatique des Langues Naturelles (TALN), 2002.

H. Bourlard and S. Dupont, A New ASR Approach Based on Independent Processing and Recombination of Partial Frequency Bands, Proc. of 4th International Conference on Spoken Language Processing (ICSLP), 1996.

T. Brants, TnT - A Statistical Part-of-Speech Tagger, Proc. of the 6th Conference on Applied Natural Language Processing (ANLP), 2000.

E. Brill, A Simple Rule-Based Part of Speech Tagger, Proc. of the 3rd Conference on Applied Natural Language Processing (ANLP), 1992.

E. Brill, Some Advances in Transformation-Based Part of Speech Tagging, 1994.

E. Brill, R. Florian, J. C. Henderson, and L. Mangu, Beyond N-Grams: Can Linguistic Sophistication Improve Language Modeling?, Proc. of the 36th Annual Meeting of the Association for Computational Linguistics and the 17th International Conference on Computational Linguistics (COLING-ACL), 1998.

G. Brown and G. Yule, Discourse Analysis. Cambridge Textbooks in Linguistics Series, 1983.

A. Brun, Détection de thème et adaptation des modèles de langage pour la reconnaissance automatique de la parole, PhD thesis, 2003.

I. Bulyko, M. Ostendorf, M. Siu, T. Ng, A. Stolcke et al., Web resources for language modeling in conversational speech recognition, ACM Transactions on Speech and Language Processing, vol.5, issue.1, pp.1-25, 2007.
DOI : 10.1145/1322391.1322392

I. Bulyko, M. Ostendorf, and A. Stolcke, Getting more mileage from web text sources for conversational speech language modeling using class-dependent mixtures, Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology, companion volume of the Proceedings of HLT-NAACL 2003 (short papers), NAACL '03, 2003.
DOI : 10.3115/1073483.1073486

M. Caillet, J. Pessiot, M. Amini, and P. Gallinari, Unsupervised Learning with Term Clustering for Thematic Segmentation of Texts, Proc. of recherche d'information assistée par ordinateur (RIAO), 2004.

E. Campione, Étiquetage prosodique semi-automatique de corpus oraux : algorithmes et méthodologie, PhD thesis, 2001.

E. Campione and J. Véronis, Étude des relations entre pauses et ponctuations pour la synthèse de la parole à partir de texte, Actes de la 9ème conférence sur le Traitement Automatique des Langues Naturelles (TALN), 2002.

E. Campione and J. Véronis, Pauses et hésitations en français spontané, Actes des 25èmes Journées d'Études sur la Parole (JEP), 2004.

E. Campione, J. Véronis, and J. Deulofeu, C-ORAL-ROM, Integrated Reference Corpora for Spoken Romance Languages, chapter 3. The French corpus, pp.111-133, 2005.

M. Candea, Contribution à l'étude des pauses silencieuses et des phénomènes dits d'«hésitation» en français oral spontané. Étude sur un corpus de récits en classe de français, PhD thesis, 2000.

B. A. Carlson, Unsupervised topic clustering of switchboard speech messages, 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings, 1996.
DOI : 10.1109/ICASSP.1996.541095

K. Çarki, P. Geutner, and T. Schultz, Turkish LVCSR: towards better speech recognition for agglutinative languages, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100), 2000.
DOI : 10.1109/ICASSP.2000.861971

E. Charniak, A Maximum-Entropy-Inspired Parser, Proc. of the Conference of the North American Chapter, 2000.

E. Charniak, Immediate-head parsing for language models, Proceedings of the 39th Annual Meeting on Association for Computational Linguistics, ACL '01, 2001.
DOI : 10.3115/1073012.1073029

C. Chelba and F. Jelinek, Structured language modeling, Computer Speech & Language, vol.14, issue.4, pp.283-332, 2000.
DOI : 10.1006/csla.2000.0147

L. Chen, J. Gauvain, L. Lamel, and G. Adda, Unsupervised Language Model Adaptation for Broadcast News, Proc. of the IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003.

L. Chen, J. Gauvain, L. Lamel, and G. Adda, Dynamic Language Modeling for Broadcast News, Proc. of the 8th International Conference on Spoken Language Processing (ICSLP), 2004.

L. Chen, J. Gauvain, L. Lamel, G. Adda, and M. Adda-Decker, Using Information Retrieval Methods for Language Model Adaptation, Proc. of the 7th European Conference on Speech Communication and Technology (Eurospeech), 2001.

L. Chen, Y. Liu, M. P. Harper, and E. Shriberg, Multimodal Model Integration for Sentence Unit Detection, Proc. of Int. Conf. Multimodal Interfaces (ICMI), State College, 2004.

S. F. Chen and J. Goodman, An Empirical Study of Smoothing Techniques for Language Modeling, 1998.

F. Y. Choi, Advances in Domain Independent Linear Text Segmentation, 2000.

F. Y. Choi, P. Wiemer-Hastings, and J. Moore, Latent Semantic Analysis for Text Segmentation, Proc. of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2001.

Y. Chow and S. Roukos, Speech Understanding Using a Unification Grammar, 1989.

Y. Chow and R. Schwartz, The N-Best algorithm, Proceedings of the workshop on Speech and Natural Language, HLT '89, 1989.
DOI : 10.3115/1075434.1075467

P. Clarkson and T. Robinson, Towards Improved Language Model Evaluation Measures, Proc. of the 6th European Conference on Speech Communication and Technology (Eurospeech), 1999.
DOI : 10.1006/csla.2000.0156

P. R. Clarkson and A. J. Robinson, Language model adaptation using mixtures and an exponentially decaying cache, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing, 1997.
DOI : 10.1109/ICASSP.1997.596049

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.28.5976

B. Daille, Conceptual structuring through term variations, Proceedings of the ACL 2003 workshop on Multiword expressions analysis, acquisition and treatment, 2003.
DOI : 10.3115/1119282.1119284

URL : https://hal.archives-ouvertes.fr/hal-00456518

S. Deligne and F. Bimbot, Language modeling by variable length sequences: theoretical formulation and evaluation of multigrams, 1995 International Conference on Acoustics, Speech, and Signal Processing, 1995.
DOI : 10.1109/ICASSP.1995.479391

S. Deligne and Y. Sagisaka, Learning a Syntagmatic and Paradigmatic Structure from Language Data with a Bi-Multigram Model, Proc. of the 36th Annual Meeting of the Association for Computational Linguistics and the 17th International Conference on Computational Linguistics (COLING-ACL), 1998.

G. Demetriou, E. Atwell, and C. Souter, Large-Scale Lexical Semantics for Speech Recognition Support, Proc. of the 5th European Conference on Speech, Communication, Technology (Eurospeech), 1997.

N. Deshmukh, A. Ganapathiraju, and J. Picone, Hierarchical Search for Large Vocabulary Conversational Speech Recognition, IEEE Signal Processing Magazine, vol.16, issue.5, pp.84-107, 1999.

M. El-Bèze and A. Derouault, A Morphological Model for Large Vocabulary Speech Recognition, Proc. of the IEEE International Conference on Acoustics, Speech, and Signal Processing, 1990.

Équipe DELIC, Présentation du corpus de référence du français parlé. Recherches sur le français parlé, 2004.

A. Farhat, J. Isabelle, and D. O'Shaughnessy, Clustering words for statistical language models based on contextual word similarity, 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings, 1996.
DOI : 10.1109/ICASSP.1996.540320

M. Federico, Bayesian estimation methods for n-gram language model adaptation, Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96, 1996.
DOI : 10.1109/ICSLP.1996.607087

M. Federico, Efficient Language Model Adaptation through MDI Estimation, 1999.
DOI : 10.1109/icassp.2002.5743832

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.11.5581

C. Fellbaum (ed.), WordNet: An Electronic Lexical Database, 1998.

S. Fernández, É. Sanjuan, and J. M. Torres-Moreno, Énergie textuelle de mémoires associatives, Actes de la 14ème conférence sur le Traitement Automatique des Langues Naturelles (TALN), 2007.

O. Ferret, ANTHAPSI : un système d'analyse thématique et d'apprentissage de connaissances pragmatiques fondé sur l'amorçage, PhD thesis, 1998.

O. Ferret and B. Grau, Utiliser des corpus pour amorcer une analyse thématique, Traitement Automatique des Langues (TAL), vol.42, issue.2, pp.517-545, 2001.

R. Florian and D. Yarowsky, Dynamic nonlocal language modeling via hierarchical topic-based adaptation, Proceedings of the 37th annual meeting of the Association for Computational Linguistics on Computational Linguistics, 1999.
DOI : 10.3115/1034678.1034711

URL : http://arxiv.org/abs/cs/0104019

W. A. Gale and G. Sampson, Good-Turing frequency estimation without tears, Journal of Quantitative Linguistics, vol.73, issue.3, pp.217-237, 1995.
DOI : 10.1080/09296179508590051

M. Galley, K. McKeown, E. Fosler-Lussier, and H. Jing, Discourse segmentation of multi-party conversation, Proceedings of the 41st Annual Meeting on Association for Computational Linguistics, ACL '03, 2003.
DOI : 10.3115/1075096.1075167

R. Garside, Spoken English on Computer: Transcription, Mark-up and Application, chapter Grammatical Tagging of the Spoken Part of the British National Corpus: A Progress Report, pp.161-167, 1995.

J. Gauvain, G. Adda, M. Adda-Decker, A. Allauzen, V. Gendner et al., Where are we in Transcribing French Broadcast News, Proc. of the 9th European Conference on Speech Communication and Technology (Eurospeech), 2005.

J. Gauvain, L. Lamel, G. Adda, and M. Adda-Decker, The LIMSI continuous speech dictation system, Proceedings of the workshop on Human Language Technology, HLT '94, 1994.
DOI : 10.3115/1075812.1075887

V. Gendner and M. Adda-Decker, Analyse comparative de corpus oraux et écrits français : mots, lemmes et classes morpho-syntaxiques, Actes des 24èmes Journées d'Études sur la Parole (JEP), 2002.

M. Georgescul, A. Clark, and S. Armstrong, An analysis of quantitative aspects in the evaluation of thematic segmentation algorithms, Proceedings of the 7th SIGdial Workshop on Discourse and Dialogue, SigDIAL '06, 2006.
DOI : 10.3115/1654595.1654622

M. Georgescul, A. Clark, and S. Armstrong, Word Distributions for Thematic Segmentation in a Support Vector Machine Approach, Proc. of the 10th Conference on Computational Natural Language Learning (CoNLL), 2006.

M. Georgescul, A. Clark, and S. Armstrong, Exploiting Structural Meeting-Specific Features for Topic Segmentation, Actes de la 14ème conférence sur le Traitement Automatique des Langues Naturelles (TALN), 2007.

P. Geutner, Introducing linguistic constraints into statistical language modeling, Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96, 1996.
DOI : 10.1109/ICSLP.1996.607139

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.16.4422

M. Ghadessy (ed.), Thematic Development in English Texts, 1995.

D. Gildea and T. Hofmann, Topic-Based Language Models Using EM, Proc. of the 6th European Conference on Speech Communication and Technology (Eurospeech), 1999.

J. Gillett and W. Ward, A Language Model Combining Trigrams and Stochastic Context-Free Grammars, Proc. of the 5th International Conference on Spoken Language Processing (ICSLP), 1998.

L. Gillick and S. J. Cox, Some statistical issues in the comparison of speech recognition algorithms, International Conference on Acoustics, Speech, and Signal Processing, 1989.
DOI : 10.1109/ICASSP.1989.266481

J. Giménez and L. Màrquez, SVMTool: A General POS Tagger Generator Based on Support Vector Machines, Proc. of the 4th International Conference on Language Resources and Evaluation (LREC), 2004.

I. J. Good, The Population Frequencies of Species and the Estimation of Population Parameters, Biometrika, vol.40, issue.3-4, pp.237-264, 1953.

J. T. Goodman, A Bit of Progress in Language Modeling, Extended Version. Technical report, Microsoft Research, 2001.

G. Gravier, J. Bonastre, S. Galliano, E. Geoffrois, M. Tait et al., ESTER, une campagne d'évaluation des systèmes d'indexation d'émissions radiophoniques, 2004.

G. Gravier, F. Yvon, B. Jacob, and F. Bimbot, Integrating Contextual Phonological Rules in a Large Vocabulary Decoder, Proc. of the 7th European Conference on Speech Communication and Technology (Eurospeech), 2001.
URL : https://hal.archives-ouvertes.fr/hal-01457130

R. Gretter and G. Riccardi, On-line learning of language models with word error probability distributions, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221), 2001.
DOI : 10.1109/ICASSP.2001.940892

L. Q. Ha, E. I. Sicilia-Garcia, J. Ming, and F. J. Smith, Extension of Zipf's law to words and phrases, Proceedings of the 19th international conference on Computational linguistics, 2002.
DOI : 10.3115/1072228.1072345

K. Hall and M. Johnson, Attention shifting for parsing speech, Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics, ACL '04, 2004.
DOI : 10.3115/1218955.1218961

URL : http://acl.ldc.upenn.edu/acl2004/main/pdf/186_pdf_2-col.pdf

M. A. Halliday and R. Hasan, Cohesion in English, 1976.

J. S. Hamaker, Towards Building a Better Language Model for SWITCHBOARD: the POS Tagging Task, Proc. of the International Conference on Information Intelligence and Systems (ICIIS), 1999.

M. P. Harper and R. A. Helzerman, Extensions to constraint dependency parsing for spoken language processing, Computer Speech & Language, vol.9, issue.3, pp.187-234, 1995.
DOI : 10.1006/csla.1995.0011

M. P. Harper, L. H. Jamieson, C. D. Mitchell, G. Ying, S. Potisuk et al., Integrating Language Models with Speech Recognition, Proc. of the AAAI94 Workshop on the Integration of Natural Language and Speech Processing, 1994.

M. P. Harper, M. T. Johnson, L. H. Jamieson, S. A. Hockema, and C. M. White, Interfacing a CDG parser with an HMM word recognizer using word graphs, 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No.99CH36258), 1999.
DOI : 10.1109/ICASSP.1999.759771

P. Hart, N. Nilsson, and B. Raphael, A Formal Basis for the Heuristic Determination of Minimum Cost Paths, IEEE Transactions on Systems Science and Cybernetics, vol.4, issue.2, pp.100-107, 1968.
DOI : 10.1109/TSSC.1968.300136

A. Hauenstein and H. Weber, An Investigation of Tightly-Coupled Time-Synchronous Speech Language Understanding Using a Unification Grammar, Proc. of the 12th National Conference on Artificial Intelligence Workshop on the Integration of Natural Language and Speech Processing, 1994.

M. A. Hearst, TextTiling: Segmenting Text into Multi-Paragraph Subtopic Passages, Computational Linguistics, vol.23, issue.1, pp.33-64, 1997.
DOI : 10.3115/981732.981734

URL : http://arxiv.org/abs/cmp-lg/9406037

P. A. Heeman, POS Tags and Decision Trees for Language Modeling, Proc. of the Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora, 1999.

P. A. Heeman and J. F. Allen, Speech Repairs, Intonational Phrases, and Discourse Markers: Modeling Speakers' Utterances in Spoken Dialog, Computational Linguistics, vol.25, issue.4, pp.527-571, 1999.

O. Heinonen, Optimal multi-paragraph text segmentation by dynamic programming, Proceedings of the 36th annual meeting on Association for Computational Linguistics, 1998.
DOI : 10.3115/980691.980814

URL : http://arxiv.org/abs/cs/9812005

S. Henry, Quelles répétitions à l'oral ? Esquisse d'une typologie, Actes des 2èmes Journées de Linguistique de Corpus, 2002.

S. Henry and B. Pallaud, Word Fragments and Repeats in Spontaneous Spoken French, Proc. of Disfluency in Spontaneous Speech Workshop (DISS), 2003.
URL : https://hal.archives-ouvertes.fr/hal-00283726

S. Huet, G. Gravier, and P. Sébillot, Are Morphosyntactic Taggers Suitable to Improve Automatic Transcription?, Proc. of the 9th International Conference on Text, Speech and Dialogue (TSD), 2006.
DOI : 10.1007/11846406_49

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.108.109

S. Huet, G. Gravier, and P. Sébillot, Morphosyntactic Processing of N-Best Lists for Improved Recognition and Confidence Measure Computation, Proc. of the 10th European Conference on Speech Communication and Technology (Eurospeech), 2007.

S. Huet, G. Lecorvé, G. Gravier, and P. Sébillot, Multimodal Processing and Interaction: Audio, Video, Text, Multimedia Systems and Applications, chapter Toward the Integration of Natural Language Processing and Automatic Speech Recognition: Using Morpho-syntax and Pragmatics for Transcription, 2008.

S. Huet, P. Sébillot, and G. Gravier, Introduction de connaissances linguistiques en reconnaissance de la parole : un état de l'art, 2006.

D. Hull, Using statistical testing in the evaluation of retrieval experiments, Proceedings of the 16th annual international ACM SIGIR conference on Research and development in information retrieval, SIGIR '93, 1993.
DOI : 10.1145/160688.160758

M. Hurault-Plantet, M. Jardino, and J. Berthelin, Ajustement des frontières de segments thématiques détectés automatiquement, Actes du 2ème Défi Fouille de Textes (DEFT), 2006.

I. Ide, H. Mo, N. Katayama, and S. Satoh, Topic-based inter-video structuring of a large-scale news video corpus, 2003 International Conference on Multimedia and Expo. ICME '03. Proceedings (Cat. No.03TH8698), 2003.
DOI : 10.1109/ICME.2003.1221309

N. Ide and J. Véronis, MULTEXT: Multilingual Text Tools and Corpora, Proc. of the 15th International Conference on Computational Linguistics (COLING), 1994.

N. Ide and J. Véronis, Introduction to the Special Issue on Word Sense Disambiguation: The State of the Art, Computational Linguistics, vol.24, issue.1, pp.2-40, 1998.

R. Isotani and S. Matsunaga, Speech recognition using a stochastic language model integrating local and global constraints, Proceedings of the workshop on Human Language Technology, HLT '94, 1994.
DOI : 10.3115/1075812.1075829

R. Iyer and M. Ostendorf, Modeling long distance dependence in language: topic mixtures versus dynamic cache models, IEEE Transactions on Speech and Audio Processing, vol.7, issue.1, pp.30-39, 1999.
DOI : 10.1109/89.736328

M. Jardino, Multilingual stochastic n-gram class language models, 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings, 1996.
DOI : 10.1109/ICASSP.1996.540315

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.55.1390

F. Jelinek, Readings in Speech Recognition, chapter Self-Organized Language Modeling for Speech Recognition, pp.450-506, 1990.

F. Jelinek, Statistical Methods for Speech Recognition, 1998.

F. Jelinek and J. D. Lafferty, Computation of the Probability of Initial Substring Generation by Stochastic Context-Free Grammars, Computational Linguistics, vol.17, issue.3, pp.315-323, 1991.

X. Ji and H. Zha, Domain-Independent Text Segmentation Using Anisotropic Diffusion and Dynamic Programming, Proc. of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 2003.

H. Jiang, Confidence Measures for Speech Recognition: A Survey, Speech Communication, vol.45, pp.455-470, 2005.

A. C. Jobbins and L. J. Evett, Text Segmentation Using Reiteration and Collocation, Proc. of the 36th Annual Meeting of the Association for Computational Linguistics and the 17th International Conference on Computational Linguistics (COLING-ACL), 1998.

A. Juneja, Speech Recognition Based on Phonetic Features and Acoustic Landmarks, PhD thesis, 2004.

D. Jurafsky and J. H. Martin, Speech and Natural Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. Prentice-Hall, 2nd edition. Forthcoming (partially available at http, 2008.

D. Jurafsky, C. Wooters, J. Segal, A. Stolcke, E. Fosler et al., Using a stochastic context-free grammar as a language model for speech recognition, 1995 International Conference on Acoustics, Speech, and Signal Processing, 1995.
DOI : 10.1109/ICASSP.1995.479396

M. Kan, J. L. Klavans, and K. R. McKeown, Linear Segmentation and Segment Significance, Proc. of the 6th Workshop on Very Large Corpora (WVLC), 1998.

S. M. Katz, Estimation of probabilities from sparse data for the language model component of a speech recognizer, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol.35, issue.3, pp.400-401, 1987.
DOI : 10.1109/TASSP.1987.1165125

S. Kaufmann, Cohesion and collocation, Proceedings of the 37th annual meeting of the Association for Computational Linguistics on Computational Linguistics, 1999.
DOI : 10.3115/1034678.1034686

A. Kehagias, P. Fragkou, and V. Petridis, Linear text segmentation using a dynamic programming algorithm, Proceedings of the tenth conference on European chapter of the Association for Computational Linguistics, EACL '03, 2003.
DOI : 10.3115/1067807.1067831

T. Kemp and A. Waibel, Reducing the OOV Rate in Broadcast News Speech Recognition, Proc. of the 5th International Conference on Spoken Language Processing (ICSLP), 1998.

S. Khudanpur and J. Wu, A Maximum Entropy Language Model to Integrate N-Grams and Topic Dependencies for Conversational Speech Recognition, Proc. of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 1999.

A. Kilgarriff and G. Grefenstette, Introduction to the Special Issue on the Web as Corpus, Computational Linguistics, vol.19, issue.1, pp.333-347, 2003.
DOI : 10.1038/21987

W. Kim, Language Model Adaptation for ASR and Statistical MT, PhD thesis, 2004.

K. Kirchhoff, J. Bilmes, and K. Duh, Factored Language Model Tutorial, 2007.

K. Kita, T. Kawabata, and H. Saito, HMM continuous speech recognition using predictive LR parsing, International Conference on Acoustics, Speech, and Signal Processing, 1989.
DOI : 10.1109/ICASSP.1989.266524

D. Klakow, Selecting articles from the language model training corpus, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100), 2000.
DOI : 10.1109/ICASSP.2000.862077

R. Kneser, Statistical language modeling using a variable context length, Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96, 1996.
DOI : 10.1109/ICSLP.1996.607162

R. Kneser and H. Ney, Improved Clustering Techniques for Class-Based Statistical Language Modelling, Proc. of the 3rd European Conference on Speech Communication and Technology (Eurospeech), 1993.

R. Kneser and H. Ney, Improved backing-off for M-gram language modeling, 1995 International Conference on Acoustics, Speech, and Signal Processing, 1995.
DOI : 10.1109/ICASSP.1995.479394

R. Kneser and V. Steinbiss, On the dynamic adaptation of stochastic language models, IEEE International Conference on Acoustics Speech and Signal Processing, 1993.
DOI : 10.1109/ICASSP.1993.319375

P. Koehn, F. J. Och, and D. Marcu, Statistical Phrase-Based Translation, Proc. of the Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology (HLT-NAACL), 2003.

H. Kozima, Text segmentation based on similarity between words, Proceedings of the 31st annual meeting on Association for Computational Linguistics, 1993.
DOI : 10.3115/981574.981616

R. Kuhn and R. De Mori, A cache-based natural language model for speech recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.12, issue.6, pp.570-583, 1990.
DOI : 10.1109/34.56193

H. J. Kuo and W. Reichl, Phrase-Based Language Models for Speech Recognition, Proc. of the 6th European Conference on Speech Communication and Technology (Eurospeech), 1999.

J. Lafferty, D. Sleator, and D. Temperley, Grammatical Trigrams: A Probabilistic Link Grammar, Proc. of the AAAI Fall Symposium on Probabilistic Approaches to Natural Language, 1992.

I. R. Lane, T. Kawahara, T. Matsui, and S. Nakamura, Dialogue Speech Recognition by Combining Hierarchical Topic Classification and Language Model Switching, IEICE Transactions on Information and Systems, vol.E88-D, issue.3, pp.446-454, 2005.

D. Langlois, A. Brun, K. Smaïli, and J. Haton, Événements impossibles en modélisation stochastique du langage, Traitement Automatique des Langues (TAL), vol.44, issue.1, pp.33-61, 2003.

M. Lapata and F. Keller, Web-based models for natural language processing, ACM Transactions on Speech and Language Processing, vol.2, issue.1, pp.1-31, 2005.
DOI : 10.1145/1075389.1075392

C. Lavecchia, K. Smaïli, and J. Haton, How to Handle Gender and Number Agreement in Statistical Language Models, Proc. of the 9th International Conference on Spoken Language Processing (ICSLP), 2006.
URL : https://hal.archives-ouvertes.fr/inria-00103497

J. Lecomte, Codage Multext pour GRACE/MULTITAG. Critères d'assignation des étiquettes morpho-syntaxiques, 1997.

G. Lecorvé, Adaptation thématique d'un système de transcription automatique de la parole, Master's research thesis, INSA de Rennes, 2007.

C. J. Leggetter and P. C. Woodland, Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models, Computer Speech & Language, vol.9, issue.2, p.171-185, 1995.
DOI : 10.1006/csla.1995.0010

V. I. Levenshtein, Binary Codes Capable of Correcting Deletions, Insertions, and Reversals, Soviet Physics Doklady, vol.10, issue.8, p.707-710, 1966.

H. Li and K. Yamanishi, Topic Analysis Using a Finite Mixture Model, Proc. of Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora (EMNLP-VLC), 2000.

D. Lin and P. Pantel, Induction of semantic classes from natural language text, Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining, KDD '01, 2001.
DOI : 10.1145/502512.502558

D. Linares, J. Benedí, and J. Sánchez, A hybrid language model based on a combination of N-grams and stochastic context-free grammars, ACM Transactions on Asian Language Information Processing, vol.3, issue.2, p.113-127, 2004.
DOI : 10.1145/1034780.1034783
URL : https://hal.archives-ouvertes.fr/hal-01276708

S. A. Liu, Landmark Detection for Distinctive Feature-Based Speech Recognition, PhD thesis, Massachusetts Institute of Technology, 1995.

Y. Liu, A. Stolcke, M. P. Harper, and E. Shriberg, Comparing and Combining Generative and Posterior Probability Models: Some Advances in Sentence Boundary Detection in Speech, Proc. of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2004.

M. Mahajan, D. Beeferman, and X. D. Huang, Improved Topic-Dependent Language Modeling Using Information Retrieval Techniques, Proc. of the IEEE International Conference on Acoustics, Speech, and Signal Processing, 1999.

G. Maltese and F. Mancini, An automatic technique to include grammatical and morphological information in a trigram-based statistical language model, [Proceedings] ICASSP-92: 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing, 1992.
DOI : 10.1109/ICASSP.1992.225948

L. Mangu, E. Brill, and A. Stolcke, Finding consensus in speech recognition: word error minimization and other applications of confusion networks, Computer Speech & Language, vol.14, issue.4, p.373-400, 2000.
DOI : 10.1006/csla.2000.0152

M. P. Marcus, B. Santorini, and M. A. Marcinkiewicz, Building a Large Annotated Corpus of English: The Penn Treebank, Computational Linguistics, vol.19, issue.2, p.313-330, 1993.

S. C. Martin, J. Liermann, and H. Ney, Adaptive Topic Dependent Language Modelling Using Word-Based Varigrams, Proc. of the 5th European Conference on Speech Communication and Technology (Eurospeech), 1997.

L. Melis, Le français parlé et le français écrit, une opposition à géométrie variable, Romaneske, vol.25, issue.3, p.56-66, 2000.

A. Mendes, R. Amaro, and M. F. Bacelar do Nascimento, Reusing Available Resources for Tagging a Spoken Portuguese Corpus, Proc. of the Workshop on Tagging and Shallow Processing of Portuguese (TASHA), 2003.

B. Merialdo, Tagging English Text with a Probabilistic Model, Computational Linguistics, vol.20, issue.2, p.155-171, 1994.

D. Moraru and G. Gravier, Décodage avec ancrage pour la reconnaissance automatique de la parole, Actes des 26èmes Journées d'Études sur la Parole (JEP), 2006.

A. Moreno and J. M. Guirao, Tagging a Spontaneous Speech Corpus of Spanish, 2003.

J. Morris and G. Hirst, Lexical Cohesion Computed by Thesaural Relations as an Indicator of the Structure of Text, Computational Linguistics, vol.17, issue.1, p.21-48, 1991.

X. Mou, S. Seneff, and V. Zue, Integration of Supra-Lexical Linguistic Models with Speech Recognition Using Shallow Parsing and Finite State Transducers, Proc. of the 7th International Conference on Spoken Language Processing (ICSLP), 2002.

F. Namer, FLEMM : un analyseur flexionnel du français à base de règles, Traitement Automatique des Langues (TAL), vol.41, issue.2, p.523-547, 2000.

A. Nasr, Y. Estève, F. Béchet, T. Spriet, and R. De Mori, A Language Model Combining N-Grams and Stochastic Finite State Automata, Proc. of the 6th European Conference on Speech Communication and Technology (Eurospeech), 1999.
URL : https://hal.archives-ouvertes.fr/hal-01434731

H. Ney, Dynamic programming parsing for context-free grammars in continuous speech recognition, IEEE Transactions on Signal Processing, vol.39, issue.2, p.336-340, 1991.
DOI : 10.1109/78.80816

T. R. Niesler, E. W. Whittaker, and P. C. Woodland, Comparison of Part-of-Speech and Automatically Derived Category-Based Language Models for Speech Recognition, Proc. of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 1998.

T. R. Niesler and P. C. Woodland, Combination of word-based and category-based language models, Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96, 1996.
DOI : 10.1109/ICSLP.1996.607081

T. R. Niesler and P. C. Woodland, A variable-length category-based n-gram language model, 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings, 1996.
DOI : 10.1109/ICASSP.1996.540316

R. Nisimura, K. Komatsu, Y. Kuroda, K. Nagatomo, A. Lee et al., Automatic N-Gram Language Model Creation from Web Resources, Proc. of the 7th European Conference on Speech Communication and Technology (Eurospeech), 2001.

J. Nivre, Sparse Data and Smoothing in Statistical Part-of-Speech Tagging, Journal of Quantitative Linguistics, vol.7, issue.1, p.1-17, 2000.
DOI : 10.1076/0929-6174(200004)07:01;1-3;FT001

J. Nivre and L. Grönqvist, Tagging a Corpus of Spoken Swedish, International Journal of Corpus Linguistics, vol.6, issue.1, p.47-78, 2001.
DOI : 10.1075/ijcl.6.1.03niv

M. Okumura and T. Honda, Word Sense Disambiguation and Text Segmentation Based on Lexical Cohesion, Proc. of the 15th International Conference on Computational Linguistics (COLING), 1994.

S. Ortmanns, H. Ney, and X. Aubert, A word graph algorithm for large vocabulary continuous speech recognition, Computer Speech & Language, vol.11, issue.1, p.43-72, 1997.
DOI : 10.1006/csla.1996.0022

B. Pallaud and S. Henry, Amorces de mots et répétitions : des hésitations plus que des erreurs en français parlé, Actes des 7èmes Journées internationales d'Analyse statistique des Données Textuelles (JADT), 2004.

D. S. Pallett, A look at NIST's benchmark ASR tests: past, present, and future, 2003 IEEE Workshop on Automatic Speech Recognition and Understanding (IEEE Cat. No.03EX721), 2003.
DOI : 10.1109/ASRU.2003.1318488

A. Panunzi, E. Picchi, and M. Moneglia, Using PiTagger for Lemmatization and PoS Tagging of a Spontaneous Speech Corpus: C-Oral-Rom Italian, Proc. of the 4th International Conference on Language Resources and Evaluation (LREC), 2004.

P. Paroubek, I. Robba, A. Vilnat, and C. Ayache, Data, Annotations and Measures in EASY, the Evaluation Campaign for Parsers of French, Proc. of the 5th International Conference on Language Resources and Evaluation (LREC), 2006.

R. J. Passonneau and D. J. Litman, Intention-based segmentation, Proceedings of the 31st annual meeting on Association for Computational Linguistics, 1993.
DOI : 10.3115/981574.981594

R. J. Passonneau and D. J. Litman, Discourse Segmentation by Human and Automated Means, Computational Linguistics, vol.23, issue.1, p.103-139, 1997.

F. Peng and D. Schuurmans, A Simple Closed-Class/Open-Class Factorization for Language Modeling, Proc. of the 6th Natural Language Processing Pacific Rim Symposium (NLPRS), 2001.

L. Pevzner and M. A. Hearst, A Critique and Improvement of an Evaluation Metric for Text Segmentation, Computational Linguistics, vol.28, issue.1, p.19-36, 2002.
DOI : 10.1126/science.264.5164.1421

A. Polguère, Lexicologie et sémantique lexicale : notions fondamentales. Les Presses de l, 2003.

M. F. Porter, An Algorithm for Suffix Stripping, Program, vol.14, issue.3, p.130-137, 1980.

S. Quiniou, É. Anquetil, and S. Carbonnel, Statistical language models for on-line handwritten sentence recognition, Eighth International Conference on Document Analysis and Recognition (ICDAR'05), 2005.
DOI : 10.1109/ICDAR.2005.220
URL : https://hal.archives-ouvertes.fr/hal-00580641

J. R. Quinlan, C4.5: Programs for Machine Learning, 1993.

L. Rabiner, A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition, Proc. of the IEEE, p.257-285, 1989.
DOI : 10.1016/B978-0-08-051584-7.50027-9

F. Rastier, L'analyse thématique des données textuelles, chapter La sémantique des thèmes ou le voyage sentimental, p.223-249, 1995.

A. Ratnaparkhi, A Maximum Entropy Part-of-Speech Tagger, Proc. of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 1996.

A. Ratnaparkhi, A Simple Introduction to Maximum Entropy Models for Natural Language Processing, Technical report, Institute for Research in Cognitive Science, 1997.

J. C. Reynar, An automatic method of finding topic boundaries, Proceedings of the 32nd annual meeting on Association for Computational Linguistics, 1994.
DOI : 10.3115/981732.981783

J. C. Reynar, Topic Segmentation: Algorithms and Applications, PhD thesis, 1998.

J. C. Reynar, Statistical models for topic segmentation, Proceedings of the 37th annual meeting of the Association for Computational Linguistics on Computational Linguistics, 1999.
DOI : 10.3115/1034678.1034735

G. Riccardi, R. Pieraccini, and E. Bocchieri, Stochastic automata for language modeling, Computer Speech & Language, vol.10, issue.4, p.265-293, 1996.
DOI : 10.1006/csla.1996.0014

K. Richmond, A. Smith, and E. Amitay, Detecting Subject Boundaries within Text: A Language Independent Statistical Approach, Proc. of the 2nd Conference on Empirical Methods in Natural Language Processing (EMNLP), 1997.

K. Ries, F. D. Buø, and A. Waibel, Class phrase models for language modeling, Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96, 1996.
DOI : 10.1109/ICSLP.1996.607138

E. S. Ristad, A natural law of succession, Proceedings. 1998 IEEE International Symposium on Information Theory (Cat. No.98CH36252), 1995.
DOI : 10.1109/ISIT.1998.709050

B. Roark, Probabilistic Top-Down Parsing and Language Modelling, Computational Linguistics, vol.27, issue.2, p.249-276, 2001.
DOI : 10.1162/089120101750300526
URL : http://doi.org/10.1162/089120101750300526

R. Rosenfeld, A Maximum Entropy Approach to Adaptive Statistical Language Modeling, Computer Speech & Language, vol.10, p.187-228, 1996.

R. Rosenfeld, A whole sentence maximum entropy language model, 1997 IEEE Workshop on Automatic Speech Recognition and Understanding Proceedings, 1997.
DOI : 10.1109/ASRU.1997.659010

R. Rosenfeld, Incorporating linguistic structure into statistical language models, Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, vol.358, issue.1769, p.1311-1324, 2000.
DOI : 10.1098/rsta.2000.0588
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.36.7356

R. Rosenfeld, Two decades of statistical language modeling: where do we go from here?, Proceedings of the IEEE, vol.88, issue.8, p.1270-1278, 2000.
DOI : 10.1109/5.880083

M. Rossignol, Acquisition sur corpus d'informations lexicales fondées sur la sémantique différentielle, PhD thesis, 2005.

M. Rossignol and P. Sébillot, Extraction statistique sur corpus de classes de mots-clés thématiques, Traitement Automatique des Langues (TAL), vol.44, issue.3, p.217-246, 2003.

M. Rossignol and P. Sébillot, Acquisition sur corpus non spécialisés de classes sémantiques thématisées, Actes des 8èmes journées internationales d'analyse statistique des données textuelles (JADT), 2006.

B. Rueber, Obtaining Confidence Measures from Sentence Probabilities, Proc. of the 5th European Conference on Speech Communication and Technology (Eurospeech), 1997.

A. Sako, T. Takiguchi, and Y. Ariki, Language Modeling Using PLSA-Based Topic HMM, Proc. of the 10th European Conference on Speech Communication and Technology (Eurospeech), 2007.
DOI : 10.1093/ietisy/e91-d.3.522

G. Salton, Automatic Text Processing: The Transformation, Analysis, and Retrieval of Information by Computer, 1989.

G. Salton, A. Singhal, C. Buckley, and M. Mitra, Automatic text decomposition using text segments and text themes, Proceedings of the seventh ACM conference on Hypertext, HYPERTEXT '96, 1996.
DOI : 10.1145/234828.234834

C. Samuelsson and W. Reichl, A class-based language model for large-vocabulary speech recognition extracted from part-of-speech statistics, 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No.99CH36258), 1999.
DOI : 10.1109/ICASSP.1999.758181

J. Savoy, A Stemming Procedure and Stopword List for General French Corpora, Journal of the American Society for Information Science, vol.50, issue.10, p.944-952, 1999.

P. Scheytt, P. Geutner, and A. Waibel, Serbo-Croatian LVCSR on the dictation and broadcast news domain, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181), 1998.
DOI : 10.1109/ICASSP.1998.675410

H. Schmid, Probabilistic Part-of-Speech Tagging Using Decision Trees, Proc. of the International Conference on New Methods in Language Processing, 1994.

H. Schmid, Improvements in Part-of-Speech Tagging with an Application to German, Proc. of the ACL SIGDAT Workshop, 1995.
DOI : 10.1007/978-94-017-2390-9_2

R. Schwartz and S. Austin, Efficient, High-Performance Algorithms for N-Best Search, Proc. of the DARPA Speech and Natural Language Workshop, 1990.

R. Schwartz and S. Austin, A comparison of several approximate algorithms for finding multiple (N-best) sentence hypotheses, [Proceedings] ICASSP 91: 1991 International Conference on Acoustics, Speech, and Signal Processing, 1991.
DOI : 10.1109/ICASSP.1991.150436

F. Sebastiani, Machine learning in automated text categorization, ACM Computing Surveys, vol.34, issue.1, p.1-47, 2002.
DOI : 10.1145/505282.505283

S. Seneff, TINA: A Natural Language System for Spoken Language Applications, Computational Linguistics, vol.18, issue.1, p.61-86, 1992.

S. Seneff, M. McCandless, and V. Zue, Integrating Natural Language into the Word Graph Search for Simultaneous Speech Recognition and Understanding, 1995.

S. Seneff, C. Wang, and T. J. Hazen, Automatic Induction of N-Gram Language Models from a Natural Language Grammar, Proc. of the 8th European Conference on Speech Communication and Technology (Eurospeech), 2003.

A. Sethy, P. G. Georgiou, and S. Narayanan, Building Topic Specific Language Models from Webdata Using Competitive Models, Proc. of the 9th European Conference on Speech Communication and Technology (Eurospeech), 2005.

K. Seymore and R. Rosenfeld, Using Story Topics for Language Model Adaptation, Proc. of the 5th European Conference on Speech Communication and Technology (Eurospeech), 1997.

C. E. Shannon, Prediction and Entropy of Printed English, Bell System Technical Journal, vol.30, issue.1, p.50-64, 1951.
DOI : 10.1002/j.1538-7305.1951.tb01366.x

E. Shriberg, To 'errrr' is human: ecology and acoustics of speech disfluencies, Journal of the International Phonetic Association, vol.31, issue.01, p.153-169, 2001.
DOI : 10.1017/S0025100301001128

M. Siu and M. Ostendorf, Variable N-Grams and Extensions for Conversational Speech Language Modeling, IEEE Transactions on Speech and Audio Processing, vol.8, issue.1, p.63-75, 2000.

N. Stokes, J. Carthy, and A. F. Smeaton, SeLeCT: A Lexical Cohesion Based News Story Segmentation System, p.3-12, 2004.

A. Stolcke, An Efficient Probabilistic Context-Free Parsing Algorithm that Computes Prefix Probabilities, Computational Linguistics, vol.21, issue.2, p.165-202, 1995.

A. Stolcke, SRILM - An Extensible Language Modeling Toolkit, Proc. of the 7th International Conference on Spoken Language Processing (ICSLP), 2002.

A. Stolcke, Y. König, and M. Weintraub, Explicit Word Error Minimization In N-Best List Rescoring, Proc. of the 5th European Conference on Speech Communication and Technology (Eurospeech), 1997.

A. Stolcke and J. Segal, Precise n-gram probabilities from stochastic context-free grammars, Proceedings of the 32nd annual meeting on Association for Computational Linguistics, 1994.
DOI : 10.3115/981732.981743
URL : https://hal.archives-ouvertes.fr/hal-01194269

A. Stolcke and E. Shriberg, Statistical Language Modeling for Speech Disfluencies, Proc. of the IEEE International Conference on Acoustics, Speech, and Signal Processing, 1996.

A. Stolcke, E. Shriberg, R. Bates, M. Ostendorf, D. Hakkani et al., Automatic Detection of Sentence Boundaries and Disfluencies Based on Recognized Words, Proc. of the 5th International Conference on Spoken Language Processing (ICSLP), 1998.

K. Su, T. Chiang, and Y. Lin, A Unified Framework to Incorporate Speech and Language Information in Spoken Language Processing, Proc. of the IEEE International Conference on Acoustics, Speech, and Signal Processing, 1992.

B. Suhm and A. Waibel, Towards Better Language Models for Spontaneous Speech, Proc. of the 3rd International Conference on Spoken Language Processing (ICSLP), 1994.

M. Suzuki, Y. Kajiura, A. Ito, and S. Makino, Unsupervised Language Model Adaptation Based on Automatic Text Collection from WWW, Proc. of Interspeech, 2006.

C. Tillmann and H. Ney, Word Triggers and the EM Algorithm, Proc. of the Workshop Computational Natural Language Learning (CoNLL), 1997.

G. Tür, D. Hakkani-Tür, A. Stolcke, and E. Shriberg, Integrating Prosodic and Lexical Cues for Automatic Topic Segmentation, Computational Linguistics, vol.27, issue.1, p.31-57, 2001.

K. Uchimoto, C. Nobata, A. Yamada, S. Sekine, and H. Isahara, Morphological analysis of the spontaneous speech corpus, Proceedings of the 19th international conference on Computational linguistics, 2002.
DOI : 10.3115/1071884.1071903

M. Utiyama and H. Isahara, A statistical model for domain-independent text segmentation, Proceedings of the 39th Annual Meeting on Association for Computational Linguistics, ACL '01, 2001.
DOI : 10.3115/1073012.1073076

A. Valli and J. Véronis, Étiquetage grammatical de corpus oraux : problèmes et perspectives, Revue française de linguistique appliquée, p.113-133, 1999.

F. Van Eynde, J. Zavrel, and W. Daelemans, Part of Speech Tagging and Lemmatisation for the Spoken Dutch Corpus, Proc. of the Conference on Language Resources and Evaluation (LREC), 2000.

P. van Mulbregt, I. Carp, L. Gillick, S. Lowe, and J. Yamron, Segmentation of Automatically Transcribed Broadcast News Text, Proc. of the DARPA Broadcast News Workshop, 1999.

D. Vaufreydaz, Modélisation statistique du langage à partir d'Internet pour la reconnaissance automatique de la parole continue, PhD thesis, 2002.

D. Vaufreydaz, M. Akbar, and J. Rouillard, Internet Documents: A Rich Source for Spoken Language Modeling, Proc. of the IEEE Workshop Automatic Speech Recognition and Understanding (ASRU), 1999.
URL : https://hal.archives-ouvertes.fr/inria-00326147

J. Vergne, Étude et modélisation de la syntaxe des langues à l'aide de l'ordinateur. Analyse syntaxique non combinatoire. Synthèse et résultats. Habilitation à diriger des recherches, 1999.

D. Vergyri, K. Kirchhoff, K. Duh, and A. Stolcke, Morphology-Based Language Modeling for Arabic Speech Recognition, Proc. of the 8th International Conference on Spoken Language Processing (ICSLP), 2004.

J. Véronis, Ingénierie des langues, chapter Annotation automatique de corpus : panorama et état de la technique, p.111-129, 2000.

J. Véronis, Le traitement automatique des corpus oraux, Traitement Automatique des Langues (TAL), vol.45, issue.2, p.7-14, 2004.

A. J. Viterbi, Error bounds for convolutional codes and an asymptotically optimum decoding algorithm, IEEE Transactions on Information Theory, vol.13, issue.2, p.260-269, 1967.
DOI : 10.1109/TIT.1967.1054010

W. Wang and M. P. Harper, The SuperARV language model, Proceedings of the ACL-02 conference on Empirical methods in natural language processing, EMNLP '02, 2002.
DOI : 10.3115/1118693.1118724

W. Wang, M. P. Harper, and A. Stolcke, The robustness of an almost-parsing language model given errorful training data, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '03), 2003.
DOI : 10.1109/ICASSP.2003.1198762

W. Wang and A. Stolcke, Integrating MAP, Marginals, and Unsupervised Language Model Adaptation, Proc. of the 10th European Conference on Speech Communication and Technology (Eurospeech), 2007.

W. Wang, A. Stolcke, and M. P. Harper, The use of a linguistically motivated language model in conversational speech recognition, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2004.
DOI : 10.1109/ICASSP.2004.1325972

Y. Wang, A. Acero, and C. Chelba, Is Word Error Rate a Good Indicator for Spoken Language Understanding Accuracy, Proc. of the IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), 2003.

Y. Wang, M. Mahajan, and X. Huang, A Unified Context-Free Grammar and N-Gram Model for Spoken Language Processing, Proc. of the IEEE International Conference on Acoustics, Speech, and Signal Processing, 2000.

M. Weintraub, Y. Aksu, S. Dharanipragada, S. Khudanpur, H. Ney et al., LM95 project report: Fast training and portability, 1996.

F. Wessel, R. Schlüter, K. Macherey, and H. Ney, Confidence Measures for Large Vocabulary Continuous Speech Recognition, IEEE Transactions on Speech and Audio Processing, vol.9, issue.3, p.288-298, 2001.

F. Wessel, R. Schlüter, and H. Ney, Using posterior word probabilities for improved speech recognition, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100), 2000.
DOI : 10.1109/ICASSP.2000.861989

E. W. Whittaker, Statistical Language Modelling for Automatic Speech Recognition of Russian and English, PhD thesis, 2000.

I. H. Witten and T. C. Bell, The zero-frequency problem: estimating the probabilities of novel events in adaptive text compression, IEEE Transactions on Information Theory, vol.37, issue.4, p.1085-1094, 1991.
DOI : 10.1109/18.87000

I. H. Witten and E. Frank, Data mining, ACM SIGMOD Record, vol.31, issue.1, 2005.
DOI : 10.1145/507338.507355

P. C. Woodland, Speaker Adaptation for Continuous Density HMMs: A Review, Proc. of ITRW on Adaptation Methods for Speech Recognition, 2001.

J. Wu and S. Khudanpur, Combining Nonlocal, Syntactic and N-Gram Dependencies in Language Modeling, Proc. of the 5th European Conference on Speech Communication and Technology (Eurospeech), 1999.

Y. Yaari, Segmentation of Expository Texts by Hierarchical Agglomerative Clustering, Proc. of Recent Advances in Natural Language Processing (RANLP), 1997.

F. Yvon, P. Boula de Mareüil, C. d'Alessandro, V. Aubergé, M. Bagein et al., Objective evaluation of grapheme to phoneme conversion for text-to-speech synthesis in French, Computer Speech & Language, vol.12, issue.4, p.393-410, 1998.
DOI : 10.1006/csla.1998.0104

K. Zechner and A. Waibel, Using Chunk Based Partial Parsing of Spontaneous Speech in Unrestricted Domains for Reducing Word Error Rate in Speech Recognition, Proc. of the 36th Annual Meeting of the Association for Computational Linguistics and the 17th International Conference on Computational Linguistics (COLING-ACL), 1998.

X. Zhu and R. Rosenfeld, Improving Trigram Language Modeling with the World Wide Web, Proc. of the IEEE International Conference on Acoustics, Speech, and Signal Processing, 2001.

I. Zitouni, K. Smaïli, and J. Haton, Statistical language modeling based on variable-length sequences, Computer Speech & Language, vol.17, issue.1, p.27-41, 2003.
DOI : 10.1016/S0885-2308(02)00026-8
URL : https://hal.archives-ouvertes.fr/inria-00099785