
Multimodal Monolingual Comparable Corpus Alignment

Abstract : The growing volume of available text and audio material (newspapers, radio, audio tracks of television programs, etc.) calls for automated tools for tracking and navigation. For example, a person reading a newspaper article online should be able to access the parts of radio broadcasts that correspond to the passage being read. Such fine-grained navigation between media requires aligning "passages" with similar content across document extracts that come from different modalities but are monolingual and comparable. Our work focuses on this problem of aligning short texts in a multimodal, monolingual, comparable setting. The difficulty lies in finding similarities between short texts and in extracting the features of these texts that help identify those similarities for the alignment process. We contribute to this problem in three parts. The first part attempts to define similarity, which is the basis of the alignment process. The second part aims at developing a new text representation to facilitate the creation of the gold corpus on which alignment methods will be evaluated. Finally, the third contribution is a study of different alignment methods and of the effect of their components on the alignment process. These components include different text representations, weights, and similarity measures.
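To make the three components named at the end of the abstract concrete, here is a minimal sketch of one possible combination: a bag-of-words representation, TF-IDF weights, and cosine similarity. It is only an illustration and is not taken from the thesis; the function names (`tokenize`, `tfidf_vectors`, `cosine`, `align`), the smoothed IDF formula, and the `threshold` value are all assumptions chosen for the example.

```python
import math
from collections import Counter

PUNCT = ".,;:!?\"'()"

def tokenize(text):
    # Assumed representation: lowercase bag of words, punctuation stripped.
    words = (w.strip(PUNCT).lower() for w in text.split())
    return [w for w in words if w]

def tfidf_vectors(segments):
    # Assumed weighting: raw term frequency times a smoothed IDF.
    tokenized = [tokenize(s) for s in segments]
    n = len(tokenized)
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    return [{t: tf * idf[t] for t, tf in Counter(toks).items()}
            for toks in tokenized]

def cosine(u, v):
    # Assumed similarity measure: cosine between sparse weight vectors.
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def align(source_segments, target_segments, threshold=0.2):
    # Link each source segment to its most similar target segment,
    # keeping the link only if the score reaches the (hypothetical) threshold.
    vectors = tfidf_vectors(list(source_segments) + list(target_segments))
    src = vectors[:len(source_segments)]
    tgt = vectors[len(source_segments):]
    pairs = []
    for i, u in enumerate(src):
        scores = [cosine(u, v) for v in tgt]
        if not scores:
            continue
        j = max(range(len(scores)), key=scores.__getitem__)
        if scores[j] >= threshold:
            pairs.append((i, j, scores[j]))
    return pairs

if __name__ == "__main__":
    article = ["The city council approved the new budget on Monday.",
               "Heavy rain flooded several streets downtown."]
    transcript = ["rain flooded streets in the downtown area",
                  "the council voted to approve the budget"]
    for i, j, score in align(article, transcript):
        print(f"article[{i}] <-> transcript[{j}]  score={score:.2f}")
```

Swapping any one of the three functions (for instance, a different weighting scheme in `tfidf_vectors`) while keeping the others fixed is one simple way to observe how a single component affects the resulting alignment.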

Cited literature [157 references]
Contributor : Prajol Shrestha
Submitted on : Tuesday, November 26, 2013 - 8:50:23 AM
Last modification on : Wednesday, April 27, 2022 - 4:22:39 AM
Long-term archiving on : Monday, March 3, 2014 - 4:06:25 PM


  • HAL Id : tel-00909179, version 1


Prajol Shrestha. Multimodal Monolingual Comparable Corpus Alignment. Computation and Language [cs.CL]. Université de Nantes, 2013. English. ⟨NNT : ⟩. ⟨tel-00909179⟩


