
Named speaker identification: joint exploitation of the audio signal and its transcription

Abstract: Automatic speech processing covers a wide range of work, including speaker recognition, named entity detection, and transcription of the audio signal into words. These techniques can extract a great deal of information from audio documents (meetings, broadcast shows, etc.), such as the transcription itself, various annotations (the type of show, the places mentioned, etc.), or information about the speakers (speaker changes, speaker gender). All of this information can be exploited by automatic indexing techniques to index large document collections. The work presented in this thesis focuses on the automatic indexing of speakers in French audio documents. Specifically, we seek to identify each speaker's speaking turns and to name the speaker by first and last name. This process is known as named speaker identification. The distinctive feature of this work is the joint use of the audio signal and its transcription to name the speakers of a document: each speaker's first and last name is extracted from the document itself (more precisely, from its rich transcription) before being assigned to one of the document's speakers. We begin by describing the context and previous work on named speaker identification, then present Milesin, the system developed during this thesis. The first contribution of this work is the use of an automatic named entity detector (LIA_NE) to extract first and last names from the transcription. The system then relies on the theory of belief functions to assign these names to the document's speakers, which allows it to take into account the various conflicts that may arise. Finally, an optimal assignment algorithm is proposed.
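The abstract names two mechanisms without detailing them: combining evidence about candidate names with belief functions, and an optimal one-to-one assignment of names to speakers. The sketch below illustrates both under our own assumptions; it is not Milesin's implementation. It assumes Dempster's rule of combination for merging two pieces of evidence, the pignistic transform to turn the result into per-name scores, and an exhaustive search over one-to-one assignments standing in for whatever assignment algorithm the thesis actually proposes. The speaker labels `S1`/`S2`, the names, and all mass values are hypothetical.

```python
from itertools import permutations
from math import isclose

# A mass function maps focal elements (sets of candidate names) to masses summing to 1.
Mass = dict[frozenset[str], float]


def combine_dempster(m1: Mass, m2: Mass) -> Mass:
    """Dempster's rule: multiply masses of intersecting focal elements, renormalize."""
    combined: Mass = {}
    conflict = 0.0  # K: total mass of pairs whose focal elements are disjoint
    for a, wa in m1.items():
        for b, wb in m2.items():
            inter = a & b
            if inter:
                combined[inter] = combined.get(inter, 0.0) + wa * wb
            else:
                conflict += wa * wb
    if isclose(conflict, 1.0):
        raise ValueError("total conflict: the two sources cannot be combined")
    return {s: w / (1.0 - conflict) for s, w in combined.items()}


def pignistic(m: Mass) -> dict[str, float]:
    """Pignistic transform: share each focal element's mass equally among its names."""
    betp: dict[str, float] = {}
    for s, w in m.items():
        for name in s:
            betp[name] = betp.get(name, 0.0) + w / len(s)
    return betp


def best_assignment(scores: dict[str, dict[str, float]]) -> dict[str, str]:
    """Exhaustive one-to-one speaker -> name assignment maximizing the total score.

    Exact but exponential; returns {} if there are fewer names than speakers.
    """
    speakers = sorted(scores)
    names = sorted({n for per_speaker in scores.values() for n in per_speaker})
    best: dict[str, str] = {}
    best_total = float("-inf")
    for perm in permutations(names, len(speakers)):
        total = sum(scores[spk].get(n, 0.0) for spk, n in zip(speakers, perm))
        if total > best_total:
            best, best_total = dict(zip(speakers, perm)), total
    return best


ALICE, BRUNO = "Alice Martin", "Bruno Petit"  # hypothetical names from a transcript
frame = frozenset({ALICE, BRUNO})  # the whole frame encodes "don't know"

# Two hypothetical pieces of evidence about speaker S1, merged with Dempster's rule.
m_s1 = combine_dempster(
    {frozenset({ALICE}): 0.6, frame: 0.4},
    {frozenset({ALICE}): 0.3, frozenset({BRUNO}): 0.2, frame: 0.5},
)
scores = {"S1": pignistic(m_s1), "S2": {ALICE: 0.55, BRUNO: 0.45}}
print(best_assignment(scores))  # {'S1': 'Alice Martin', 'S2': 'Bruno Petit'}
```

In this toy case S2 slightly prefers "Alice Martin" on its own, but the one-to-one constraint resolves the conflict in favor of the higher total score (about 1.25 versus 0.75). For more than a handful of speakers, a polynomial method such as the Hungarian algorithm would replace the exhaustive search.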
Depending on the corpus used, this system achieves an error rate of between 12% and 20% on reference (manually produced) transcriptions. We then present the advances and limitations highlighted by this work, and propose an initial study of the impact of using fully automatic transcriptions on Milesin.

Cited literature: 9 references
Contributor: ABES STAR
Submitted on: Monday, July 18, 2011 - 10:58:19 AM
Last modification on: Tuesday, March 31, 2020 - 3:21:28 PM
Long-term archiving on: Wednesday, October 19, 2011 - 2:22:13 AM


Version validated by the jury (STAR)


  • HAL Id: tel-00609093, version 1


Vincent Jousse. Identification nommée du locuteur : exploitation conjointe du signal sonore et de sa transcription. Computers and Society [cs.CY]. Université du Maine, 2011. French. ⟨NNT : 2011LEMA1008⟩. ⟨tel-00609093⟩


