Indexation de documents audio : Cas des grands volumes de données (Audio document indexing: the case of large data volumes)

Abstract : This thesis is devoted to techniques that allow speaker-based recognition systems to scale up to large amounts of data and speaker models. We have chosen to partition audio documents (news broadcasts) according to speakers. The mel-cepstral acoustic characteristics of each speaker are modelled by a probabilistic Gaussian mixture model. Speaker change detection in the stream is first carried out by Bayesian hypothesis testing. The scheme is incremental: as new speakers are detected, they are either identified in the database or added to it as new entries. We then examined issues related to building a tree structure that exploits similarity between speaker models. Several contributions were made. First, we propose a way of organising a set of speaker models, based on elementary model grouping. Then, we use an approximation of the Kullback-Leibler divergence for this purpose. Finally, through two studies using binary or n-ary tree structures, we discuss how to obtain a version suitable for incremental processing. To conclude, perspectives are drawn regarding joint audio/video analysis, and future needs are analyzed.
Contributor : Marc Gelgon
Submitted on : Wednesday, January 27, 2010 - 1:03:21 PM
Last modification on : Thursday, August 11, 2022 - 1:10:06 PM
Long-term archiving on : Thursday, October 18, 2012 - 1:21:09 PM


  • HAL Id : tel-00450812, version 1


Jamal Rougui. Indexation de documents audio : Cas des grands volumes de données. Interface homme-machine [cs.HC]. Université de Nantes, 2008. Français. ⟨tel-00450812⟩


