Indexation de documents audio : Cas des grands volumes de données [Audio document indexing: the case of large data volumes]

Abstract : This thesis addresses techniques that allow speaker-based recognition systems to scale up to large amounts of data and large numbers of speaker models. Audio documents (news broadcasts) are partitioned according to speaker. The mel-cepstral acoustic characteristics of each speaker are modeled by a probabilistic Gaussian mixture model. Speaker change detection in the stream is first carried out by Bayesian hypothesis testing. The scheme is incremental: as new speakers are detected, they are either identified in the database or added to it as new entries. We then examine issues related to building a tree structure that exploits similarity between speaker models, and make several contributions. First, we propose a way of organising a set of speaker models based on elementary model grouping. Second, we use an approximation of the Kullback-Leibler divergence for this purpose. Third, through two studies using binary or n-ary tree structures, we discuss how to obtain a version suitable for incremental processing. Finally, perspectives are drawn regarding joint audio/video analysis, and future needs are analyzed.
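As an illustration of how a similarity between Gaussian mixture speaker models might be computed, the sketch below implements one well-known matching-based approximation of the Kullback-Leibler divergence between two mixtures. The abstract does not say which approximation the thesis uses, so this choice is an assumption. The function names, the diagonal-covariance restriction, and the representation of a mixture as a list of (weight, mean, variance) tuples are likewise hypothetical and chosen only for this example.

```python
import math


def kl_diag_gauss(m0, v0, m1, v1):
    """Closed-form KL(N(m0, diag v0) || N(m1, diag v1)) for diagonal covariances."""
    return 0.5 * sum(
        math.log(b / a) + (a + (x - y) ** 2) / b - 1.0
        for x, a, y, b in zip(m0, v0, m1, v1)
    )


def kl_gmm_matching(f, g):
    """Matching-based approximation of KL(f || g) between two mixtures.

    Each mixture is a list of (weight, mean, variance) tuples (assumed layout).
    Every component of f is matched to the component of g that minimises
    KL(f_i || g_j) - log(weight of g_j).
    """
    total = 0.0
    for w_f, m_f, v_f in f:
        best = min(
            kl_diag_gauss(m_f, v_f, m_g, v_g) - math.log(w_g)
            for w_g, m_g, v_g in g
        )
        total += w_f * (best + math.log(w_f))
    return total


def symmetric_distance(f, g):
    """Symmetrised version, usable as a dissimilarity when grouping models."""
    return kl_gmm_matching(f, g) + kl_gmm_matching(g, f)


# Two hypothetical one-dimensional speaker models.
speaker_a = [(0.4, [0.0], [1.0]), (0.6, [5.0], [1.0])]
speaker_b = [(1.0, [1.0], [1.0])]
print(symmetric_distance(speaker_a, speaker_a))
print(symmetric_distance(speaker_a, speaker_b))
```

A symmetrised value of this kind could serve as the pairwise dissimilarity when grouping speaker models into a binary or n-ary tree, although the grouping criterion actually used in the thesis may differ.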
Document type :
Theses

https://tel.archives-ouvertes.fr/tel-00450812
Contributor : Marc Gelgon
Submitted on : Wednesday, January 27, 2010 - 1:03:21 PM
Last modification on : Friday, May 10, 2019 - 12:23:14 PM
Long-term archiving on : Thursday, October 18, 2012 - 1:21:09 PM

Identifiers

  • HAL Id : tel-00450812, version 1

Citation

Jamal Rougui. Indexation de documents audio : Cas des grands volumes de données. Human-Computer Interaction [cs.HC]. Université de Nantes, 2008. French. ⟨tel-00450812⟩
