Indexation sonore : recherche de composantes primaires pour une structuration audiovisuelle [Sound indexing: searching for primary components for audiovisual structuring]

Abstract : Processing the large quantity of available audiovisual information quickly and intelligently requires robust, automatic tools. This work addresses the indexing and structuring of the soundtrack of multimedia documents. Its goal is to detect the primary components: speech, music and key sounds. For speech/music classification, three unusual parameters are extracted: entropy modulation, stationary segment duration (obtained with a Forward-Backward Divergence algorithm) and the number of segments. These three parameters are merged with the classical 4 Hertz modulation energy. Experiments on radio corpora show the robustness of these parameters. The system is compared with, and then merged with, a classical system. A second kind of partitioning consists in detecting pertinent key sounds. For jingles, candidates are selected by comparing the “signature” of each jingle with the data flow. This system is simple, fast and efficient. Applause and laughter detection relies on Gaussian mixture models (GMM) applied to a spectral analysis. Experiments on a TV corpus give encouraging results. Key words are detected in a traditional way: the aim here is not to improve existing systems but to serve a structuring task, since these key words indicate the program type (news, weather, documentary...). Two studies then consider how these components can be used to recover the temporal structure of audiovisual documents. The first detects a recurring production invariant in program collections. The second makes it possible to structure TV news into topics. Some examples of what video analysis can contribute are also developed.
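As an illustration of the 4 Hertz modulation energy mentioned in the abstract, here is a minimal Python sketch. It is not the thesis implementation: the function name `modulation_energy_4hz`, the 16 ms frame length, the 2 to 8 Hz band and the use of a plain FFT of the frame-energy envelope are all assumptions chosen for simplicity. The underlying idea is that speech energy tends to fluctuate at the syllabic rate, around 4 Hz, so a large share of envelope modulation energy near 4 Hz suggests speech rather than music.

```python
import numpy as np


def modulation_energy_4hz(signal, sample_rate, frame_ms=16.0, band=(2.0, 8.0)):
    """Share of envelope modulation energy that falls in a band around 4 Hz.

    Hypothetical helper: frame length and band limits are assumed values,
    not parameters taken from the thesis.
    """
    frame_len = int(sample_rate * frame_ms / 1000.0)
    n_frames = len(signal) // frame_len
    if n_frames < 2:
        raise ValueError("signal too short for the chosen frame length")

    # Short-term energy envelope: mean squared amplitude of each frame.
    samples = np.asarray(signal[: n_frames * frame_len], dtype=float)
    envelope = (samples.reshape(n_frames, frame_len) ** 2).mean(axis=1)

    # Remove the mean so the constant (0 Hz) term does not dominate.
    envelope = envelope - envelope.mean()

    # Power spectrum of the envelope; one envelope sample per frame.
    spectrum = np.abs(np.fft.rfft(envelope)) ** 2
    freqs = np.fft.rfftfreq(n_frames, d=frame_len / sample_rate)

    total = spectrum[1:].sum()
    if total == 0.0:
        return 0.0
    in_band = spectrum[(freqs >= band[0]) & (freqs <= band[1])].sum()
    return float(in_band / total)
```

The function returns a value between 0 and 1. In a real classifier this score would be computed over sliding windows and combined with the other parameters; the decision threshold would have to be learned from data.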
Document type :
Theses

https://tel.archives-ouvertes.fr/tel-00008755
Contributor : Julien Pinquier
Submitted on : Friday, March 11, 2005 - 11:55:58 AM
Last modification on : Thursday, June 27, 2019 - 4:27:42 PM
Long-term archiving on : Friday, April 2, 2010 - 10:02:03 PM

Identifiers

  • HAL Id : tel-00008755, version 1

Citation

Julien Pinquier. Indexation sonore : recherche de composantes primaires pour une structuration audiovisuelle. Interface homme-machine [cs.HC]. Université Paul Sabatier - Toulouse III, 2004. Français. ⟨tel-00008755⟩
