Segmentation parole/musique pour la transcription automatique de parole continue

Emmanuel Didiot 1
1 PAROLE - Analysis, perception and recognition of speech
INRIA Lorraine, LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications
Abstract : In this thesis, we study the segmentation of an audio stream into speech, music, and speech on music (S/M). This is a fundamental step for any application based on automatic transcription of radio streams and, more generally, multimedia streams. The target application here is a keyword detection system for broadcast programs. The performance of this application depends on the quality of the signal segmentation produced by the speech/music discrimination system: a misclassified segment can cause missed detections or false alarms. To improve speech/music discrimination, we propose a new signal parameterization method. We use wavelet decomposition, which is suited to the analysis of non-stationary signals such as music. We compute several energies on the wavelet coefficients to build our feature vectors. The signal is then segmented into four classes: speech (S), non-speech (NS), music (M), and non-music (NM), using two separate class/non-class classification systems based on hidden Markov models (HMMs). We chose a class/non-class architecture because it allows the best parameters to be found independently for each of the S/NS and M/NM tasks. The classifier outputs are then fused to obtain the final decision: speech, music, or speech on music. Results on a corpus of real broadcast programs show that our wavelet-based parameterization gives a significant improvement in performance on both the M/NM and S/M discrimination tasks compared to the baseline parameterization using cepstral coefficients.
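As an illustration of the two ideas in the abstract (energies computed on wavelet coefficients, and fusion of two class/non-class decisions), here is a minimal sketch. It is not the thesis implementation: the Haar wavelet, the number of levels, the log-energy form, the function names `haar_band_energies` and `fuse`, and the "other" label for frames that are neither speech nor music are all assumptions made for the example.

```python
import numpy as np

def haar_band_energies(frame, levels=3):
    """Log mean energy of Haar detail coefficients at each level,
    followed by the log mean energy of the final approximation.
    Assumption: Haar wavelet and log energies, chosen for simplicity."""
    approx = np.asarray(frame, dtype=float)
    energies = []
    for _ in range(levels):
        if len(approx) % 2:            # drop the last sample so pairs line up
            approx = approx[:-1]
        even, odd = approx[0::2], approx[1::2]
        detail = (even - odd) / np.sqrt(2.0)   # high-pass half
        approx = (even + odd) / np.sqrt(2.0)   # low-pass half
        energies.append(np.mean(detail ** 2))
    energies.append(np.mean(approx ** 2))
    return np.log(np.array(energies) + 1e-12)  # small floor avoids log(0)

def fuse(is_speech, is_music):
    """Combine the S/NS and M/NM decisions into one label.
    Assumption: frames that are neither are labelled 'other'."""
    if is_speech and is_music:
        return "speech on music"
    if is_speech:
        return "speech"
    if is_music:
        return "music"
    return "other"

features = haar_band_energies(np.ones(8), levels=3)
label = fuse(is_speech=True, is_music=True)
```

In a full system the feature vectors would feed the two HMM-based classifiers, whose per-segment decisions would replace the hard-coded booleans passed to `fuse` here.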

https://tel.archives-ouvertes.fr/tel-00187941
Contributor : Emmanuel Didiot
Submitted on : Friday, February 1, 2008 - 5:46:09 PM
Last modification on : Friday, January 18, 2019 - 11:55:55 AM
Long-term archiving on : Tuesday, September 21, 2010 - 3:51:32 PM

Identifiers

  • HAL Id : tel-00187941, version 2

Citation

Emmanuel Didiot. Segmentation parole/musique pour la transcription automatique de parole continue. Acoustique [physics.class-ph]. Université Henri Poincaré - Nancy I, 2007. Français. ⟨tel-00187941v2⟩
