
Speech/music segmentation for the automatic transcription of continuous speech

Emmanuel Didiot
PAROLE - Analysis, perception and recognition of speech
INRIA Lorraine, LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications
Abstract: In this thesis, we study speech/music (S/M) segmentation of an audio stream, that is, its division into speech, music, and speech over music. This is a fundamental step for any application based on the automatic transcription of radio broadcast streams and, more generally, of multimedia streams. The target application here is a keyword detection system for broadcast programs, whose performance depends on the quality of the segmentation produced by the speech/music discrimination system: misclassified signal segments can cause missed detections or false alarms. To improve speech/music discrimination, we propose a new signal parameterization method based on the wavelet decomposition, which is well suited to analyzing non-stationary signals such as music. We compute several energies on the wavelet coefficients to build our feature vectors. The signal is then segmented into four classes, speech (S), non-speech (NS), music (M) and non-music (NM), by two separate class/non-class classification systems based on hidden Markov models (HMMs). We chose a class/non-class architecture because it allows the best parameters to be found independently for the S/NS and M/NM tasks. The outputs of the two classifiers are then fused to obtain the final decision: speech, music, or speech over music. Results on a corpus of real broadcast programs show that our wavelet-based parameterization significantly improves performance on both the M/NM and S/M discrimination tasks compared with a baseline parameterization using cepstral coefficients.
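The abstract does not specify the wavelet family, the decomposition depth, or the exact energy measures used. As a minimal sketch of the general idea only, assuming a Haar wavelet and one log mean-energy value per subband (both our own choices, not necessarily those of the thesis), a per-frame wavelet energy feature vector could be computed like this:

```python
import math


def haar_step(x):
    """One level of the orthonormal Haar DWT: returns (approximation, detail)."""
    s = 1.0 / math.sqrt(2.0)
    half = len(x) // 2
    approx = [(x[2 * i] + x[2 * i + 1]) * s for i in range(half)]
    detail = [(x[2 * i] - x[2 * i + 1]) * s for i in range(half)]
    return approx, detail


def wavelet_energy_features(frame, levels=4, eps=1e-10):
    """Log mean energy of each detail subband, then of the final approximation.

    Assumption for illustration: Haar wavelet, dyadic decomposition,
    one log-energy coefficient per subband (levels + 1 values in total).
    """
    if len(frame) % (2 ** levels) != 0:
        raise ValueError("frame length must be divisible by 2**levels")
    feats = []
    approx = list(frame)
    for _ in range(levels):
        # Split the current band into a coarser approximation and a detail band.
        approx, detail = haar_step(approx)
        energy = sum(c * c for c in detail) / len(detail)
        feats.append(math.log(energy + eps))  # eps avoids log(0) on silent bands
    energy = sum(c * c for c in approx) / len(approx)
    feats.append(math.log(energy + eps))
    return feats
```

Because the Haar transform used here is orthonormal, the subband energies (before averaging and taking logs) add up to the frame energy, so the vector describes how that energy is spread across octave bands, which is the kind of information that tends to differ between speech and music.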
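The class/non-class classification and the fusion step can be illustrated with a simplified sketch. It assumes each frame already has emission log-likelihoods for the class and non-class states (the emission model itself is left out), uses a hypothetical two-state HMM with a single self-transition probability `p_stay`, and gives frames that are neither speech nor music a hypothetical `"other"` label. The actual HMM topology and fusion rule in the thesis may differ.

```python
import math


def viterbi_two_state(loglik, p_stay=0.9):
    """Viterbi decoding for a hypothetical 2-state HMM (0 = non-class, 1 = class).

    loglik[t][s] is the emission log-likelihood of frame t under state s.
    A high p_stay penalizes switching and so smooths the segmentation.
    """
    if not loglik:
        return []
    log_stay, log_switch = math.log(p_stay), math.log(1.0 - p_stay)
    delta = [math.log(0.5) + loglik[0][s] for s in (0, 1)]  # uniform start
    back = []
    for t in range(1, len(loglik)):
        new_delta, ptr = [], []
        for s in (0, 1):
            # Best predecessor: stay in s, or switch from the other state.
            cand = [delta[p] + (log_stay if p == s else log_switch) for p in (0, 1)]
            best = 0 if cand[0] >= cand[1] else 1
            new_delta.append(cand[best] + loglik[t][s])
            ptr.append(best)
        delta = new_delta
        back.append(ptr)
    # Backtrack from the best final state.
    state = 0 if delta[0] >= delta[1] else 1
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]


def fuse(speech, music):
    """Combine frame-level S/NS and M/NM decisions into final labels."""
    labels = []
    for s, m in zip(speech, music):
        if s and m:
            labels.append("speech on music")
        elif s:
            labels.append("speech")
        elif m:
            labels.append("music")
        else:
            labels.append("other")  # hypothetical label for NS + NM frames
    return labels
```

Running `viterbi_two_state` separately for the S/NS and M/NM tasks lets each classifier use its own features and parameters, which is the motivation the abstract gives for the class/non-class architecture; `fuse` then maps each pair of frame decisions to speech, music, or speech on music.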
Contributor: Emmanuel Didiot
Submitted on: Friday, February 1, 2008 - 5:46:09 PM
Last modification on: Friday, February 26, 2021 - 3:28:05 PM
Long-term archiving on: Tuesday, September 21, 2010 - 3:51:32 PM

HAL Id: tel-01748262, version 3



Emmanuel Didiot. Segmentation parole/musique pour la transcription automatique de parole continue. Acoustics [physics.class-ph]. Université Henri Poincaré - Nancy 1, 2007. In French. ⟨tel-01748262v3⟩




