
Segmentation et indexation des signaux sonores musicaux

Abstract : This work deals with the temporal segmentation and indexing of musical signals. Three interdependent segmentation schemes are defined, each corresponding to a different level of signal attributes.

1) The first scheme, named the "source" scheme, mainly concerns the distinction between speech and music in movie soundtracks and radio broadcasts.

Several features are examined, each intended to measure a distinct property of speech or music. They are combined in several multidimensional classification frameworks. The performance of the system is discussed.

2) The second scheme, named the "feature" scheme, refers to labels such as: silence/sound, voiced/unvoiced, harmonic/inharmonic, monophonic/polyphonic, with vibrato/without vibrato. Most of these characteristics serve as features for the third scheme.

Vibrato detection, estimation of the vibrato parameters (frequency and magnitude), and vibrato extraction from the \(f_0\) trajectory have been studied in particular. Several techniques are described. The performance of the system is discussed.
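As an illustration of vibrato parameter estimation, the sketch below scans candidate rates and keeps the sinusoid that best matches the mean-removed \(f_0\) trajectory. It is not one of the techniques described in the thesis, which the abstract does not detail. The function name `estimate_vibrato`, the 4 to 8 Hz search band, the 0.05 Hz step and the 100 frames/s analysis rate are all assumptions made for this example.

```python
import math

def estimate_vibrato(f0, frame_rate, fmin=4.0, fmax=8.0, step=0.05):
    """Return (rate_hz, extent_hz) of the strongest sinusoid in [fmin, fmax].

    f0 is a list of fundamental-frequency values (Hz), one per analysis frame.
    The search band and step are assumptions, not values from the thesis.
    """
    n = len(f0)
    mean = sum(f0) / n
    centered = [x - mean for x in f0]  # remove the average pitch of the note
    best_rate, best_amp = 0.0, 0.0
    steps = int(round((fmax - fmin) / step))
    for i in range(steps + 1):
        rate = fmin + i * step
        w = 2.0 * math.pi * rate / frame_rate
        # Correlate the trajectory with a cosine and a sine at this rate.
        re = sum(c * math.cos(w * k) for k, c in enumerate(centered))
        im = sum(c * math.sin(w * k) for k, c in enumerate(centered))
        amp = 2.0 * math.hypot(re, im) / n  # amplitude of the matching sinusoid
        if amp > best_amp:
            best_rate, best_amp = rate, amp
    return best_rate, best_amp

# Synthetic trajectory: 440 Hz note, 6 Hz vibrato of +/-5 Hz, 2 s at 100 frames/s.
frame_rate = 100.0
f0 = [440.0 + 5.0 * math.sin(2.0 * math.pi * 6.0 * k / frame_rate)
      for k in range(200)]
rate, extent = estimate_vibrato(f0, frame_rate)
```

A detection rule could then compare `extent` with a minimum magnitude, although that threshold would have to be tuned on real data.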

The vibrato is extracted from the \(f_0\) trajectory to obtain a vibrato-free melodic evolution. This "flat" fundamental frequency is useful for segmenting musical excerpts into notes (third scheme), and can also be used for sound modification or processing.
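One simple way to obtain such a flat trajectory, assuming the vibrato rate is already known, is a moving average whose length equals one vibrato period, since averaging a sinusoid over a full period cancels it. The sketch below uses this swapped-in technique; it is not presented as the method of the thesis. The name `flatten_f0` and the numeric values are hypothetical.

```python
import math

def flatten_f0(f0, frame_rate, vibrato_rate):
    """Smooth f0 with a moving average spanning one vibrato period.

    Assumes vibrato_rate (Hz) was estimated beforehand. Note that this
    also smooths genuine note transitions, which a real system must handle.
    """
    period = max(1, int(round(frame_rate / vibrato_rate)))  # frames per cycle
    half = period // 2
    n = len(f0)
    flat = []
    for k in range(n):
        # Near the ends, shift the window inward so it still spans one period.
        lo = max(0, k - half)
        hi = min(n, lo + period)
        lo = max(0, hi - period)
        window = f0[lo:hi]
        flat.append(sum(window) / len(window))
    return flat

# Synthetic trajectory: 440 Hz note, 5 Hz vibrato of +/-5 Hz, 1 s at 100 frames/s.
frame_rate = 100.0
vibrato_rate = 5.0
f0 = [440.0 + 5.0 * math.sin(2.0 * math.pi * vibrato_rate * k / frame_rate)
      for k in range(100)]
flat = flatten_f0(f0, frame_rate, vibrato_rate)
# The residual is the extracted vibrato component.
vibrato = [raw - smooth for raw, smooth in zip(f0, flat)]
```

Subtracting the smoothed curve from the original leaves the vibrato itself, which is why the same operation can serve both note segmentation and sound modification.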

Vibrato detection is performed only when music has been identified by the first scheme.

3) The third scheme leads to segmentation into "notes or into phones or more generally into stable sounds", according to the nature of the sound: instrumental part, singing voice excerpt, speech, percussive part...

The analysis comprises four steps. First, a large set of features is extracted. A feature is all the more appropriate when its time evolution shows strong, short peaks at transitions, and when its mean and variance remain very low during steady-state parts. Three kinds of transitions exist: \(f_0\) transients, energy transients and frequency-content transients. Second, each feature is automatically thresholded. Third, a final decision function built on the set of thresholded features provides the segmentation marks. Lastly, for monophonic, harmonic sounds, an automatic transcription is performed. The performance of the system is discussed.
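The thresholding and decision steps can be sketched as follows. The abstract does not specify the thresholding rule or the decision function, so this example assumes a threshold at mean plus `k` standard deviations per feature and a simple vote across features. The names `threshold_feature` and `segmentation_marks`, the parameters `k` and `min_votes`, and the toy feature values are all hypothetical.

```python
import statistics

def threshold_feature(values, k=2.0):
    """Flag frames where a feature exceeds mean + k * standard deviation.

    This rule is an assumption; it suits features that stay low in steady
    states and peak briefly at transitions.
    """
    limit = sum(values) / len(values) + k * statistics.pstdev(values)
    return [v > limit for v in values]

def segmentation_marks(features, min_votes=2, k=2.0):
    """Return frame indices where at least min_votes features are flagged.

    Consecutive flagged frames are merged into a single mark placed on the
    first frame of the run.
    """
    flags = [threshold_feature(f, k) for f in features]
    marks = []
    previous = False
    for frame in range(len(flags[0])):
        votes = sum(f[frame] for f in flags)
        current = votes >= min_votes
        if current and not previous:
            marks.append(frame)
        previous = current
    return marks

def toy_feature(peaks, length=20, floor=0.1, peak=1.0):
    """Build a flat toy feature with short peaks at the given frames."""
    return [peak if i in peaks else floor for i in range(length)]

# Toy transition features: energy, f0 and frequency content.
energy = toy_feature({5, 12})
f0_change = toy_feature({5})
spectral = toy_feature({12, 13})
marks = segmentation_marks([energy, f0_change, spectral])
```

Requiring agreement between at least two features trades some missed transitions for fewer false marks; the right balance depends on the material being segmented.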

The data obtained in a given scheme are propagated from lower-numbered to higher-numbered schemes in order to improve the performance of the latter.

Cited literature [79 references]
Contributor : Stéphane Rossignol
Submitted on : Monday, October 24, 2005 - 3:53:26 PM
Last modification on : Saturday, January 9, 2021 - 5:34:41 PM
Long-term archiving on : Friday, April 2, 2010 - 10:06:28 PM


  • HAL Id : tel-00010732, version 1


Stéphane Rossignol. Segmentation et indexation des signaux sonores musicaux. Traitement du signal et de l'image [eess.SP]. Université Pierre et Marie Curie - Paris VI, 2000. Français. ⟨tel-00010732⟩


