Joint Estimation of Musical Content Information From an Audio Signal

Abstract : This thesis is concerned with the problem of automatically extracting meaningful content information from music audio signals. Most previous work on estimating musical attributes from the audio signal has dealt with these elements independently. However, musical elements are deeply related to each other and should be analyzed considering the global musical context, as a musician does when he or she analyzes a piece of music. Our research concentrates on three musical descriptors related to the harmonic, the metrical and the tonal structure. More specifically, we focus on three musical attributes: the chord progression, the downbeats and the musical key. The aim of this work is to develop a model that allows the joint estimation of the chords, the keys and the downbeats from polyphonic music recordings. We intend to show that integrating knowledge of mutual dependencies between several descriptors of musical content improves their estimation. In our model, harmony is a core around which other musical attributes are organized. We start by investigating several typical representations of the audio signal in order to select the most appropriate one for the task of harmonic content analysis. We explore several schemes for chromagram computation and investigate several issues related to the use of each representation. We detail and explain the choice of the audio signal representation we use as an input to our model. We then concentrate on the problem of the automatic estimation of the chord progression, using chroma features as observations of the music signal. From the audio signal, a set of chroma vectors representing the pitch content of the file over time is extracted. The chord progression is then estimated from these observations using a hidden Markov model. Several methods are proposed that take into account music theory, the perception of key and the presence of higher harmonics of pitch notes. 
They are evaluated and compared to existing algorithms through a large-scale evaluation on popular music songs. We then present a new technique for simultaneously estimating the chord progression and the downbeats from an audio file. A specific topology of hidden Markov models that enables modeling chord dependency on the metrical structure is proposed. This model allows us to consider pieces with complex metrical structures such as beat insertion, beat deletion or changes in the meter. The model is evaluated on a large set of popular music songs that present various metrical structures. We compare a semi-automatic model, in which the beat positions are annotated, with a fully automatic model in which a beat tracker is used as a front-end of the system. Finally, we focus on the problem of key estimation. First, we concentrate on the problem of estimating the main key of a piece. Building on previous work on key estimation, we extend the above-mentioned model to a model for simultaneous downbeat, chord and key estimation from an audio signal. The model is evaluated on a set of popular music pieces. We then turn our attention to local key finding. We propose to address this problem by investigating the possible combination and extension of several previously proposed global key estimation approaches. The distinctive feature of our approach is that we introduce key dependency on both the harmonic and the metrical structures. We evaluate and analyze the results of our model on a new annotated database composed of classical music pieces.
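As a rough illustration of the chord-estimation step described in the abstract (chroma vectors decoded into a chord progression with a hidden Markov model), the following is a minimal sketch and not the thesis's actual model. Everything specific in it is an assumption made for illustration: the 24 binary major/minor triad templates, the cosine-similarity emission score, the uniform transition matrix with a single `self_prob` parameter, and the names `chord_templates` and `viterbi_chords`. It ignores the music-theoretic, key-related and higher-harmonic refinements the abstract mentions.

```python
import numpy as np

NOTES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def chord_templates():
    """Return (labels, 24x12 array) of binary major/minor triad templates.

    Assumption: a 24-chord vocabulary with binary templates, chosen only
    for illustration.
    """
    labels, rows = [], []
    for suffix, intervals in (("maj", (0, 4, 7)), ("min", (0, 3, 7))):
        for root in range(12):
            template = np.zeros(12)
            template[[(root + i) % 12 for i in intervals]] = 1.0
            labels.append(f"{NOTES[root]}:{suffix}")
            rows.append(template)
    return labels, np.array(rows)


def viterbi_chords(chroma, self_prob=0.9):
    """Decode the most likely chord label per frame.

    chroma: non-negative array of shape (n_frames, 12).
    self_prob: hypothetical probability of staying on the same chord.
    """
    labels, templates = chord_templates()
    n_states = len(labels)
    n_frames = chroma.shape[0]
    eps = 1e-9

    # Emission score (assumption): cosine similarity to each template,
    # used as an unnormalized likelihood in the log domain.
    c = chroma / (np.linalg.norm(chroma, axis=1, keepdims=True) + eps)
    t = templates / np.linalg.norm(templates, axis=1, keepdims=True)
    log_emit = np.log(c @ t.T + eps)  # shape (n_frames, n_states)

    # Transition matrix (assumption): high self-transition, uniform otherwise.
    trans = np.full((n_states, n_states), (1.0 - self_prob) / (n_states - 1))
    np.fill_diagonal(trans, self_prob)
    log_trans = np.log(trans)

    # Viterbi recursion with a uniform initial distribution.
    delta = -np.log(n_states) + log_emit[0]
    backptr = np.zeros((n_frames, n_states), dtype=int)
    for f in range(1, n_frames):
        scores = delta[:, None] + log_trans  # scores[i, j]: state i -> j
        backptr[f] = np.argmax(scores, axis=0)
        delta = scores[backptr[f], np.arange(n_states)] + log_emit[f]

    # Backtrack from the best final state.
    path = [int(np.argmax(delta))]
    for f in range(n_frames - 1, 0, -1):
        path.append(int(backptr[f, path[-1]]))
    return [labels[s] for s in reversed(path)]
```

Called on a `(n_frames, 12)` chroma matrix, `viterbi_chords` returns one label such as `"C:maj"` per frame. The self-transition probability acts as a smoothing term: higher values discourage chord changes on short stretches of frames.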
Document type :
Theses

Cited literature [139 references]

https://tel.archives-ouvertes.fr/tel-00548952
Contributor : I Papadopoulos
Submitted on : Tuesday, December 21, 2010 - 7:58:57 AM
Last modification on : Thursday, February 7, 2019 - 1:33:04 AM
Long-term archiving on : Tuesday, March 22, 2011 - 2:41:16 AM

Identifiers

  • HAL Id : tel-00548952, version 1

Citation

Hélène Papadopoulos. Joint Estimation of Musical Content Information From an Audio Signal. Computer Science [cs]. Université Pierre et Marie Curie - Paris VI, 2010. English. ⟨tel-00548952⟩
