
Apprentissage profond faiblement supervisé et semi-supervisé pour la détection d'évènements sonores [Weakly supervised and semi-supervised deep learning for sound event detection]

Léo Cances
Abstract : The amount of information produced by media such as YouTube, Facebook, or Instagram is a gold mine for machine learning and deep learning algorithms, but one that cannot be exploited until the information has been refined. Supervised algorithms require a label for each available piece of information so that it can be identified and used. Labeling is a tedious, slow, and costly task, performed by human annotators on a voluntary or professional basis, and the amount of information generated each day far exceeds human annotation capacity. It is therefore necessary to turn to learning methods that can use the information in its raw or lightly processed form. To that end, the first part of this thesis focuses on weak annotations and the second part on partial annotations.
The detection of sound events in a polyphonic environment is a difficult problem. Sound events overlap, repeat, or vary in the frequency domain. These difficulties make the annotation task even more challenging, not only for a human annotator but also for systems trained for simple (monophonic) classification. Semi-supervised audio classification, i.e. learning when a significant part of the dataset has not been annotated, is another proposed answer to the huge amount of data generated every day. Semi-supervised deep learning methods are numerous and use different mechanisms to implicitly extract information from unannotated data, making these data useful and directly usable.
The objectives of this thesis are twofold. First, we study and propose weakly supervised approaches for sound event detection as part of our participation in Task 4 of the international DCASE challenge, which provides realistic weakly annotated audio recordings extracted from domestic scenes. To solve this task, we propose two solutions based on recurrent neural networks and on statistical assumptions that constrain the training.
Second, we focus on semi-supervised deep learning when most of the information is not annotated. We compare approaches developed for image classification, then apply them to audio classification and propose a substantial improvement. We show that the most recent approaches can achieve results as good as fully supervised training with access to all annotations.
Contributor : ABES STAR
Submitted on : Tuesday, May 31, 2022 - 3:13:11 PM
Last modification on : Monday, July 4, 2022 - 9:13:18 AM


Version validated by the jury (STAR)


  • HAL Id : tel-03683219, version 1


Léo Cances. Apprentissage profond faiblement supervisé et semi-supervisé pour la détection d'évènements sonores. Sciences de l'information et de la communication. Université Paul Sabatier - Toulouse III, 2021. Français. ⟨NNT : 2021TOU30262⟩. ⟨tel-03683219⟩


