Skip to Main content Skip to Navigation

Classification audio sous contrainte de faible latence

Abstract : This thesis focuses on audio classification under low-latency constraints. Audio classification has been widely studied for the past few years, however, a large majority of the existing work presents classification systems that are not subject to temporal constraints : the audio signal can be scanned freely in order to gather the needed information to perform the decision (in that case, we may refer to an offline classification). Here, we consider audio classification in the telecommunication domain. The working conditions are now more severe : algorithms work in real time and the analysis and processing steps are now operated on the fly, as long as the signal is transmitted. Hence, the audio classification step has to meet the real time constraints, which can modify its behaviour in different ways : only the current and the past observations of the signal are available, and, despite this fact the classification system has to remain reliable and reactive. Thus, the first question that occurs is : what strategy for the classification can we adopt in order to tackle the real time constraints ? In the literature, we can find two main approaches : the frame-level classification and the segment-level classification. In the frame-level classification, the decision is performed using only the information extracted from the current audio frame. In the segment-level classification, we exploit a short-term information using data computed from the current and few past frames. The data fusion here is obtained using the process of temporal feature integration which consists of deriving relevant information based on the temporal evolution of the audio features. Based on that, there are several questions that need to be answered. What are the limits of these two classification framework ? Can an frame-level classification and a segment-level be used efficiently for any classification task ? Is it possible to obtain good performance with these approaches ? Which classification framework may lead to the best trade-off between accuracy and reactivity ? Furthermore, for the segment-level classification framework, the temporal feature integration process is mainly based on statistical models, but would it be possible to propose other methods ? Throughout this thesis, we investigate this subject by working on several concrete case studies. First, we contribute to the development of a novel audio algorithm dedicated to audio protection. The purpose of this algorithm is to detect and suppress very quickly potentially dangerous sounds for the listener. Our method, which relies on the proposition of three features, shows high detection rate and low false alarm rate in many use cases. Then, we focus on the temporal feature integration in a low-latency framework. To that end, we propose and evaluate several methodologies for the use temporal integration that lead to a good compromise between performance and reactivity. Finally, we propose a novel approach that exploits the temporal evolution of the features. This approach is based on the use of symbolic representation that can capture the temporal structure of the features. The idea is thus to find temporal patterns that are specific to each audio classes. The experiments performed with this approach show promising results.
Complete list of metadata

Cited literature [205 references]  Display  Hide  Download
Contributor : Abes Star :  Contact
Submitted on : Thursday, November 10, 2016 - 5:18:10 PM
Last modification on : Wednesday, November 3, 2021 - 6:05:25 AM
Long-term archiving on: : Monday, March 20, 2017 - 11:01:47 PM


Version validated by the jury (STAR)


  • HAL Id : tel-01395495, version 1


Joachim Flocon-Cholet. Classification audio sous contrainte de faible latence. Traitement du signal et de l'image [eess.SP]. Université Rennes 1, 2016. Français. ⟨NNT : 2016REN1S030⟩. ⟨tel-01395495⟩



Les métriques sont temporairement indisponibles