
Analysis and synthesis of urban sound scenes using deep learning techniques

Abstract : The advent of the Internet of Things (IoT) has enabled the development of large-scale acoustic sensor networks that continuously monitor sound environments in urban areas. In the soundscape approach, perceptual quality attributes are associated with the activity of sound sources, quantities that are important to better account for human perception of the acoustic environment. Given their recent success in acoustic scene analysis, deep learning approaches are uniquely suited to predicting these quantities. However, the annotations needed to train supervised deep learning models are not easily obtainable, partly because the information content of sensor measurements is limited by privacy constraints. To address this issue, a method is proposed for the automatic annotation of perceived source activity in large datasets of simulated acoustic scenes. On simulated data, trained deep learning models achieve state-of-the-art performance in the estimation of source-specific perceptual attributes and sound pleasantness. Semi-supervised transfer learning techniques are further studied to improve the adaptability of trained models by exploiting knowledge from large amounts of unlabelled sensor data. Evaluations on annotated in situ recordings show that learning latent audio representations of sensor measurements compensates for the limited ecological validity of simulated sound scenes. In a second part, the use of deep learning methods for the synthesis of time-domain signals from privacy-aware sensor measurements is investigated. Two spectral convolutional approaches are developed and evaluated against state-of-the-art methods designed for speech synthesis.
Contributor : Abes Star
Submitted on : Wednesday, March 24, 2021 - 10:27:08 AM
Last modification on : Wednesday, April 27, 2022 - 3:51:10 AM


Version validated by the jury (STAR)


  • HAL Id : tel-03179093, version 1


Félix Gontier. Analysis and synthesis of urban sound scenes using deep learning techniques. Acoustics [physics.class-ph]. École centrale de Nantes, 2020. English. ⟨NNT : 2020ECDN0042⟩. ⟨tel-03179093⟩


