
Convolutional operators in the time-frequency domain

Abstract : This dissertation addresses audio classification by designing signal representations which satisfy appropriate invariants while preserving inter-class variability. First, we study time-frequency scattering, a representation which extracts modulations at various scales and rates in a similar way to idealized models of spectrotemporal receptive fields in auditory neuroscience. We report state-of-the-art results in the classification of urban and environmental sounds, thus outperforming short-term audio descriptors and deep convolutional networks. Secondly, we introduce spiral scattering, a representation which combines wavelet convolutions along time, along log-frequency, and across octaves. Spiral scattering follows the geometry of the Shepard pitch spiral, which makes a full turn at every octave. We study voiced sounds with a nonstationary source-filter model where both the source and the filter are transposed through time, and show that spiral scattering disentangles and linearizes these transpositions. Furthermore, spiral scattering reaches state-of-the-art results in musical instrument classification of solo recordings. Aside from audio classification, time-frequency scattering and spiral scattering can be used as summary statistics for audio texture synthesis. We find that, unlike the previously existing temporal scattering transform, time-frequency scattering is able to capture the coherence of spectrotemporal patterns, such as those arising in bioacoustics or speech, up to an integration scale of about 500 ms. Based on this analysis-synthesis framework, an artistic collaboration with composer Florian Hecker has led to the creation of five computer music

Cited literature [246 references]
Contributor : Abes Star
Submitted on : Monday, July 10, 2017 - 6:06:05 PM
Last modification on : Saturday, January 9, 2021 - 4:16:33 PM
Long-term archiving on : Wednesday, January 24, 2018 - 5:16:45 PM


Version validated by the jury (STAR)


  • HAL Id : tel-01559667, version 1



Vincent Lostanlen. Convolutional operators in the time-frequency domain. Signal and Image Processing. Université Paris sciences et lettres, 2017. English. ⟨NNT : 2017PSLEE012⟩. ⟨tel-01559667⟩


