Convolutional operators in the time-frequency domain

Abstract : This dissertation addresses audio classification by designing signal representations which satisfy appropriate invariants while preserving inter-class variability. First, we study time-frequency scattering, a representation which extracts modulations at various scales and rates in a similar way to idealized models of spectrotemporal receptive fields in auditory neuroscience. We report state-of-the-art results in the classification of urban and environmental sounds, thus outperforming short-term audio descriptors and deep convolutional networks. Secondly, we introduce spiral scattering, a representation which combines wavelet convolutions along time, along log-frequency, and across octaves. Spiral scattering follows the geometry of the Shepard pitch spiral, which makes a full turn at every octave. We study voiced sounds with a nonstationary source-filter model where both the source and the filter are transposed through time, and show that spiral scattering disentangles and linearizes these transpositions. Furthermore, spiral scattering reaches state-of-the-art results in musical instrument classification of solo recordings. Aside from audio classification, time-frequency scattering and spiral scattering can be used as summary statistics for audio texture synthesis. We find that, unlike the previously existing temporal scattering transform, time-frequency scattering is able to capture the coherence of spectrotemporal patterns, such as those arising in bioacoustics or speech, up to an integration scale of about 500 ms. Based on this analysis-synthesis framework, an artistic collaboration with composer Florian Hecker has led to the creation of five computer music
Document type :
Theses

Cited literature [246 references]

https://tel.archives-ouvertes.fr/tel-01559667
Contributor : Abes Star
Submitted on : Monday, July 10, 2017 - 6:06:05 PM
Last modification on : Monday, December 24, 2018 - 5:08:07 PM
Long-term archiving on : Wednesday, January 24, 2018 - 5:16:45 PM

File

LOSTANLEN_2017_diffusion.pdf
Version validated by the jury (STAR)

Identifiers

  • HAL Id : tel-01559667, version 1

Citation

Vincent Lostanlen. Convolutional operators in the time-frequency domain. Signal and Image Processing. PSL Research University, 2017. English. ⟨NNT : 2017PSLEE012⟩. ⟨tel-01559667⟩
