Conference paper. Year: 2020

Multichannel source counting with CRNN: analysis of the performance

Abstract

In this work, we focus on the problem of estimating the number of concurrent speakers in an audio recording. This information is often a prerequisite for audio processing tasks such as speaker separation, localization, and tracking. In previous work, we proposed to tackle this problem using a convolutional recurrent neural network (CRNN) with first-order Ambisonics input features. The network was trained to predict up to 5 concurrent speakers on a simulated dataset covering many different conditions in terms of source and microphone positions, reverberation, and noise. Here, we analyze the performance of the neural network along the frames of an input signal. We show that there is an optimal analysis frame within the sequence, at which the performance is best, and that its position depends on hyperparameters of the network such as the number of convolutional layers, the convolutional kernel sizes, and the number of timesteps in the recurrent part. This provides insight into the behavior of CRNNs on audio signals for this specific task.
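The frame-wise analysis described in the abstract can be illustrated with a minimal sketch. It assumes that the network outputs one predicted speaker count per frame and that a ground-truth count is available for every frame. The function names `per_frame_accuracy` and `best_frame` and the toy data are hypothetical, introduced only for illustration. They are not taken from the paper and do not reproduce its results.

```python
from typing import List, Sequence, Tuple


def per_frame_accuracy(
    predicted: Sequence[Sequence[int]],
    true_counts: Sequence[Sequence[int]],
) -> List[float]:
    """Return the accuracy at each frame index, averaged over all sequences.

    predicted[i][t] is the speaker count predicted for frame t of sequence i;
    true_counts[i][t] is the corresponding ground-truth count.
    """
    n_frames = len(predicted[0])
    correct = [0] * n_frames
    for pred_seq, true_seq in zip(predicted, true_counts):
        if len(pred_seq) != n_frames or len(true_seq) != n_frames:
            raise ValueError("all sequences must have the same number of frames")
        for t in range(n_frames):
            if pred_seq[t] == true_seq[t]:
                correct[t] += 1
    return [c / len(predicted) for c in correct]


def best_frame(accuracies: Sequence[float]) -> Tuple[int, float]:
    """Return (index, accuracy) of the frame with the highest accuracy."""
    index = max(range(len(accuracies)), key=lambda t: accuracies[t])
    return index, accuracies[index]


# Toy data, made up for illustration: 3 sequences of 4 frames each.
predicted = [
    [1, 1, 2, 2],
    [3, 3, 3, 2],
    [0, 1, 1, 1],
]
true_counts = [
    [2, 2, 2, 2],
    [3, 3, 3, 3],
    [1, 1, 1, 1],
]

acc = per_frame_accuracy(predicted, true_counts)
frame, value = best_frame(acc)
print(f"accuracy per frame: {acc}")
print(f"best frame index: {frame} (accuracy {value:.2f})")
```

In this toy example the accuracy peaks at an interior frame of the sequence. Repeating such a measurement for networks with different numbers of convolutional layers, kernel sizes, or recurrent timesteps is one way to observe how the position of the best frame shifts.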
Main file
000766.pdf (456.02 KB)
Origin: Publisher files allowed on an open archive

Dates and versions

hal-03235360, version 1 (27-05-2021)

Cite

Pierre-Amaury Grumiaux, Srdan Kitic, Laurent Girin, Alexandre Guérin. Multichannel source counting with CRNN : analysis of the performance. Forum Acusticum 2020, Dec 2020, Lyon (virtual), France. pp.829-835, ⟨10.48465/fa.2020.0766⟩. ⟨hal-03235360⟩