Multichannel source counting with CRNN : analysis of the performance

Pierre-Amaury Grumiaux; Srdan Kitic; Laurent Girin; Alexandre Guérin

doi:10.48465/fa.2020.0766

Conference paper. Year : 2020


Pierre-Amaury Grumiaux

Function : Author
PersonId : 737841
IdHAL : pierre-amaury-grumiaux
ORCID : 0000-0001-5263-787X
IdRef : 253134757

Orange Labs

Srdan Kitic

Function : Author
PersonId : 13211
IdHAL : srdan-kitic

Parcimonie et Nouveaux Algorithmes pour le Signal et la Modélisation Audio (Sparsity and New Algorithms for Signal and Audio Modeling)

Orange Labs

Laurent Girin

Function : Author
PersonId : 3682
IdHAL : laurent-girin
ORCID : 0000-0002-9214-8760
IdRef : 088998037

GIPSA - Cognitive Robotics, Interactive Systems, & Speech Processing

Alexandre Guérin

Function : Author

Orange Labs

Abstract

In this work, we focus on the problem of estimating the number of concurrent speakers in an audio recording. This information is often a prerequisite for several audio processing tasks, such as speaker separation, localization and tracking. In previous work, we proposed to tackle this problem using a convolutional recurrent neural network (CRNN) with first-order Ambisonics input features. The network was trained to predict up to 5 concurrent speakers on a simulated dataset covering many different conditions in terms of source and microphone positions, reverberation and noise. In this work, we analyze the performance of the neural network along the frames of an input signal. We show that there is an optimal analysis frame within the sequence, at which performance is best, and that this frame depends on several hyperparameters of the network, such as the number of convolutional layers, the convolutional kernel sizes, or the number of timesteps in the recurrent part. This provides insight into the behavior of CRNNs on audio signals for this specific task.
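The frame-wise analysis described above amounts to scoring the network's speaker-count prediction separately at each frame of the input sequence and locating the frame where accuracy peaks. The following is a minimal sketch of that bookkeeping in plain Python, not the authors' code: the data layout (one integer count predicted per recurrent timestep, one constant ground-truth count per example) and the function names `framewise_accuracy` and `best_frame` are hypothetical, and the toy numbers are invented for illustration.

```python
from typing import List, Sequence, Tuple


def framewise_accuracy(predictions: Sequence[Sequence[int]],
                       targets: Sequence[int]) -> List[float]:
    """Accuracy of the predicted speaker count at each frame index.

    Hypothetical layout: predictions[i][t] is the count predicted for
    example i at frame (timestep) t; targets[i] is the true number of
    speakers in example i, assumed constant over the whole sequence.
    """
    if not predictions or len(predictions) != len(targets):
        raise ValueError("need one target per non-empty prediction sequence")
    n_frames = len(predictions[0])
    if any(len(seq) != n_frames for seq in predictions):
        raise ValueError("all sequences must have the same number of frames")
    correct = [0] * n_frames
    for seq, target in zip(predictions, targets):
        for t, pred in enumerate(seq):
            if pred == target:
                correct[t] += 1
    # Fraction of examples whose count is correct at each frame index
    return [c / len(targets) for c in correct]


def best_frame(accuracies: Sequence[float]) -> Tuple[int, float]:
    """Index and value of the most accurate frame (earliest one on ties)."""
    t = max(range(len(accuracies)), key=lambda i: accuracies[i])
    return t, accuracies[t]


if __name__ == "__main__":
    # Toy, made-up predictions: 4 sequences of 5 frames (not paper results)
    preds = [
        [1, 2, 2, 2, 3],
        [3, 3, 4, 4, 4],
        [0, 1, 1, 1, 1],
        [5, 4, 5, 5, 4],
    ]
    truth = [2, 4, 1, 5]
    acc = framewise_accuracy(preds, truth)
    print(acc)              # [0.25, 0.5, 1.0, 1.0, 0.5]
    print(best_frame(acc))  # (2, 1.0)
```

In this toy case accuracy is lowest at the sequence edges and peaks in the middle, which is the kind of per-frame profile such an analysis would compare across network configurations; whether the real CRNN behaves this way is what the paper itself measures.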

Keywords

source counting; convolutional recurrent neural network; first-order Ambisonics; spatial room impulse response

Domains

Acoustics [physics.class-ph]; Vibrations [physics.class-ph]

Main file

000766.pdf (456.02 KB)

Origin : Publisher files allowed on an open archive


https://hal.science/hal-03235360

Submitted on : Thursday, May 27, 2021-8:58:34 AM

Last modification on : Thursday, April 4, 2024-9:11:48 PM

Long-term archiving on: Saturday, August 28, 2021-6:12:41 PM

Dates and versions

hal-03235360 , version 1 (27-05-2021)

Identifiers

HAL Id : hal-03235360 , version 1
DOI : 10.48465/fa.2020.0766

Cite

Pierre-Amaury Grumiaux, Srdan Kitic, Laurent Girin, Alexandre Guérin. Multichannel source counting with CRNN : analysis of the performance. Forum Acusticum 2020, Dec 2020, Lyon (virtual), France. pp.829-835, ⟨10.48465/fa.2020.0766⟩. ⟨hal-03235360⟩

Collections

INSTITUT-TELECOM UNIV-RENNES1 UGA CNRS INRIA INSA-RENNES IRISA GIPSA GIPSA-CRISSP CENTRALESUPELEC IRISA-D5 INRIA2 UR1-MATH-STIC UR1-UFR-ISTIC UNIV-RENNES FA2020 GIPSA-PPC UR1-MATH-NUM
