
Efficient speaker diarization and low-latency speaker spotting

Abstract : Speaker diarization (SD) involves the detection of speakers within an audio stream and the intervals during which each speaker is active, i.e. the determination of ‘who spoke when’. The first part of the work presented in this thesis exploits an approach to speaker modelling involving binary keys (BKs) as a solution to SD. BK modelling is efficient and requires no external training data, as it operates using test data alone. The presented contributions include the extraction of BKs based on multi-resolution spectral analysis, the explicit detection of speaker changes using BKs, and SD fusion techniques that combine the benefits of both BK and deep learning based solutions. The SD task is closely linked to that of speaker recognition or detection, which involves the comparison of two speech segments and the determination of whether or not they were uttered by the same speaker. Although many practical applications require their combination, the two tasks are traditionally tackled independently of each other. The second part of this thesis considers an application where SD and speaker recognition solutions are brought together. The new task, coined low-latency speaker spotting (LLSS), involves the rapid detection of known speakers within multi-speaker audio streams. It calls for a re-thinking of online diarization and of the manner in which diarization and detection sub-systems should best be combined.

Cited literature [282 references]
Contributor : ABES STAR
Submitted on : Tuesday, January 28, 2020 - 5:46:11 PM
Last modification on : Sunday, June 26, 2022 - 9:45:05 AM
Long-term archiving on : Wednesday, April 29, 2020 - 4:46:42 PM


Version validated by the jury (STAR)


  • HAL Id : tel-02458517, version 1


José María Patino Villar. Efficient speaker diarization and low-latency speaker spotting. Signal and Image Processing. Sorbonne Université, 2019. English. ⟨NNT : 2019SORUS003⟩. ⟨tel-02458517⟩


