
Characterization of audiovisual binding and fusion in the framework of audiovisual speech scene analysis

Abstract: The present doctoral work focuses on a tentative fusion between two separate concepts: Auditory Scene Analysis (ASA) and audiovisual (AV) fusion in speech perception. We introduce "Audiovisual Speech Scene Analysis" (AVSSA) as an extension of the two-stage ASA model towards AV scenes, and we propose that a coherence index between the auditory and the visual input is computed prior to AV fusion, making it possible to determine whether the sensory inputs should be bound together. This is the "two-stage model of AV fusion". Previous experiments on the modulation of the McGurk effect by coherent vs. incoherent AV contexts presented before the McGurk target have provided experimental evidence supporting the two-stage model. In this doctoral work, we further evaluate the AVSSA process within the two-stage architecture along several dimensions: introducing noise, considering multiple sources, assessing neurophysiological correlates, and testing different populations.

A first set of experiments in younger adults focused on the behavioral characterization of the AV binding process by introducing noise; the results showed that participants were able to evaluate both the level of acoustic noise and the AV coherence, and to modulate AV fusion accordingly. In a second set of behavioral experiments involving competing AV sources, we showed that the AVSSA process makes it possible to evaluate the coherence between auditory and visual features within a complex scene, to properly associate the components of a given AV speech source, and to provide the fusion process with an assessment of the AV coherence of the extracted source. It also appears that the modulation of fusion depends on the attentional focus on one source or the other.

An EEG experiment then aimed to identify a neurophysiological marker of the binding and unbinding process, and showed that an incoherent AV context can modulate the effect of the visual input on the N1/P2 components.

The last set of experiments focused on the measurement of AV binding and its dynamics in an older population, and provided results similar to those obtained in younger adults, though with a larger amount of unbinding. The whole set of results enabled a better characterization of the AVSSA process and led to the proposal of an improved neurocognitive architecture for AV fusion in speech perception.

Cited literature: 220 references
Contributor: ABES STAR
Submitted on: Wednesday, January 24, 2018 - 3:26:07 PM
Last modification on: Friday, March 25, 2022 - 9:44:07 AM
Long-term archiving on: Thursday, May 24, 2018 - 9:58:53 PM


Version validated by the jury (STAR)


  • HAL Id: tel-01692029, version 1



Ganesh Attigodu Chandrashekara. Characterization of audiovisual binding and fusion in the framework of audiovisual speech scene analysis. Psychology. Université Grenoble Alpes, 2016. English. ⟨NNT : 2016GREAS006⟩. ⟨tel-01692029⟩


