Computational models of disfluencies : fillers and discourse markers in spoken language understanding

Abstract : People rarely speak in the same manner that they write: they are generally disfluent. Disfluencies can be defined as interruptions in the regular flow of speech, such as pausing silently, repeating words, or interrupting oneself to correct something said previously. Although disfluencies are a natural characteristic of spontaneous speech, and despite the rich linguistic literature that discusses their informativeness, they are often removed as noise when post-processing the output transcripts of speech recognisers. So far, their consideration in a Spoken Language Understanding (SLU) context has rarely been explored. The aim of this thesis is to develop computational models of disfluencies in SLU. To do so, we take inspiration from psycholinguistic models of disfluencies, which focus on the role that disfluencies play in the production (by the speaker) and comprehension (by the listener) of speech. Specifically, when we use the term "computational models of disfluencies", we mean developing methodologies that automatically process disfluencies in order to empirically observe 1) their impact on the production and comprehension of speech, and 2) how they interact with the primary signal (the lexical content, or what was said in essence). We focus on two discourse contexts: monologues and task-oriented dialogues. Our results contribute to broader tasks in SLU, and also to research relevant to Spoken Dialogue Systems. When studying monologues, we use a combination of traditional and neural models to study the representations of disfluencies and their impact on SLU performance. Additionally, we develop methodologies to study disfluencies as a cue for incoming information in the flow of the discourse. In studying task-oriented dialogues, we focus on developing computational models to study the roles of disfluencies in the listener-speaker dynamic. We specifically study disfluencies in the context of verbal alignment, i.e. the alignment of the interlocutors' lexical expressions, and the role of disfluencies in behavioural alignment, a new alignment context that we propose, in which instructions given by one interlocutor are followed by an action from another interlocutor. We also consider how these disfluencies in local alignment contexts can be associated with discourse-level phenomena, such as success in the task. We consider this thesis one of the many first steps that could be undertaken to integrate disfluencies in SLU contexts.
Document type : Theses

https://tel.archives-ouvertes.fr/tel-03653211
Contributor : ABES STAR
Submitted on : Wednesday, April 27, 2022 - 3:01:10 PM
Last modification on : Thursday, April 28, 2022 - 3:08:33 AM
Long-term archiving on : Friday, July 29, 2022 - 9:14:32 AM

File

101655_DINKAR_2022_archivage.p...
Version validated by the jury (STAR)

Identifiers

• HAL Id : tel-03653211, version 1

Citation

Tanvi Dinkar. Computational models of disfluencies : fillers and discourse markers in spoken language understanding. Computer science. Institut Polytechnique de Paris, 2022. English. ⟨NNT : 2022IPPAT001⟩. ⟨tel-03653211⟩
