
Weak supervision for learning discourse structure in multi-party dialogues

Sonia Badene 1 
1 IRIT-MELODI - MEthodes et ingénierie des Langues, des Ontologies et du DIscours
IRIT - Institut de recherche en informatique de Toulouse
Abstract : The main objective of this thesis is to improve the automatic capture of semantic information, with the goal of modeling and understanding human communication. We have advanced the state of the art in discourse parsing, in particular in the retrieval of discourse structure from chat, in order to implement, at the industrial level, tools that help explore conversations. These include the production of automatic summaries, recommendations, dialogue act detection, identification of decisions, planning, and semantic relations between dialogue acts in order to understand dialogues. In multi-party conversations it is important to understand not only the meaning of a participant's utterance and to whom it is addressed, but also the semantic relations that tie it to other utterances in the conversation and give rise to different conversation threads. An answer must be recognized as an answer to a particular question; an argument, as an argument for or against a proposal under discussion; a disagreement, as the expression of a point of view contrasted with another idea already expressed. Unfortunately, capturing such information using traditional supervised machine learning methods from quality hand-annotated discourse data is costly and time-consuming, and we do not have nearly enough data to train these machine learning models, much less deep learning models. Another problem is that, arguably, no amount of data will be sufficient for machine learning models to learn the semantic characteristics of discourse relations without some expert guidance; the data are simply too sparse. Long-distance relations, in which an utterance is semantically connected not to the immediately preceding utterance but to another utterance from further back in the conversation, are particularly difficult and rare, though often central to comprehension.
It is therefore necessary to find a more efficient way to retrieve discourse structures from large corpora of multi-party conversations, such as meeting transcripts or chats. This is one goal this thesis achieves. In addition, we wanted not only to design a model that predicts discourse structure for multi-party conversation without requiring large amounts of hand-annotated data, but also to develop an approach that is transparent and explainable so that it can be modified and improved by experts. The method detailed in this thesis achieves this goal as well.
Contributor : ABES STAR
Submitted on : Tuesday, March 29, 2022 - 10:57:07 AM
Last modification on : Monday, July 4, 2022 - 8:53:56 AM
Long-term archiving on : Thursday, June 30, 2022 - 6:50:54 PM


Version validated by the jury (STAR)


  • HAL Id : tel-03622653, version 1


Sonia Badene. Weak supervision for learning discourse structure in multi-party dialogues. Artificial Intelligence [cs.AI]. Université Paul Sabatier - Toulouse III, 2021. English. ⟨NNT : 2021TOU30138⟩. ⟨tel-03622653⟩


