
Semantic and Discursive Representation for Natural Language Understanding

Damien Sileo 1
1 IRIT-MELODI - MEthodes et ingénierie des Langues, des Ontologies et du DIscours
IRIT - Institut de recherche en informatique de Toulouse
Abstract : Computational models for automatic text understanding have attracted considerable interest owing to unusual performance gains over the last few years, some of them leading to super-human scores. This success has reignited some grandiose claims about artificial intelligence, such as universal sentence representation. In this thesis, we question these claims from two complementary angles. Firstly, are neural networks and vector representations expressive enough to process text and perform a wide array of complex tasks? We present currently used computational neural models and their training techniques. We propose a criterion for expressive compositions and show that a popular evaluation suite and sentence encoders (SentEval/InferSent) have an expressivity bottleneck; minor changes can yield new compositions that are expressive and insightful, but they might not be sufficient, which may justify the paradigm shift towards newer Transformer-based models. Secondly, we discuss the question of universality in sentence representation: what actually lies behind these universality claims? We delineate a few theories of meaning, and in a subsequent part of this thesis, we argue that semantics (unsituated, literal content), as opposed to pragmatics (meaning as use), is preponderant in the current training and evaluation data of natural language understanding models. To alleviate that problem, we show that discourse marker prediction (classification of hidden discourse markers between sentences) can be seen as a pragmatics-centered training signal for text understanding. We build a new discourse marker prediction dataset that yields significantly better results than previous work. In addition, we propose a new discourse-based evaluation suite that could incentivize researchers to take into account pragmatic considerations when evaluating text understanding models.

Cited literature [218 references]
Contributor : Abes Star
Submitted on : Monday, May 25, 2020 - 8:30:20 PM
Last modification on : Wednesday, November 3, 2021 - 6:50:32 AM


Version validated by the jury (STAR)


  • HAL Id : tel-02619733, version 1


Damien Sileo. Semantic and Discursive Representation for Natural Language Understanding. Computation and Language [cs.CL]. Université Paul Sabatier - Toulouse III, 2019. English. ⟨NNT : 2019TOU30201⟩. ⟨tel-02619733⟩


