Coreference resolution for spoken French

Abstract : A coreference chain is the set of linguistic expressions (or mentions) that refer to the same entity or discourse object in a given document. Coreference resolution consists of detecting all the mentions in a document and partitioning them into coreference chains. Coreference chains play a central role in the coherence of documents and interactions, and identifying them has applications in many other areas of natural language processing that rely on an understanding of language, such as information extraction, question answering and machine translation. Natural language processing systems that perform this task exist for many languages, but none for French, which until recently suffered from a lack of suitable annotated resources, and none for spoken language. In this thesis, we aim to fill this gap by designing a coreference resolution system for spoken French. To this end, we propose a knowledge-poor system based on an end-to-end neural network architecture, which obviates the need for the preprocessing pipelines common in existing systems while achieving performance comparable to the state of the art. We then propose extensions to this baseline, augmenting our system with external knowledge obtained from resources and preprocessing tools designed for written French. Finally, we propose a new standard representation for coreference annotation in corpora of written and spoken language, and demonstrate its use in a new version of ANCOR, the first coreference corpus of spoken French.
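To make "partitioning mentions into coreference chains" concrete: many end-to-end systems link each detected mention to at most one antecedent (or to none, if it starts a new entity). The chains are then the connected groups of these links. The sketch below assumes that antecedent-linking setup and uses a small union-find to build the partition. It is an illustration of the general idea, not the decoding procedure of the system described in this thesis, and the function name `chains_from_antecedents` is hypothetical.

```python
from typing import Dict, List, Optional


def chains_from_antecedents(antecedents: Dict[int, Optional[int]]) -> List[List[int]]:
    """Partition mentions into coreference chains (hypothetical helper).

    `antecedents` maps each mention index to the index of the antecedent
    it was linked to, or to None if the mention starts a new chain.
    Returns the chains as sorted lists of mention indices.
    """
    parent: Dict[int, int] = {}

    def find(m: int) -> int:
        # Walk up to the chain representative, halving the path as we go.
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    # Every mention initially forms its own singleton chain.
    for m in antecedents:
        parent.setdefault(m, m)

    # Merge each mention's chain with its antecedent's chain.
    for m, a in antecedents.items():
        if a is not None:
            parent.setdefault(a, a)
            parent[find(m)] = find(a)

    # Collect mentions by representative to obtain the partition.
    groups: Dict[int, List[int]] = {}
    for m in sorted(parent):
        groups.setdefault(find(m), []).append(m)
    return sorted(groups.values())


# Mentions: 0 "Marie", 1 "elle", 2 "le livre", 3 "il", 4 "sa"
print(chains_from_antecedents({0: None, 1: 0, 2: None, 3: 2, 4: 1}))
```

With these links, "Marie", "elle" and "sa" end up in one chain and "le livre" and "il" in another. Because every mention belongs to exactly one group, the output is a partition of the mention set, as the definition above requires.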

https://hal.archives-ouvertes.fr/tel-02928209
Contributor : Loïc Grobol
Submitted on : Tuesday, March 9, 2021 - 3:35:45 PM
Last modification on : Thursday, March 11, 2021 - 3:26:19 AM

File

lgrobol-thesis.pdf
Files produced by the author(s)

Licence


Distributed under a Creative Commons Attribution 4.0 International License

Identifiers

  • HAL Id : tel-02928209, version 2

Citation

Loïc Grobol. Coreference resolution for spoken French. Computation and Language [cs.CL]. Université Sorbonne Nouvelle - Paris 3, 2020. English. ⟨tel-02928209v2⟩
