Efficient large-context dependency parsing and correction with distributional lexical resources

Abstract : This thesis explores ways to improve the accuracy and coverage of efficient statistical dependency parsing. We employ transition-based parsing with models learned using Support Vector Machines (Cortes and Vapnik, 1995), and our experiments are carried out on French. Transition-based parsing is very fast due to the computational efficiency of its underlying algorithms, which are based on a local optimization of attachment decisions, but this locality limits the syntactic context available to each decision. Our first research thread is thus to increase the syntactic context used. Starting from the arc-eager transition system (Nivre, 2008), we propose a variant that simultaneously considers multiple candidate governors for right-directed attachments. We also test parse correction, inspired by Hall and Novák (2005), which revises each attachment in a parse by considering multiple alternative governors in the local syntactic neighborhood. We find that multiple-candidate approaches slightly improve parsing accuracy, both overall and for prepositional phrase attachment and coordination, two linguistic phenomena that exhibit high syntactic ambiguity. Our second research thread explores semi-supervised approaches for improving parsing accuracy and coverage. We test self-training within the journalistic domain as well as for adaptation to the medical domain, using a two-stage parsing approach based on that of McClosky et al. (2006). We then turn to lexical modeling over a large corpus: we model generalized lexical classes to reduce data sparseness, and prepositional phrase attachment preferences to improve disambiguation. We find that semi-supervised approaches can sometimes improve parsing accuracy and coverage without increasing time complexity.
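For readers unfamiliar with the arc-eager transition system mentioned in the abstract, the following is a minimal sketch of the standard system (SHIFT, LEFT-ARC, RIGHT-ARC, REDUCE), not of the multiple-candidate variant proposed in the thesis. The function name `parse`, the token indexing (0 for an artificial root) and the hand-written transition sequence are assumptions made for illustration; in an actual parser each transition would be chosen by a learned classifier rather than supplied by hand.

```python
def parse(n_tokens, transitions):
    """Apply a fixed arc-eager transition sequence (illustrative sketch).

    Tokens are numbered 1..n_tokens; 0 is an artificial root.
    Returns a dict mapping each dependent to its governor.
    """
    stack = [0]                              # starts with the root only
    buffer = list(range(1, n_tokens + 1))    # tokens still to be read
    heads = {}                               # dependent -> governor

    for t in transitions:
        if t == "SHIFT":
            # Move the next input token onto the stack.
            stack.append(buffer.pop(0))
        elif t == "LEFT-ARC":
            # The front of the buffer governs the top of the stack,
            # which must be a non-root token without a governor yet.
            dep = stack[-1]
            assert dep != 0 and dep not in heads
            heads[dep] = buffer[0]
            stack.pop()
        elif t == "RIGHT-ARC":
            # The top of the stack governs the front of the buffer,
            # which is then pushed so it can take its own dependents.
            dep = buffer.pop(0)
            heads[dep] = stack[-1]
            stack.append(dep)
        elif t == "REDUCE":
            # Pop a token that already has a governor.
            assert stack[-1] in heads
            stack.pop()
        else:
            raise ValueError(f"unknown transition: {t}")
    return heads


# Toy sentence "She eats fish": 1=She, 2=eats, 3=fish.
print(parse(3, ["SHIFT", "LEFT-ARC", "RIGHT-ARC", "RIGHT-ARC"]))
# → {1: 2, 2: 0, 3: 2}
```

Each RIGHT-ARC here commits to a single governor (the top of the stack) as soon as the dependent is read, which illustrates the kind of local decision that the abstract's first research thread seeks to inform with more context.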
Document type : Theses
Cited literature : 165 references

https://tel.archives-ouvertes.fr/tel-00860720
Contributor : Enrique Henestroza Anguiano
Submitted on : Tuesday, September 10, 2013 - 10:16:11 PM
Last modification on : Friday, January 4, 2019 - 5:33:24 PM
Long-term archiving on : Thursday, April 6, 2017 - 5:29:29 PM

Identifiers

  • HAL Id : tel-00860720, version 1

Citation

Enrique Henestroza Anguiano. Efficient large-context dependency parsing and correction with distributional lexical resources. Document and Text Processing. Université Paris-Diderot - Paris VII, 2013. English. ⟨tel-00860720⟩
