Elaboration d'un composant syntaxique à base de grammaires d'arbres adjoints pour le vietnamien [Development of a syntactic component based on tree-adjoining grammars for Vietnamese]

Phuong Le-Hong 1
1 KIWI - Knowledge Information and Web Intelligence
LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications
Abstract : This thesis deals with the construction of linguistic resources and tools for the automatic processing of the Vietnamese language. Its central research topic is the development of a syntactic component comprising a broad-coverage grammar and a deep syntactic parser for this language. We have developed a modular and customizable processing chain that applies a cascade of surface processing steps to raw text: automatic sentence detection, word segmentation and part-of-speech tagging. These steps are necessary preliminaries to parsing, and they can also be used to prepare other tasks. The Vietnamese grammar is modeled using the Lexicalized Tree Adjoining Grammar (LTAG) formalism. We have developed a system that automatically extracts an LTAG grammar from a treebank for Vietnamese. The tree templates of this grammar cover the most frequent syntactic structures of the Vietnamese language. We have implemented a deep syntactic parser for Vietnamese that produces both constituency and dependency analyses of a sentence. We describe the theoretical foundations of the system and its modules, together with their quantitative evaluations. The system performs well on related tasks, and some modules achieve the best results published so far for the Vietnamese language.
Document type : Theses

https://tel.archives-ouvertes.fr/tel-00529657
Contributor : Phuong Le-Hong
Submitted on : Tuesday, October 26, 2010 - 11:38:36 AM
Last modification on : Tuesday, April 24, 2018 - 1:54:45 PM
Long-term archiving on : Thursday, January 27, 2011 - 2:51:09 AM

Identifiers

  • HAL Id : tel-00529657, version 1

Citation

Phuong Le-Hong. Elaboration d'un composant syntaxique à base de grammaires d'arbres adjoints pour le vietnamien. Human-Computer Interaction [cs.HC]. Université Nancy II, 2010. French. ⟨tel-00529657⟩
