Un treebank pour le serbe : constitution et exploitations (A Treebank for Serbian: Construction and Uses)

Abstract : At the beginning of this PhD, no treebank for Serbian was available. Yet manually annotated treebanks are an essential resource for developing (training and evaluating) statistical tools for syntactic analysis (parsers). Efficient parsers, in turn, facilitate the annotation of large corpora, which can serve as a basis for research in theoretical linguistics. The lack of these resources for Serbian slows research in both directions and hinders the creation of digital resources for Serbian in general. To address this issue, we created a suite of NLP resources for Serbian. First, we created the ParCoTrain-Synt treebank, a 101,000-token corpus with morphosyntactic annotation, lemmatisation and syntactic dependency annotation. We also built the ParCoLex lexicon, containing 7 million entries for 157,000 different lemmas. Using these two resources, we trained models for parsing, morphosyntactic tagging and lemmatisation. All of these resources are available at https://github.com/aleksandra-miletic/serbian-nlp-resources. We also used them in two experiments in Serbian linguistics, demonstrating that the ParCoTrain-Synt treebank is well suited to empirical studies based on quantitative data analysis.
Keywords : Treebank, Serbian, Parsing
Document type : Theses

Cited literature : 336 references

https://tel.archives-ouvertes.fr/tel-02639473
Contributor : Abes Star
Submitted on : Thursday, May 28, 2020 - 12:10:09 PM
Last modification on : Wednesday, October 14, 2020 - 4:11:00 AM

Identifiers

  • HAL Id : tel-02639473, version 1

Citation

Aleksandra Miletic. Un treebank pour le serbe : constitution et exploitations. Linguistics. Université Toulouse le Mirail - Toulouse II, 2018. In French. ⟨NNT : 2018TOU20030⟩. ⟨tel-02639473⟩
