Explorations in Word Embeddings : graph-based word embedding learning and cross-lingual contextual word embedding learning

Abstract : Word embeddings are a standard component of modern natural language processing architectures. Each breakthrough in word embedding learning benefits the vast majority of natural language processing tasks, such as POS tagging, named entity recognition (NER), question answering, and natural language inference. This work addresses two questions: how to improve the quality of monolingual word embeddings learned by prediction-based models, and how to map contextual word embeddings generated by pretrained language representation models such as ELMo or BERT across different languages. For monolingual word embedding learning, I take into account global, corpus-level information and generate a different noise distribution for negative sampling in word2vec. For this purpose, I pre-compute word co-occurrence statistics with corpus2graph, an open-source, NLP-application-oriented Python package that I developed: it efficiently generates a word co-occurrence network from a large corpus and applies network algorithms such as random walks to it. For cross-lingual contextual word embedding mapping, I link contextual word embeddings to word sense embeddings. The improved anchor generation algorithm that I propose also expands the scope of word embedding mapping algorithms from context-independent to contextual word embeddings.
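To make the monolingual idea concrete, the following is a minimal sketch of how a noise distribution for negative sampling could be derived from a word co-occurrence network via random walks. It is an illustration under stated assumptions, not the method or code of the thesis: it uses only NumPy and does not call corpus2graph, and the function names (`cooccurrence_matrix`, `random_walk_noise`, `sample_negatives`), the window size, the number of walk steps, and the toy corpus are all hypothetical choices made for this example.

```python
import numpy as np


def cooccurrence_matrix(sentences, window=2):
    """Count symmetric word co-occurrences within a sliding window."""
    vocab = sorted({w for s in sentences for w in s})
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(vocab)))
    for s in sentences:
        for i, w in enumerate(s):
            # Pair each word with the next `window` words in the sentence.
            for j in range(i + 1, min(i + 1 + window, len(s))):
                a, b = index[w], index[s[j]]
                if a != b:
                    counts[a, b] += 1
                    counts[b, a] += 1
    return vocab, counts


def random_walk_noise(counts, steps=2):
    """Row i is the distribution over words reached after `steps` random-walk
    steps from word i on the co-occurrence network (assumed illustrative
    choice of noise distribution)."""
    row_sums = counts.sum(axis=1, keepdims=True)
    transition = np.divide(
        counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0
    )
    return np.linalg.matrix_power(transition, steps)


def sample_negatives(noise, target_idx, k, rng):
    """Draw k negative-sample indices for one target word.

    Assumes the target word has at least one neighbour in the network,
    so that its row of `noise` has positive mass.
    """
    p = noise[target_idx]
    p = p / p.sum()  # guard against floating-point drift
    return rng.choice(len(p), size=k, p=p)


# Toy corpus (hypothetical data, for illustration only).
sentences = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "rug"],
]
vocab, counts = cooccurrence_matrix(sentences, window=2)
noise = random_walk_noise(counts, steps=2)
rng = np.random.default_rng(0)
negatives = sample_negatives(noise, vocab.index("cat"), k=5, rng=rng)
print([vocab[i] for i in negatives])
```

In this sketch each target word gets its own noise distribution, whereas standard word2vec draws negatives from a single smoothed unigram distribution shared by all targets. Whether and how the thesis combines the two is not stated in the abstract.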

Cited literature : 122 references

https://tel.archives-ouvertes.fr/tel-02366013
Contributor : Abes Star
Submitted on : Friday, November 15, 2019 - 4:14:52 PM
Last modification on : Wednesday, October 14, 2020 - 3:41:47 AM
Long-term archiving on : Sunday, February 16, 2020 - 5:37:00 PM

File

82195_ZHANG_2019_archivage.pdf
Version validated by the jury (STAR)

Identifiers

  • HAL Id : tel-02366013, version 1

Citation

Zheng Zhang. Explorations in Word Embeddings : graph-based word embedding learning and cross-lingual contextual word embedding learning. Computation and Language [cs.CL]. Université Paris-Saclay, 2019. English. ⟨NNT : 2019SACLS369⟩. ⟨tel-02366013⟩

Metrics

  • Record views : 377
  • Files downloads : 2029