
Polysemy resolution with word embedding models and data visualization : the case of adverbial postpositions -ey, -eyse, and -(u)lo in Korean

Abstract : This dissertation presents a computational account of word-level polysemy resolution in Korean, a lesser-studied language. Postpositions, which are characterized by multiple form-function mappings and are thus polysemous by nature, pose a challenge for automatic analysis and for models that must identify their functions. In this project, I enhance existing word-level embedding classification models (Positive Pointwise Mutual Information with Singular Value Decomposition; Skip-Gram with Negative Sampling) by taking the context window into account, and I introduce a sentence-level embedding classification model (Bidirectional Encoder Representations from Transformers, BERT) within the framework of Distributional Semantic Modeling. I then develop two visualization systems that show (i) the relationships between the postpositions and their co-occurring words for the word-level embedding models, and (ii) clusters of sentences for the sentence-level embedding model. These visualization systems make it easier to understand how the classification models classify the intended functions of the postpositions. Results show that, whereas the performance of the word-level embedding models is modulated by the size of the training corpora containing specific functions of the postpositions, the sentence-level embedding model performs stably (i.e., it is less affected by corpus size) and simulates how humans recognize the polysemy of Korean adverbial postpositions more appropriately than the word-level embedding models do.
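For readers unfamiliar with the first word-level model named in the abstract, the following is a minimal sketch of a PPMI matrix reduced by truncated SVD, with the context window as an explicit parameter. It is not the dissertation's implementation: the function names, the window size, the embedding dimension, and the three toy sentences (English glosses combined with the postpositions) are all assumptions made for illustration.

```python
import numpy as np


def cooccurrence_matrix(sentences, window=2):
    """Count symmetric co-occurrences within `window` tokens on each side."""
    vocab = sorted({tok for sent in sentences for tok in sent})
    index = {tok: i for i, tok in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(vocab)), dtype=float)
    for sent in sentences:
        for i, target in enumerate(sent):
            lo = max(0, i - window)
            hi = min(len(sent), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[index[target], index[sent[j]]] += 1.0
    return vocab, counts


def ppmi(counts):
    """Positive PMI: max(0, log2(P(w, c) / (P(w) * P(c))))."""
    total = counts.sum()
    row = counts.sum(axis=1, keepdims=True)
    col = counts.sum(axis=0, keepdims=True)
    expected = row @ col / total
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log2(counts / expected)
    pmi[~np.isfinite(pmi)] = 0.0  # unseen pairs contribute nothing
    return np.maximum(pmi, 0.0)


def svd_embeddings(matrix, dim=2):
    """Keep the top `dim` singular directions as dense word vectors."""
    u, s, _ = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :dim] * s[:dim]


# Hypothetical toy corpus (invented glosses, not data from the thesis).
toy_sentences = [
    ["school", "-ey", "go"],
    ["home", "-ey", "be"],
    ["knife", "-(u)lo", "cut"],
]
vocab, counts = cooccurrence_matrix(toy_sentences, window=1)
embeddings = svd_embeddings(ppmi(counts), dim=2)
```

Changing `window` changes which neighbours count as context for each postposition, which is the kind of parameter the abstract refers to when it mentions the context window.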
Document type : Theses

https://tel.archives-ouvertes.fr/tel-03508420
Contributor : ABES STAR
Submitted on : Monday, January 3, 2022 - 4:25:14 PM
Last modification on : Wednesday, May 11, 2022 - 4:39:21 AM
Long-term archiving on : Monday, April 4, 2022 - 8:44:30 PM

File

2021PA100077.pdf
Version validated by the jury (STAR)

Identifiers

  • HAL Id : tel-03508420, version 1

Citation

Seongmin Mun. Polysemy resolution with word embedding models and data visualization : the case of adverbial postpositions -ey, -eyse, and -(u)lo in Korean. Linguistics. Université de Nanterre - Paris X, 2021. English. ⟨NNT : 2021PA100077⟩. ⟨tel-03508420⟩
