Distributional models of multiword expression compositionality prediction

Abstract : Natural language processing systems often rely on the idea that language is compositional, that is, that the meaning of a linguistic entity can be inferred from the meaning of its parts. This expectation fails in the case of multiword expressions (MWEs). For example, a person who is a sitting duck is neither a duck nor necessarily sitting. Modern computational techniques for inferring word meaning from the distribution of words in text have been quite successful at multiple tasks, especially since the rise of word embedding approaches. However, the representation of MWEs remains an open problem in the field. In particular, it is unclear how one could predict from corpora whether a given MWE should be treated as an indivisible unit (e.g. nut case) or as some combination of the meanings of its parts (e.g. engine room). This thesis proposes a framework for MWE compositionality prediction based on distributional semantic representations, which we instantiate under a variety of parameters. We present a thorough evaluation of the impact of these parameters on three new datasets of MWE compositionality, covering English, French and Portuguese MWEs. Finally, we present an extrinsic evaluation of the predicted levels of MWE compositionality on the task of MWE identification. Our results suggest that a proper choice of distributional model and corpus parameters can produce compositionality predictions that are comparable to the state of the art.
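
As a rough illustration of the kind of prediction the abstract describes, one common way to score compositionality with distributional vectors is to compare the vector of the whole expression with a combination of the vectors of its parts. The sketch below assumes additive combination of unit-normalized part vectors and cosine similarity. The function names and the toy three-dimensional vectors are hypothetical, chosen only for illustration, and are not taken from the thesis.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def normalize(v):
    """Scale a vector to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v] if norm else list(v)


def compositionality_score(mwe_vec, part_vecs):
    """Similarity between the MWE vector and the sum of its normalized part vectors.

    Assumption: additive combination; other combination functions are possible.
    """
    combined = [sum(dims) for dims in zip(*(normalize(p) for p in part_vecs))]
    return cosine(mwe_vec, combined)


# Hypothetical toy vectors; real ones would be learned from a corpus.
toy_vectors = {
    "engine": [0.9, 0.1, 0.0],
    "room": [0.1, 0.9, 0.0],
    "engine_room": [0.6, 0.7, 0.1],
    "nut": [0.8, 0.0, 0.2],
    "case": [0.0, 0.8, 0.2],
    "nut_case": [0.0, 0.1, 0.9],
}

engine_room_score = compositionality_score(
    toy_vectors["engine_room"], [toy_vectors["engine"], toy_vectors["room"]]
)
nut_case_score = compositionality_score(
    toy_vectors["nut_case"], [toy_vectors["nut"], toy_vectors["case"]]
)
```

With these toy values, the expression whose vector lies close to the sum of its parts (engine room) receives a higher score than the one whose vector points elsewhere (nut case), mirroring the compositional versus idiomatic contrast in the abstract.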
Document type :
Theses

Cited literature [138 references]

https://tel.archives-ouvertes.fr/tel-01728528
Contributor : Silvio Ricardo Cordeiro
Submitted on : Sunday, March 11, 2018 - 10:00:18 AM
Last modification on : Saturday, March 24, 2018 - 1:22:57 AM
Long-term archiving on : Tuesday, June 12, 2018 - 12:34:53 PM

File

SilvioCordeiro-thesis-final.pd...
Files produced by the author(s)

Licence


Distributed under a Creative Commons Attribution - NonCommercial - NoDerivatives 4.0 International License

Identifiers

  • HAL Id : tel-01728528, version 1

Citation

Silvio Ricardo Cordeiro. Distributional models of multiword expression compositionality prediction. Computation and Language [cs.CL]. Federal University of Rio Grande do Sul; Aix Marseille University, 2017. English. ⟨NNT : 2017AIXM0501⟩. ⟨tel-01728528⟩
