
Distributional models of multiword expression compositionality prediction

Abstract : Natural language processing systems often rely on the idea that language is compositional, that is, that the meaning of a linguistic entity can be inferred from the meaning of its parts. This expectation fails in the case of multiword expressions (MWEs). For example, a person who is a sitting duck is neither a duck nor necessarily sitting. Modern computational techniques for inferring word meaning from the distribution of words in text have been quite successful at multiple tasks, especially since the rise of word embedding approaches. However, the representation of MWEs remains an open problem in the field. In particular, it is unclear how one could predict from corpora whether a given MWE should be treated as an indivisible unit (e.g. nut case) or as some combination of the meanings of its parts (e.g. engine room). This thesis proposes a framework for MWE compositionality prediction based on representations from distributional semantics, which we instantiate under a variety of parameters. We present a thorough evaluation of the impact of these parameters on three new datasets of MWE compositionality, encompassing English, French and Portuguese MWEs. Finally, we present an extrinsic evaluation of the predicted levels of MWE compositionality on the task of MWE identification. Our results suggest that the proper choice of distributional model and corpus parameters can produce compositionality predictions that are comparable to the state of the art.
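
As a rough illustration of the kind of prediction the abstract describes, the sketch below scores an MWE by comparing its own distributional vector with a vector composed from its parts. This is not the implementation from the thesis: the additive composition, the cosine comparison, the function names and the four-dimensional toy vectors are all assumptions made up for this example. In practice the vectors would come from a distributional model trained on a corpus in which MWEs are treated as single tokens.

```python
import math


def norm(vec):
    """Euclidean length of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in vec))


def normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    length = norm(vec)
    return list(vec) if length == 0 else [x / length for x in vec]


def cosine(u, v):
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    nu, nv = norm(u), norm(v)
    if nu == 0 or nv == 0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def compositionality_score(mwe_vec, part_vecs):
    """Hypothetical score: cosine between the MWE vector and the sum of
    its unit-normalized component vectors (additive composition is an
    assumption of this sketch, not a claim about the thesis)."""
    composed = [sum(dims) for dims in zip(*(normalize(p) for p in part_vecs))]
    return cosine(mwe_vec, composed)


# Invented toy vectors, for illustration only (not real embeddings).
TOY_VECTORS = {
    "engine": [1.0, 0.0, 0.0, 0.0],
    "room": [0.0, 1.0, 0.0, 0.0],
    "engine_room": [0.7, 0.7, 0.1, 0.0],
    "nut": [0.0, 0.0, 1.0, 0.0],
    "case": [0.0, 1.0, 0.0, 0.0],
    "nut_case": [0.0, 0.0, 0.0, 1.0],
}

engine_room_score = compositionality_score(
    TOY_VECTORS["engine_room"], [TOY_VECTORS["engine"], TOY_VECTORS["room"]]
)
nut_case_score = compositionality_score(
    TOY_VECTORS["nut_case"], [TOY_VECTORS["nut"], TOY_VECTORS["case"]]
)
print(f"engine room: {engine_room_score:.3f}")
print(f"nut case:    {nut_case_score:.3f}")
```

With these invented vectors, engine room scores close to 1 because its vector lies near the sum of its parts, while nut case scores 0 because its vector shares no dimension with nut or case. A threshold or a rank correlation with human judgments could then be applied to such scores, although those steps are outside this sketch.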

Cited literature : 138 references
Contributor : Silvio Ricardo Cordeiro
Submitted on : Sunday, March 11, 2018 - 10:00:18 AM
Last modification on : Saturday, March 24, 2018 - 1:22:57 AM
Long-term archiving on : Tuesday, June 12, 2018 - 12:34:53 PM


Files produced by the author(s)


Distributed under a Creative Commons Attribution - NonCommercial - NoDerivatives 4.0 International License


  • HAL Id : tel-01728528, version 1



Silvio Ricardo Cordeiro. Distributional models of multiword expression compositionality prediction. Computation and Language [cs.CL]. Federal University of Rio Grande do Sul; Aix Marseille University, 2017. English. ⟨NNT : 2017AIXM0501⟩. ⟨tel-01728528⟩


