Multilevel linguistic graphs for knowledge extraction: the example of collocations

Abstract: To model linguistic phenomena as well as possible, natural language processing systems need high-quality resources, yet existing resources are most often incomplete and do not allow data to be handled adequately in processes such as translation and analysis. This thesis addresses the acquisition of linguistic knowledge, and more precisely the extraction of that knowledge from the corpora in which it appears. We focus on collocations, pairs of terms in which one term is chosen as a function of the other to express a particular meaning (as in "driving rain", where "driving" expresses intensification). For large-scale data acquisition to be feasible, it must be easy to carry out automatically and simple to configure for linguists with limited programming knowledge. We therefore need a precise and suitable model of both data and processes. We describe MuLLinG, the multilevel linguistic graph model we designed, in which each level represents information in a different way, together with the operations for manipulating these graphs. This model, based on a simple structure (the graph), makes it possible to represent, process, and manage diverse kinds of resources. The associated operations were written to be as generic as possible: they are independent of what nodes and edges represent and of the task to be performed. As a result, our model, which has been implemented and used in several experiments, some of them on collocation extraction, lets a (sometimes complex) linguistic knowledge extraction process be viewed as a succession of small graph manipulation operations.
Vincent Archer. Graphes linguistiques multiniveau pour l'extraction de connaissances : l'exemple des collocations. Informatique [cs]. Université Joseph-Fourier - Grenoble I, 2009. Français. ⟨tel-00426517⟩
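
To make the idea in the abstract concrete, here is a minimal sketch of how an extraction process might be written as a succession of small, generic graph operations. It is not MuLLinG's actual interface, which the abstract does not describe: the `Graph` class, the `aggregate` and `filter_edges` operations, the `"mod"` edge label, and the toy data are all hypothetical names chosen for illustration. The sketch assumes two levels, one of word occurrences and one of lemmas, and it counts co-occurrences with a simple threshold, which is only a stand-in for a real collocation measure.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Graph:
    """One level of a (hypothetical) multilevel graph."""
    nodes: dict = field(default_factory=dict)        # node id -> attribute dict
    edges: Counter = field(default_factory=Counter)  # (source, label, target) -> weight


def aggregate(graph, key):
    """Generic operation: build a new level by merging nodes that share key(attrs).

    Edge weights between merged nodes are summed. The function knows nothing
    about what nodes or edges stand for.
    """
    group = {nid: key(attrs) for nid, attrs in graph.nodes.items()}
    upper = Graph({g: {} for g in group.values()})
    for (source, label, target), weight in graph.edges.items():
        upper.edges[(group[source], label, group[target])] += weight
    return upper


def filter_edges(graph, keep):
    """Generic operation: keep only the edges for which keep(edge, weight) is true."""
    kept = Counter({e: w for e, w in graph.edges.items() if keep(e, w)})
    return Graph(dict(graph.nodes), kept)


# Occurrence level: toy data, one node per word occurrence, with
# modifier -> head edges labelled "mod" (an assumed label).
occurrences = Graph(
    nodes={
        1: {"form": "driving", "lemma": "driving"},
        2: {"form": "rain", "lemma": "rain"},
        3: {"form": "driving", "lemma": "driving"},
        4: {"form": "rains", "lemma": "rain"},
        5: {"form": "heavy", "lemma": "heavy"},
        6: {"form": "rain", "lemma": "rain"},
    },
    edges=Counter({(1, "mod", 2): 1, (3, "mod", 4): 1, (5, "mod", 6): 1}),
)

# The extraction process as a succession of two small operations.
lemmas = aggregate(occurrences, key=lambda attrs: attrs["lemma"])
frequent = filter_edges(lemmas, keep=lambda edge, weight: weight >= 2)

print(dict(frequent.edges))  # {('driving', 'mod', 'rain'): 2}
```

Because both operations take the grouping key or the predicate as a parameter, the same two functions could be reused on graphs whose nodes and edges mean something else entirely, which is the kind of genericity the abstract describes.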



