Acquisition de grammaires lexicalisées pour les langues naturelles (Acquisition of lexicalized grammars for natural languages)

Abstract: Grammatical inference consists in discovering the rules that govern how the sentences of a language are formed, that is, a grammar of this language. In Gold's model of learning, the examples given as input are only sentences belonging to the language, and the algorithm must provide a grammar that represents the enumerated language. Categorial grammars are one of the many existing formalisms used to represent languages. Kanazawa has shown that some subclasses of these grammars are learnable, but his results do not apply directly to natural languages. From a theoretical viewpoint, we propose to generalize Kanazawa's results to different kinds of grammars. General combinatory grammars are a flexible model that makes it possible to define grammatical systems based on rewriting rules. In this framework, we show that some classes of languages are learnable. In order to be maximally general, our results are expressed as criteria on the rules of the grammatical system. These results are applied to several formalisms that are well suited to the representation of natural languages. We also address the problem of implementing learning algorithms with real data. Indeed, existing algorithms that are able to learn rich classes of languages are NP-complete. We propose a more flexible learning framework, called partial learning, to bypass this obstacle: the context in which learning takes place is modified in order to obtain a more realistic algorithmic complexity. We test this approach on medium-sized data and obtain quite encouraging results.
Document type: Theses
Cited literature: 114 references

https://tel.archives-ouvertes.fr/tel-00487042
Contributor: Erwan Moreau
Submitted on: Thursday, May 27, 2010 - 5:28:37 PM
Last modification on: Tuesday, September 18, 2018 - 12:40:03 AM
Long-term archiving on: Thursday, September 16, 2010 - 3:59:24 PM

Identifiers

  • HAL Id: tel-00487042, version 1

Citation

Erwan Moreau. Acquisition de grammaires lexicalisées pour les langues naturelles. Autre [cs.OH]. Université de Nantes, 2006. Français. ⟨tel-00487042⟩
