Etude et réalisation d'un système d'extraction de connaissances à partir de textes [Study and implementation of a system for knowledge extraction from texts]

Hacène Cherfi 1
1 ORPAILLEUR - Knowledge representation, reasoning
INRIA Lorraine, LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications
Abstract : This PhD dissertation addresses text mining (TM), that is, knowledge extraction from texts. It covers text analysis, the data mining process itself, and the interpretation of the extracted knowledge elements. Within this framework, a knowledge extraction system for analysing texts according to their content is studied and implemented. The data mining methods applied are levelwise search for frequent itemsets (with the "Close" algorithm) and association rule extraction. The manuscript emphasises the definition of the text mining process and its main characteristics within the framework of frequent itemset and association rule extraction. In addition, a number of quality measures attached to the rules are studied in detail in the context of text mining. The study shows to what extent these quality measures help in interpreting the extracted rules and how they influence the overall quality of the text mining process. A knowledge model supports this work: by defining a maximum likelihood probability measure, the thesis shows the value of discovering new knowledge by discarding the knowledge already present and described in the domain model. The association rules can therefore be used to enrich a terminological knowledge model of the selected domain. The dissertation includes an experiment and a validation on a real-world text corpus from the molecular biology domain.
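To make the two data mining steps named in the abstract concrete, the following is a minimal sketch of a levelwise frequent itemset search followed by association rule extraction. It is not the thesis implementation: the thesis uses the Close algorithm, which mines closed itemsets, whereas this sketch is a plain levelwise search in the style of Apriori. The function names, the toy corpus of term sets, the thresholds, and the choice of lift as an example quality measure are all assumptions made for illustration; the abstract does not say which measures are studied.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Levelwise search: return {itemset: support} for every frequent itemset.

    Plain levelwise (Apriori-style) search, NOT the Close algorithm.
    """
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    # Level 1 candidates: every single term seen in the corpus.
    candidates = [frozenset([i]) for i in sorted({i for t in transactions for i in t})]
    frequent = {}
    k = 1
    while candidates:
        # Keep the candidates whose relative support reaches the threshold.
        level = {}
        for c in candidates:
            support = sum(1 for t in transactions if c <= t) / n
            if support >= min_support:
                level[c] = support
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-candidates, then prune any
        # candidate that has an infrequent k-subset.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        candidates = [
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        ]
    return frequent


def association_rules(frequent, min_confidence):
    """Return (antecedent, consequent, support, confidence, lift) tuples."""
    rules = []
    for itemset, support in frequent.items():
        if len(itemset) < 2:
            continue
        for r in range(1, len(itemset)):
            for subset in combinations(sorted(itemset), r):
                antecedent = frozenset(subset)
                consequent = itemset - antecedent
                # Subsets of a frequent itemset are frequent, so both lookups succeed.
                confidence = support / frequent[antecedent]
                if confidence >= min_confidence:
                    lift = confidence / frequent[consequent]
                    rules.append((antecedent, consequent, support, confidence, lift))
    return rules


# Hypothetical toy corpus: each document is reduced to its set of index terms.
docs = [
    {"gene", "mutation", "resistance"},
    {"gene", "mutation"},
    {"gene", "protein"},
    {"mutation", "resistance"},
]

frequent = frequent_itemsets(docs, min_support=0.5)
rules = association_rules(frequent, min_confidence=0.9)
for antecedent, consequent, support, confidence, lift in rules:
    print(sorted(antecedent), "->", sorted(consequent), support, confidence, lift)
```

In this sketch, support and confidence filter the rules, and lift is reported alongside as one example of a quality measure that could be used to rank them for interpretation.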
Document type :
Theses

Cited literature [98 references]

https://tel.archives-ouvertes.fr/tel-00011195
Contributor : Hacène Cherfi
Submitted on : Tuesday, December 13, 2005 - 12:31:18 PM
Last modification on : Friday, February 26, 2021 - 3:28:05 PM
Long-term archiving on : Saturday, April 3, 2010 - 6:58:23 PM

Identifiers

  • HAL Id : tel-00011195, version 1

Citation

Hacène Cherfi. Etude et réalisation d'un système d'extraction de connaissances à partir de textes. Human-Computer Interaction [cs.HC]. Université Henri Poincaré - Nancy I, 2004. French. ⟨tel-00011195⟩

Metrics

Record views : 756
File downloads : 4003