
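Ontological modeling for information retrieval: evaluating the semantic similarity of texts, with application to plagiarism detection
(Original French title: Modélisation ontologique pour la recherche d'information : évaluation de la similarité sémantique de textes et application à la détection de plagiats)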

Samia Iltache 1
1 IRIT-MELODI - MEthodes et ingénierie des Langues, des Ontologies et du DIscours
IRIT - Institut de recherche en informatique de Toulouse
Abstract : The expansion of the web and the development of information technologies have contributed to the proliferation of digital documents online. This availability of information has the advantage of making knowledge accessible to all. However, it has raised several problems regarding access to relevant information that meets a user's needs. The first concerns extracting the useful information from everything that is available. The second concerns the use of this knowledge, which sometimes results in plagiarism.

The aim of this thesis is to develop a model that better characterizes documents, both to facilitate access to them and to detect those at risk of plagiarism. The model relies on domain ontologies, which are used both to classify documents and to compute the similarity of documents belonging to the same domain. We focus on scientific papers, and specifically on their abstracts, which are short and relatively well-structured texts. The problem is therefore to determine how to assess the semantic proximity (similarity) of two papers by examining their respective abstracts. Since a domain ontology provides a useful way to represent the knowledge of a given domain, our process relies on two actions:
(i) An automatic classification of each document into a domain selected from several candidate domains. This classification determines the meaning of a document from the global context in which its content is used.
(ii) A comparison of the texts, based on the construction of the semantic perimeter of each abstract and on a mutual enrichment performed when comparing the graphs of the abstracts.
The semantic comparison of the abstracts relies on segmenting their content into zones (documentary units) that reflect their logical structure. The similarity of two abstracts is then computed by comparing the conceptual graphs of the zones that play the same role.
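The two-step process described above can be illustrated with a deliberately simplified Python sketch. It is not the thesis's implementation: the mini-ontologies, the zone roles (`objective`, `method`), and all function names are hypothetical; plain concept sets stand in for conceptual graphs; the semantic perimeter is taken to be the recognized concepts plus their direct ontology neighbours; and Jaccard overlap is used as the similarity measure. The reading of "mutual enrichment" (each abstract absorbs those concepts of the other that already lie inside its own semantic perimeter) is likewise an assumption.

```python
"""Toy sketch of ontology-based, zone-by-zone abstract similarity.

Assumptions (not taken from the thesis): concept sets stand in for conceptual
graphs, the ontologies below are invented, and Jaccard overlap is the measure.
"""

# Hypothetical mini-ontologies: concept label -> directly related concepts.
ONTOLOGIES: dict[str, dict[str, set[str]]] = {
    "information retrieval": {
        "ontology": {"concept", "domain"},
        "similarity": {"semantic proximity", "comparison"},
        "plagiarism": {"similarity", "text reuse"},
        "abstract": {"document"},
        "document": {"abstract", "classification"},
        "classification": {"document", "domain"},
    },
    "databases": {
        "query": {"index", "transaction"},
        "index": {"query"},
        "transaction": {"query"},
    },
}


def extract_concepts(text: str, ontology: dict[str, set[str]]) -> set[str]:
    """Return the ontology concepts whose label appears in the text (naive substring match)."""
    lowered = text.lower()
    return {concept for concept in ontology if concept in lowered}


def classify(text: str) -> str:
    """Choose the candidate domain whose ontology recognizes the most concepts in the text."""
    return max(ONTOLOGIES, key=lambda d: len(extract_concepts(text, ONTOLOGIES[d])))


def perimeter(concepts: set[str], ontology: dict[str, set[str]]) -> set[str]:
    """Assumed semantic perimeter: the concepts plus their direct ontology neighbours."""
    result = set(concepts)
    for concept in concepts:
        result |= ontology.get(concept, set())
    return result


def zone_similarity(a: str, b: str, ontology: dict[str, set[str]]) -> float:
    """Jaccard overlap of two zones after mutually enriching their concept sets."""
    ca, cb = extract_concepts(a, ontology), extract_concepts(b, ontology)
    # Mutual enrichment (assumed reading): each side absorbs the other side's
    # concepts that already fall inside its own semantic perimeter.
    ea = ca | (cb & perimeter(ca, ontology))
    eb = cb | (ca & perimeter(cb, ontology))
    union = ea | eb
    return len(ea & eb) / len(union) if union else 0.0


def abstract_similarity(zones_a: dict[str, str], zones_b: dict[str, str]) -> float:
    """Average zone similarity over the roles present in both abstracts."""
    domain = classify(" ".join(zones_a.values()))
    if classify(" ".join(zones_b.values())) != domain:
        return 0.0  # assumption: abstracts from different domains score 0
    ontology = ONTOLOGIES[domain]
    shared_roles = zones_a.keys() & zones_b.keys()
    if not shared_roles:
        return 0.0
    total = sum(zone_similarity(zones_a[r], zones_b[r], ontology) for r in shared_roles)
    return total / len(shared_roles)


if __name__ == "__main__":
    a = {"objective": "We detect plagiarism between documents.",
         "method": "A domain ontology guides the classification."}
    b = {"objective": "We measure the similarity of each abstract.",
         "method": "Texts are compared with an ontology."}
    print(abstract_similarity(a, b))  # 0.625
```

Without the enrichment step, the two `objective` zones share no concept at all and would score 0; the ontology neighbours are what let `document` and `abstract` (or `plagiarism` and `similarity`) be recognized as related, which is the intuition behind comparing texts within a domain ontology rather than by surface overlap.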
Document type :
Theses

Cited literature [71 references]

https://tel.archives-ouvertes.fr/tel-02491423
Contributor : Abes Star
Submitted on : Wednesday, February 26, 2020 - 10:24:27 AM
Last modification on : Sunday, June 14, 2020 - 3:28:59 AM

File

Iltache_Samia.pdf
Version validated by the jury (STAR)

Identifiers

  • HAL Id : tel-02491423, version 1

Citation

Samia Iltache. Modélisation ontologique pour la recherche d'information : évaluation de la similarité sémantique de textes et application à la détection de plagiats. Informatique et langage [cs.CL]. Université Toulouse le Mirail - Toulouse II; Université Mouloud Mammeri (Tizi-Ouzou, Algérie), 2018. French. ⟨NNT : 2018TOU20121⟩. ⟨tel-02491423⟩


Metrics

Record views : 87

File downloads : 100