Dynamics in semantic annotation, a perspective of information access system

Abstract : Information grows and evolves every day and in every human activity. Documents of different modalities store this information, and its dynamic nature takes the form of a continuous flow of documents. Huge and ever-growing document collections create the need to organize, relate and search for information efficiently. Although full-text search tools have been developed, people continue to categorize documents, often using automatic classification tools. These annotation categories can be considered a form of semantic indexing: classifying newspaper articles or blog posts allows journalists or readers to quickly find previously published documents related to a given topic. However, the quality of an index based on semantic annotation often deteriorates with time due to the dynamics of the information it describes: some categories are misused or forgotten by indexers, while others become obsolete or too general to be useful. In this study we introduce a dynamic perspective on semantic annotation. This perspective takes into account the passage of time and the permanent flow of documents, which makes collections grow and their annotation systems extend and evolve. We also propose a view of the quality of annotation systems based on the notion of information access. Traditionally, the quality of annotation is considered in terms of the semantic adequacy between the contents of the documents and the annotation terms that describe them. In our view, the quality of an annotation vocabulary depends on the amount and complexity of information that a user must navigate while searching for a certain topic. To address the problem of dynamics in semantic annotation, this work proposes a modular architecture for dynamic semantic annotation. This architecture models the activities involved in the semantic annotation process as abstract modules dedicated to the different tasks that users have to perform.
As a case study we took blog annotation. We gathered a corpus containing up to 10 years of blog posts annotated with categories and tags, and we analyzed the annotation habits. By testing automatic tagging and categorization strategies, we measure the impact of the dynamics in the annotation system. We propose some strategies to control this impact, which helps to evaluate the obsolescence of examples. Finally, we propose a framework relying on three quality metrics and an interactive method to recover the quality of an indexing system based on semantic annotation. The metrics are evaluated over time to observe the degradation in indexing quality. A series of worked examples is presented to show how well the measures guide the restructuring of the annotation-based indexing system.
Keywords : Blogs
Cited literature [96 references]

https://tel.archives-ouvertes.fr/tel-02899882
Contributor : Abes Star
Submitted on : Wednesday, July 15, 2020 - 4:19:39 PM
Last modification on : Friday, July 17, 2020 - 5:10:59 AM

File

edgalilee_th_2019_garrido.pdf
Version validated by the jury (STAR)

Identifiers

  • HAL Id : tel-02899882, version 2

Citation

Ivan Garrido Marquez. Dynamics in semantic annotation, a perspective of information access system. Data Structures and Algorithms [cs.DS]. Université Sorbonne Paris Cité, 2019. English. ⟨NNT : 2019USPCD008⟩. ⟨tel-02899882v2⟩
