Accès sémantique aux données massives et hétérogènes en santé

Abstract : Clinical data are produced in the course of medical practice by different health professionals, in several places and in various formats. They are therefore heterogeneous in both nature and structure, and their particularly large volume qualifies them as Big Data. The work carried out in this thesis aims to propose an effective information retrieval method for this type of complex and massive data.
First, access to clinical data is constrained by the need to model clinical information. This can be done within Electronic Health Records and, to a larger extent, within data warehouses. In this thesis, I proposed a proof of concept of a search engine giving access to the information contained in the Semantic Health Data Warehouse of the Rouen University Hospital. A generic data model allows this data warehouse to view information as a graph of data, which makes it possible to model the information while preserving its conceptual complexity. To provide search functionalities adapted to this generic representation of data, a query language allowing access to clinical information through the various entities of which it is composed was developed and implemented as part of this thesis.
Second, the massive volume of clinical data is also a major technical challenge that hinders efficient information retrieval. The initial implementation of the proof of concept highlighted the limits of relational database management systems when used in the context of clinical data. A migration to a NoSQL key-value store was then completed. Although it offers good atomic data access performance, this migration nevertheless required additional development and the design of a suitable hardware and application architecture to provide advanced search functionalities.
Finally, the contribution of this work within the general context of the Semantic Health Data Warehouse of the Rouen University Hospital was evaluated. The proof of concept proposed in this work was used to access semantic descriptions of information in order to meet the patient inclusion and exclusion criteria of clinical studies. In this evaluation, a total or partial response was given to 72.97% of the criteria. In addition, the genericity of the tool also made it possible to use it in other contexts, such as documentary and bibliographic information retrieval in health.
Document type :
Theses
Cited literature : 240 references

https://tel.archives-ouvertes.fr/tel-02287217
Contributor : Abes Star
Submitted on : Friday, September 13, 2019 - 4:44:06 PM
Last modification on : Wednesday, October 16, 2019 - 1:21:06 PM

File

romainlelong2.pdf
Version validated by the jury (STAR)

Identifiers

  • HAL Id : tel-02287217, version 1

Citation

Romain Lelong. Accès sémantique aux données massives et hétérogènes en santé. Recherche d'information [cs.IR]. Normandie Université, 2019. Français. ⟨NNT : 2019NORMR030⟩. ⟨tel-02287217⟩
