L’évolution des systèmes et architectures d’information sous l’influence des données massives : les lacs de données

Abstract : Data is at the heart of the digital transformation. As a consequence, the evolution of the information system is accelerating, and the system must adapt. The big data phenomenon acts as a catalyst for this evolution. Under its influence, a new component of the information system appears: the data lake. Far from replacing the decision support systems that make up the information system, data lakes complement the information system's architecture. First, we focus on the factors that influence the evolution of information systems, such as new software and middleware, new infrastructure technologies, and the usage of the decision support system itself. We then study the impact of big data, especially the appearance of new technologies such as Apache Hadoop, as well as the current limits of the decision support system. These limits force a change in the information system, which must adapt, and this gives birth to a new component: the data lake. Second, we study this new component in detail, formalize our definition, and give our point of view on its positioning within the information system and with regard to the decision support system. In addition, we highlight a factor that influences the architecture of data lakes: data gravity, drawing an analogy with the law of gravity and focusing on the factors that may influence the data-processing relationship. We show, through a use case, that taking data gravity into account can influence the design of a data lake. We complete this work by adapting the software product line approach to bootstrap a method for formalizing and modeling data lakes.
This method allows us:
- to establish a minimum list of components that must be in place to operate a data lake without turning it into a data swamp,
- to evaluate the maturity of an existing data lake,
- to quickly diagnose the missing components of an existing data lake that has become a data swamp,
- to conceptualize the creation of data lakes while remaining "software agnostic".
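
As a rough illustration of the first three uses, the sketch below treats the minimum component list as a set of capabilities and compares a deployed data lake against it. The component names, the `Diagnosis` type, the `diagnose` function, and the ratio used as a maturity score are all hypothetical choices made for this example; the abstract does not enumerate the components or define a maturity metric.

```python
from dataclasses import dataclass

# Hypothetical minimum component list, named by capability rather than by
# product so that the check stays "software agnostic". The thesis defines
# its own list; these names are assumptions for illustration only.
MANDATORY_COMPONENTS = frozenset(
    {"ingestion", "storage", "metadata_catalog", "governance", "security"}
)


@dataclass(frozen=True)
class Diagnosis:
    missing: frozenset  # mandatory components that are not deployed
    maturity: float     # share of mandatory components present, from 0.0 to 1.0


def diagnose(deployed):
    """Compare the deployed capabilities with the mandatory list."""
    present = MANDATORY_COMPONENTS & set(deployed)
    missing = MANDATORY_COMPONENTS - present
    # Assumed maturity score: a simple coverage ratio, not the thesis's metric.
    maturity = len(present) / len(MANDATORY_COMPONENTS)
    return Diagnosis(missing=frozenset(missing), maturity=maturity)


if __name__ == "__main__":
    # A lake with ingestion and storage only; extra capabilities are ignored.
    report = diagnose({"ingestion", "storage", "hadoop_cluster"})
    print(sorted(report.missing), report.maturity)
```

Under these assumptions, a lake that reports missing components is a candidate data swamp, and the missing set is the list of capabilities to add first.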
Cited literature [84 references]

https://tel.archives-ouvertes.fr/tel-02138983
Contributor : Abes Star
Submitted on : Friday, May 24, 2019 - 12:09:10 PM
Last modification on : Monday, June 17, 2019 - 5:46:04 PM

File

MADERA_2018_archivage_cor.pdf
Version validated by the jury (STAR)

Identifiers

  • HAL Id : tel-02138983, version 1

Citation

Cédrine Madera. L’évolution des systèmes et architectures d’information sous l’influence des données massives : les lacs de données. Base de données [cs.DB]. Université Montpellier, 2018. Français. ⟨NNT : 2018MONTS071⟩. ⟨tel-02138983⟩

Metrics

Record views

202

Files downloads

358