SALZA: a universal information measure between strings for classification and causality inference

Abstract: Data in the form of strings are varied (DNA, text, quantized EEG) and cannot always be modeled. A universal description of strings, independent of probabilities, is thus necessary. Kolmogorov complexity was introduced in the 1960s to address this issue. The principle is simple: a string is complex if no short description of it exists. Kolmogorov complexity is the counterpart of Shannon entropy and is the foundation of algorithmic information theory. However, Kolmogorov complexity is not computable in finite time, which makes it unusable in practice.

Lempel and Ziv were the first to make Kolmogorov complexity operational, in 1976, by restricting the operations allowed in the description. Another approach uses the size of the string after compression by a lossless data compression algorithm. Yet neither estimator is well defined for the joint and conditional complexities, so compressors and the Lempel-Ziv complexity are poorly suited to estimating the quantities of algorithmic information theory.

In light of this observation, we introduce SALZA, a new universal information measure based on the Lempel-Ziv complexity. Because the measure is well defined and efficiently implemented, it allows the quantities of algorithmic information theory to be computed in practice.

Cilibrasi and Vitányi used standard lossless compressors to define a very popular universal classifier: the normalized compression distance (NCD). For this application, we introduce our own estimator, the NSD, and show that it is a universal semi-distance between strings. The NSD outperforms the NCD because it adapts to large data sets and uses the appropriate conditioning provided by SALZA.

Building on the accurate universal prediction offered by the Lempel-Ziv complexity, we explore the question of causality inference. First, we compute the algorithmic causal Markov condition using SALZA. Then we define, for the first time, the algorithmic directed information, and from it we introduce the algorithmic Granger causality. The relevance of our approach is demonstrated on real and synthetic data.
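To make the two baseline estimators named in the abstract concrete, here is a minimal Python sketch. It is not SALZA or the NSD, whose definitions are given in the thesis itself and are not reproduced on this page. The first function computes the normalized compression distance in its usual form, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), with zlib assumed as the compressor C. The second counts the phrases of a Lempel-Ziv (1976) style parsing in which each new phrase is the shortest substring that cannot be copied from earlier in the string. The function names and the sample data are illustrative assumptions.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Size in bytes of the zlib-compressed input (a crude proxy for complexity)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance, with zlib assumed as the compressor."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def lz76_complexity(s: str) -> int:
    """Number of phrases in a Lempel-Ziv (1976) style parsing of s.

    Each phrase is extended while it can still be copied from a position
    strictly before its own start (overlapping copies are allowed).
    Quadratic-time sketch, meant for short strings only.
    """
    i, phrases, n = 0, 0, len(s)
    while i < n:
        length = 1
        # s[:i + length - 1] holds every candidate copy starting before i.
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases


# Illustrative data: a repetitive text and pseudo-random bytes of equal length.
text = b"the quick brown fox jumps over the lazy dog " * 40
rng = random.Random(0)
noise = bytes(rng.randrange(256) for _ in range(len(text)))

print("NCD(text, text): ", ncd(text, text))
print("NCD(text, noise):", ncd(text, noise))
# Parsed as 0|001|10|100|1000|101, i.e. 6 phrases.
print("LZ76 phrases:", lz76_complexity("0001101001000101"))
```

With a real compressor the NCD of a string with itself is small but not exactly zero, and the conditional term is only approximated through concatenation, which is the kind of limitation the abstract attributes to compressor-based estimators.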

https://tel.archives-ouvertes.fr/tel-02064902
Contributor : Abes Star
Submitted on : Tuesday, March 12, 2019 - 11:54:05 AM
Last modification on : Tuesday, October 6, 2020 - 12:38:02 PM
Long-term archiving on : Thursday, June 13, 2019 - 2:50:20 PM

File

REVOLLE_2018_diffusion.pdf
Version validated by the jury (STAR)

Identifiers

  • HAL Id : tel-02064902, version 1

Citation

Marion Revolle. SALZA : mesure d’information universelle entre chaînes pour la classification et l’inférence de causalité. Signal and Image Processing [eess.SP]. Université Grenoble Alpes, 2018. In French. ⟨NNT : 2018GREAT079⟩. ⟨tel-02064902⟩
