Theses

Toward interpretable machine learning, with applications to large-scale industrial systems data

Abstract: The contributions presented in this work are twofold. We first provide a general overview of explanations and interpretable machine learning, drawing connections with other fields, including sociology, psychology, and philosophy, and we introduce a taxonomy of popular explainability approaches and evaluation methods. We then focus on rule learning, a specific family of transparent models, and propose LIBRE, a novel rule-based classification approach based on monotone Boolean function synthesis. LIBRE is an ensemble method that combines the candidate rules learned by multiple bottom-up learners through a simple union, yielding a final interpretable rule set. Our method overcomes most of the limitations of state-of-the-art competitors: it handles both balanced and imbalanced datasets, and on real datasets it efficiently achieves superior performance and higher interpretability.

Interpretability of data representations constitutes the second broad contribution of this work. We restrict our attention to disentangled representation learning, and in particular to VAE-based disentanglement methods that automatically learn representations consisting of semantically meaningful features. Recent contributions have demonstrated that disentanglement is impossible in purely unsupervised settings. Nevertheless, incorporating inductive biases on models and data may overcome this limitation. We present a new disentanglement method, IDVAE, whose theoretical guarantees on disentanglement derive from the use of an optimal exponential factorized prior that is conditionally dependent on auxiliary variables complementing the input observations. We additionally propose a semi-supervised version of our method. Our experimental campaign on well-established datasets from the literature shows that IDVAE often beats its competitors according to several disentanglement metrics.
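To make the "simple union" step concrete, here is a minimal Python sketch under our own assumptions; it is not LIBRE's actual implementation. We assume each learner returns its rules as a set of conjunctions over binarized features using only positive literals, so the combined classifier is a monotone disjunction of conjunctions (DNF) that predicts the positive class when any rule fires. The feature names and the optional absorption step (dropping a rule that strictly contains another rule in the union, which leaves the DNF unchanged) are hypothetical illustrations, not taken from the thesis.

```python
from typing import Dict, FrozenSet, Iterable, Set

# Hypothetical encoding: a rule is a conjunction of binarized feature names,
# and it fires when every listed feature is True for a sample.
Rule = FrozenSet[str]


def union_rule_sets(rule_sets: Iterable[Set[Rule]]) -> Set[Rule]:
    """Combine the candidate rules of several learners with a plain set union."""
    combined: Set[Rule] = set()
    for rules in rule_sets:
        combined |= rules
    return combined


def remove_absorbed(rules: Set[Rule]) -> Set[Rule]:
    """Optional step (our assumption, not from the source): drop any rule that
    strictly contains another rule, since A or (A and B) == A in a monotone DNF."""
    return {r for r in rules if not any(other < r for other in rules)}


def predict(rules: Set[Rule], sample: Dict[str, bool]) -> bool:
    """Positive class if at least one rule fires (disjunction of conjunctions)."""
    return any(all(sample.get(f, False) for f in rule) for rule in rules)


# Usage with hypothetical sensor features produced by two learners.
learner_a = {frozenset({"temp_high", "pressure_low"})}
learner_b = {
    frozenset({"vibration_high"}),
    frozenset({"temp_high", "pressure_low", "fan_off"}),
}
combined = union_rule_sets([learner_a, learner_b])
simplified = remove_absorbed(combined)
print(len(combined), len(simplified))  # 3 2
print(predict(simplified, {"temp_high": True, "pressure_low": True}))  # True
print(predict(simplified, {"temp_high": True}))  # False
```

Because every rule uses only positive literals, adding rules to the union can only turn more samples positive, which matches the monotone Boolean function setting named in the abstract.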

https://tel.archives-ouvertes.fr/tel-03467524
Contributor: ABES STAR
Submitted on: Monday, December 6, 2021 - 3:31:07 PM
Last modification on: Wednesday, December 8, 2021 - 9:45:36 AM
Long-term archiving on: Monday, March 7, 2022 - 7:14:47 PM

File

MITA_Graziano_2021.pdf
Version validated by the jury (STAR)

Identifiers

  • HAL Id : tel-03467524, version 1

Citation

Graziano Mita. Toward interpretable machine learning, with applications to large-scale industrial systems data. Numerical Analysis [cs.NA]. Sorbonne Université, 2021. English. ⟨NNT : 2021SORUS112⟩. ⟨tel-03467524⟩

Metrics

Record views: 69
File downloads: 61