Local post-hoc interpretability for black-box classifiers

Thibault Laugel

Thesis. Year: 2020

Interprétabilité locale post-hoc des modèles de classification "boites noires"

Thibault Laugel

Role: Author
PersonId: 1226861
IdRef: 259200476

Learning, Fuzzy and Intelligent systems

Abstract

This thesis focuses on the field of XAI (eXplainable AI), and more particularly on the local post-hoc interpretability paradigm, that is, the generation of explanations for a single prediction of a trained classifier. In particular, we study a fully agnostic context, meaning that the explanation is generated without using any knowledge about the classifier (treated as a black box) or the data used to train it. We identify several issues that can arise in this context and that may be harmful for interpretability. We study each of these issues and propose novel criteria and approaches to detect and characterize them. The three issues we focus on are: the risk of generating explanations that are out of distribution; the risk of generating explanations that cannot be associated with any ground-truth instance; and the risk of generating explanations that are not local enough. These risks are studied through two specific categories of interpretability approaches: counterfactual explanations and local surrogate models.

Keywords

Machine learning; Post-hoc interpretability; XAI; Interpretability; Counterfactual explanations; Black boxes; Artificial intelligence

Domains

Machine Learning [cs.LG]; Artificial Intelligence [cs.AI]

Main file

LAUGEL_Thibault_2020.pdf (4.38 MB)

Origin: Version validated by the jury (STAR)

ABES STAR

https://theses.hal.science/tel-03987631

Submitted on: Tuesday, 14 February 2023, 10:36:19

Last modified on: Saturday, 7 October 2023, 21:36:22

Dates and versions

tel-03987631, version 1 (16-11-2020)

tel-03987631, version 2 (14-02-2023)

Identifiers

HAL Id: tel-03987631, version 2

Cite

Thibault Laugel. Local post-hoc interpretability for black-box classifiers. Machine Learning [cs.LG]. Sorbonne Université, 2020. English. ⟨NNT : 2020SORUS215⟩. ⟨tel-03987631v2⟩

Collections

CNRS, STAR, LIP6, SORBONNE-UNIVERSITE, THESES-SU, SU-SCIENCES

409 views

468 downloads
