Anytime discovery of a diverse set of patterns with Monte Carlo tree search

Guillaume Bosc 1
1 DM2L - Data Mining and Machine Learning
LIRIS - Laboratoire d'InfoRmatique en Image et Systèmes d'information
Abstract : The discovery of patterns that strongly distinguish one class label from another is still a challenging data-mining task. Subgroup Discovery (SD) is a formal pattern mining framework that enables the construction of intelligible classifiers and, most importantly, the elicitation of interesting hypotheses from the data. However, SD still faces two major issues: (i) how to define appropriate quality measures to characterize the interestingness of a pattern; (ii) how to select an accurate heuristic search technique when exhaustive enumeration of the pattern space is infeasible. The first issue has been tackled by Exceptional Model Mining (EMM), which discovers patterns covering tuples that locally induce a model substantially different from the model of the whole dataset. The second issue has been studied in SD and EMM mainly through beam-search strategies and genetic algorithms that aim to discover a pattern set that is non-redundant, diverse and of high quality. In this thesis, we argue that the greedy nature of most of these previous approaches produces pattern sets that lack diversity. Consequently, we formally define pattern mining as a game and solve it with Monte Carlo Tree Search (MCTS), a recent technique mainly used for games and planning problems in artificial intelligence. Contrary to traditional sampling methods, MCTS leads to an any-time pattern mining approach without assumptions on either the quality measure or the data. It converges to an exhaustive search if given enough time and memory. The exploration/exploitation trade-off considerably improves the diversity of the result set compared to existing heuristics. We show that MCTS quickly finds a diverse pattern set of high quality in our application in neuroscience. We also propose and validate a new quality measure especially tuned for imbalanced multi-label data.
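To make the idea of mining patterns with MCTS concrete, here is a minimal sketch and not the algorithm of the thesis. It assumes itemset patterns, a binary target, weighted relative accuracy (WRAcc) as the quality measure, and the generic UCB1 selection rule with an unnormalised reward. The names `wracc`, `Node`, `ucb1`, `mcts_patterns` and `TOY_DATA` are hypothetical and introduced only for illustration.

```python
import math
import random


def wracc(pattern, data):
    """Weighted relative accuracy of an itemset pattern for the positive class."""
    n = len(data)
    positives = sum(1 for _, label in data if label)
    covered = [label for items, label in data if pattern <= items]
    if not covered:
        return 0.0
    return (len(covered) / n) * (sum(covered) / len(covered) - positives / n)


class Node:
    def __init__(self, pattern, untried):
        self.pattern = pattern        # frozenset of items
        self.untried = list(untried)  # items that can still be added at this node
        self.children = []
        self.visits = 0
        self.total = 0.0              # sum of rewards backed up through this node


def ucb1(parent, child, c=math.sqrt(2)):
    """Mean reward (exploitation) plus a visit-count bonus (exploration)."""
    bonus = c * math.sqrt(math.log(parent.visits) / child.visits)
    return child.total / child.visits + bonus


def mcts_patterns(data, items, budget, seed=0):
    """Run `budget` MCTS iterations; return evaluated patterns, best first."""
    rng = random.Random(seed)
    found = {}  # pattern -> quality, kept for every pattern evaluated so far

    def evaluate(pattern):
        if pattern not in found:
            found[pattern] = wracc(pattern, data)
        return found[pattern]

    root = Node(frozenset(), items)
    for _ in range(budget):
        # Selection: descend while the node is fully expanded and not a leaf.
        node, path = root, [root]
        while not node.untried and node.children:
            node = max(node.children, key=lambda ch: ucb1(node, ch))
            path.append(node)
        # Expansion: add one item; children may only add later items,
        # so every itemset is generated exactly once.
        if node.untried:
            item = node.untried.pop(0)
            later = items[items.index(item) + 1:]
            node.children.append(Node(node.pattern | {item}, later))
            node = node.children[-1]
            path.append(node)
        # Rollout: randomly specialise the pattern, keep the best quality seen.
        pattern = node.pattern
        reward = evaluate(pattern)
        for item in node.untried:
            if rng.random() < 0.5:
                pattern = pattern | {item}
                reward = max(reward, evaluate(pattern))
        # Backpropagation: update statistics along the selected path.
        for visited in path:
            visited.visits += 1
            visited.total += reward
    return sorted(found.items(), key=lambda kv: kv[1], reverse=True)


# Toy data (an assumption for illustration): (items of the record, class label).
TOY_DATA = [
    ({"a", "b"}, True), ({"a", "b"}, True), ({"a"}, False), ({"b"}, False),
    ({"c"}, False), ({"a", "b", "c"}, True), ({"c"}, False), ({"b", "c"}, False),
]

if __name__ == "__main__":
    for pattern, quality in mcts_patterns(TOY_DATA, ["a", "b", "c"], budget=200)[:3]:
        print(sorted(pattern), quality)
```

The loop can be stopped after any iteration and still return the best patterns seen so far, which is the any-time behaviour described in the abstract. On a pattern space this small the tree is eventually fully expanded, so the search ends up enumerating every pattern.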
Document type :
Theses
Cited literature : 162 references

https://tel.archives-ouvertes.fr/tel-02001153
Contributor : Abes Star
Submitted on : Friday, February 1, 2019 - 1:11:40 PM
Last modification on : Wednesday, November 20, 2019 - 2:54:54 AM
Long-term archiving on: Thursday, May 2, 2019 - 1:33:34 PM

File

these.pdf
Version validated by the jury (STAR)

Identifiers

  • HAL Id : tel-02001153, version 1

Citation

Guillaume Bosc. Anytime discovery of a diverse set of patterns with Monte Carlo tree search. Artificial Intelligence [cs.AI]. Université de Lyon, 2017. English. ⟨NNT : 2017LYSEI074⟩. ⟨tel-02001153⟩
