Skip to Main content Skip to Navigation

Anytime discovery of a diverse set of patterns with Monte Carlo tree search

Guillaume Bosc 1 
1 DM2L - Data Mining and Machine Learning
LIRIS - Laboratoire d'InfoRmatique en Image et Systèmes d'information
Abstract : The discovery of patterns that strongly distinguish one class label from another is still a challenging data-mining task. Subgroup Discovery (SD) is a formal pattern mining framework that enables the construction of intelligible classifiers, and, most importantly, to elicit interesting hypotheses from the data. However, SD still faces two major issues: (i) how to define appropriate quality measures to characterize the interestingness of a pattern; (ii) how to select an accurate heuristic search technique when exhaustive enumeration of the pattern space is unfeasible. The first issue has been tackled by Exceptional Model Mining (EMM) for discovering patterns that cover tuples that locally induce a model substantially different from the model of the whole dataset. The second issue has been studied in SD and EMM mainly with the use of beam-search strategies and genetic algorithms for discovering a pattern set that is non-redundant, diverse and of high quality. In this thesis, we argue that the greedy nature of most such previous approaches produces pattern sets that lack diversity. Consequently, we formally define pattern mining as a game and solve it with Monte Carlo Tree Search (MCTS), a recent technique mainly used for games and planning problems in artificial intelligence. Contrary to traditional sampling methods, MCTS leads to an any-time pattern mining approach without assumptions on either the quality measure or the data. It converges to an exhaustive search if given enough time and memory. The exploration/exploitation trade-off allows the diversity of the result set to be improved considerably compared to existing heuristics. We show that MCTS quickly finds a diverse pattern set of high quality in our application in neurosciences. We also propose and validate a new quality measure especially tuned for imbalanced multi-label data.
Document type :
Complete list of metadata

Cited literature [162 references]  Display  Hide  Download
Contributor : ABES STAR :  Contact
Submitted on : Friday, February 1, 2019 - 1:11:40 PM
Last modification on : Tuesday, June 1, 2021 - 2:08:09 PM
Long-term archiving on: : Thursday, May 2, 2019 - 1:33:34 PM


Version validated by the jury (STAR)


  • HAL Id : tel-02001153, version 1


Guillaume Bosc. Anytime discovery of a diverse set of patterns with Monte Carlo tree search. Artificial Intelligence [cs.AI]. Université de Lyon, 2017. English. ⟨NNT : 2017LYSEI074⟩. ⟨tel-02001153⟩



Record views


Files downloads