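Dynamique d'apprentissage pour Monte Carlo Tree Search : applications aux jeux de Go et du Clobber solitaire impartial (Learning dynamics for Monte Carlo Tree Search: applications to the games of Go and Impartial Solitaire Clobber)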

André Fabbri 1, 2
2 SMA - Systèmes Multi-Agents
LIRIS - Laboratoire d'InfoRmatique en Image et Systèmes d'information
Abstract: Monte Carlo Tree Search (MCTS) was initially introduced for the game of Go, but it has since been applied successfully to other games and has opened the way to a range of new methods such as Multiple-MCTS and Nested Monte Carlo. MCTS evaluates game states through thousands of random simulations. As the simulations are carried out, the program guides the search towards the most promising moves. Through this dynamic, MCTS achieves impressive results without an extensive need for prior knowledge. In this thesis, we choose to treat MCTS as a full learning system. Each random simulation thus becomes a simulated experience, and its outcome corresponds to the reinforcement observed. From this perspective, the learning of the system results from the complex interaction of two processes: the incremental acquisition of new representations and their exploitation in subsequent simulations. We propose two different approaches to enhance these two processes. The first approach gathers complementary representations in order to enhance the relevance of the simulations. The second approach focuses the search on local sub-goals in order to improve the quality of the representations acquired. The methods presented in this work have been applied to the games of Go and Impartial Solitaire Clobber. The results obtained in our experiments highlight the significance of these processes in the learning dynamic and open new perspectives for further enhancing learning systems such as MCTS.
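The search loop summarized in the abstract (random simulations whose outcomes progressively steer the search towards promising moves) can be illustrated with a minimal sketch. It is not the thesis's implementation: the toy game (a single pile of stones, each player removes 1 to 3, whoever takes the last stone wins), the UCB1 selection rule, and every name in the code (`Node`, `rollout`, `mcts_best_move`, the constant `c`) are assumptions chosen for illustration only.

```python
import math
import random


def legal_moves(stones):
    """Toy game (assumed here): remove 1, 2 or 3 stones from one pile."""
    return [k for k in (1, 2, 3) if k <= stones]


class Node:
    def __init__(self, stones, parent=None, move=None):
        self.stones = stones              # stones left for the player to move
        self.parent = parent
        self.move = move                  # move that led from parent to here
        self.children = []
        self.untried = legal_moves(stones)
        self.visits = 0
        self.wins = 0.0                   # wins for the player who played `move`


def rollout(stones, rng):
    """Random playout; True if the player to move at `stones` takes the last stone."""
    turns = 0
    while stones > 0:
        stones -= rng.choice(legal_moves(stones))
        turns += 1
    # An odd number of turns means the first mover of the playout moved last.
    return turns % 2 == 1


def mcts_best_move(stones, iterations=2000, c=1.4, seed=0):
    """Return the most visited root move; assumes stones >= 1."""
    rng = random.Random(seed)
    root = Node(stones)
    for _ in range(iterations):
        node = root
        # Selection: descend with UCB1 while the node is fully expanded.
        while not node.untried and node.children:
            parent_visits = node.visits
            node = max(
                node.children,
                key=lambda ch: ch.wins / ch.visits
                + c * math.sqrt(math.log(parent_visits) / ch.visits),
            )
        # Expansion: add one child for a move not tried yet.
        if node.untried:
            move = node.untried.pop(rng.randrange(len(node.untried)))
            child = Node(node.stones - move, parent=node, move=move)
            node.children.append(child)
            node = child
        # Simulation: the outcome acts as the observed reinforcement.
        win_for_last_mover = not rollout(node.stones, rng)
        # Backpropagation: flip the perspective at each level of the tree.
        while node is not None:
            node.visits += 1
            node.wins += 1.0 if win_for_last_mover else 0.0
            win_for_last_mover = not win_for_last_mover
            node = node.parent
    return max(root.children, key=lambda ch: ch.visits).move
```

For example, `mcts_best_move(3)` returns 3, since taking all three stones wins immediately. The statistics stored in the tree play the role of the acquired representations, and the selection step is where later simulations exploit them.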
Document type: Thesis

Cited literature: 74 references

https://tel.archives-ouvertes.fr/tel-01234642
Contributor : Abes Star
Submitted on : Wednesday, December 2, 2015 - 9:37:07 AM
Last modification on : Friday, May 17, 2019 - 10:27:12 AM
Long-term archiving on : Saturday, April 29, 2017 - 12:53:21 AM

File

TH2015FabbriAndre.pdf
Version validated by the jury (STAR)

Identifiers

  • HAL Id : tel-01234642, version 1

Citation

André Fabbri. Dynamique d'apprentissage pour Monte Carlo Tree Search : applications aux jeux de Go et du Clobber solitaire impartial. Intelligence artificielle [cs.AI]. Université Claude Bernard - Lyon I, 2015. Français. ⟨NNT : 2015LYO10183⟩. ⟨tel-01234642⟩

Metrics

Record views: 994
File downloads: 1233