Skip to Main content Skip to Navigation

Multi-objective sequential decision making

Weijia Wang 1 
1 TAO - Machine Learning and Optimisation
LRI - Laboratoire de Recherche en Informatique, UP11 - Université Paris-Sud - Paris 11, Inria Saclay - Ile de France, CNRS - Centre National de la Recherche Scientifique : UMR8623
Abstract : This thesis is concerned with multi-Objective sequential decision making (MOSDM). The motivation is twofold. On the one hand, many decision problems in the domains of e.g., robotics, scheduling or games, involve the optimization of sequences of decisions. On the other hand, many real-World applications are most naturally formulated in terms of multi-Objective optimization (MOO). The proposed approach extends the well-Known Monte-Carlo tree search (MCTS) framework to the MOO setting, with the goal of discovering several optimal sequences of decisions through growing a single search tree. The main challenge is to propose a new reward, able to guide the exploration of the tree although the MOO setting does not enforce a total order among solutions. The main contribution of the thesis is to propose and experimentally study two such rewards, inspired from the MOO literature and assessing a solution with respect to the archive of previous solutions (Pareto archive): the hypervolume indicator and the Pareto dominance reward. The study shows the complementarity of these two criteria. The hypervolume indicator suffers from its known computational complexity; however the proposed extension thereof provides fine-Grained information about the quality of solutions with respect to the current archive. Quite the contrary, the Pareto-Dominance reward is linear but it provides increasingly rare information. Proofs of principle of the approach are given on artificial problems and challenges, and confirm the merits of the approach. In particular, MOMCTS is able to discover policies lying in non-Convex regions of the Pareto front, contrasting with the state of the art: existing Multi-Objective Reinforcement Learning algorithms are based on linear scalarization and thus fail to sample such non-Convex regions. Finally MOMCTS honorably competes with the state of the art on the 2013 MOPTSP competition.
Complete list of metadata
Contributor : ABES STAR :  Contact
Submitted on : Thursday, August 21, 2014 - 12:47:08 PM
Last modification on : Friday, October 7, 2022 - 3:45:07 AM
Long-term archiving on: : Thursday, November 27, 2014 - 12:51:24 PM


  • HAL Id : tel-01057079, version 1


Weijia Wang. Multi-objective sequential decision making. Machine Learning [cs.LG]. Université Paris Sud - Paris XI, 2014. English. ⟨NNT : 2014PA112156⟩. ⟨tel-01057079⟩



Record views


Files downloads