Optimisation des chaînes de production dans l'industrie sidérurgique : une approche statistique de l'apprentissage par renforcement

Matthieu Geist 1, 2, 3
3 CORIDA - Robust control of infinite dimensional systems and applications
IECN - Institut Élie Cartan de Nancy, LMAM - Laboratoire de Mathématiques et Applications de Metz, Inria Nancy - Grand Est
Abstract : Reinforcement learning is machine learning's answer to the problem of optimal control. In this paradigm, an agent learns to control an environment by interacting with it. It regularly receives a numeric reward (or reinforcement signal), which provides local information about the quality of the control. The agent's objective is to maximize a cumulative function of these rewards, generally modelled as a so-called value function. A policy specifies the action to be chosen in a particular configuration of the environment to be controlled, and the value function thus quantifies the quality of this policy. This paradigm is very general and covers many applications. In this manuscript, we apply it to a gas flow management problem in the iron and steel industry. However, applying it can be quite difficult. Notably, if the environment description is too large, an exact representation of the value function (or of the policy) is not possible. This problem is known as generalization (or value function approximation): on the one hand, one has to design algorithms with low computational complexity, and on the other hand, one has to infer the behaviour the agent should have in an unknown configuration of the environment when nearby configurations have been experienced. This is the main problem we address in this manuscript, by introducing a family of algorithms inspired by Kalman filtering.
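As an illustration of the idea of value function approximation inspired by Kalman filtering, here is a minimal sketch and not the algorithms of the thesis. It assumes a linear value function V(s) = phi(s) . theta, treats theta as a hidden state following a random walk, and treats each reward as a noisy observation of (phi(s) - gamma * phi(s')) . theta. The function name kalman_td_update, the toy two-state chain, and all parameter values are hypothetical choices made for this example.

```python
def kalman_td_update(theta, P, phi_s, phi_next, reward,
                     gamma=0.9, obs_noise=1.0, process_noise=0.0):
    """One Kalman-style update of linear value-function parameters.

    Assumed model (illustrative only):
      evolution:   theta_t = theta_{t-1} + random-walk noise
      observation: reward  = (phi_s - gamma * phi_next) . theta + noise
    """
    n = len(theta)
    # Prediction step: the random walk only inflates the covariance.
    P = [[P[i][j] + (process_noise if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    # Observation vector derived from the Bellman evaluation equation.
    h = [phi_s[i] - gamma * phi_next[i] for i in range(n)]
    # Innovation: observed reward minus predicted reward (a TD error).
    innovation = reward - sum(h[i] * theta[i] for i in range(n))
    # Kalman gain for a scalar observation.
    Ph = [sum(P[i][j] * h[j] for j in range(n)) for i in range(n)]
    s_var = sum(h[i] * Ph[i] for i in range(n)) + obs_noise
    gain = [Ph[i] / s_var for i in range(n)]
    # Correction step for the parameters and their covariance.
    theta = [theta[i] + gain[i] * innovation for i in range(n)]
    P = [[P[i][j] - gain[i] * Ph[j] for j in range(n)] for i in range(n)]
    return theta, P


# Toy deterministic chain: state 0 -> state 1 -> terminal (None).
# One-hot features; the terminal state has a zero feature vector.
features = {0: [1.0, 0.0], 1: [0.0, 1.0], None: [0.0, 0.0]}
transitions = [(0, 1, 0.0), (1, None, 1.0)]  # (state, next state, reward)

theta = [0.0, 0.0]
P = [[10.0, 0.0], [0.0, 10.0]]  # broad prior over the parameters
for _ in range(200):
    for s, s_next, r in transitions:
        theta, P = kalman_td_update(theta, P, features[s], features[s_next], r)

print(theta)
```

On this toy chain with gamma = 0.9, the true values are V(0) = 0.9 and V(1) = 1, and the estimates in theta approach them. The covariance P gives an uncertainty estimate over the parameters, which plain temporal-difference updates with a fixed step size do not provide.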
Document type :
Theses

Cited literature [122 references]

https://tel.archives-ouvertes.fr/tel-01752647
Contributor : Sébastien van Luchene
Submitted on : Wednesday, December 16, 2009 - 2:59:34 PM
Last modification on : Wednesday, February 13, 2019 - 5:20:08 PM
Long-term archiving on : Thursday, June 17, 2010 - 11:46:07 PM

Identifiers

  • HAL Id : tel-01752647, version 2

Citation

Matthieu Geist. Optimisation des chaînes de production dans l'industrie sidérurgique : une approche statistique de l'apprentissage par renforcement. Mathématiques [math]. Université Paul Verlaine - Metz, 2009. French. ⟨NNT : 2009METZ023S⟩. ⟨tel-01752647v2⟩

Metrics

Record views : 773

File downloads : 9279