Date
Tuesday May 21, 2024 from 12:00 PM to 1:30 PM

Location
GZ 0.05

Co-organizer
Mechanical Engineering

Price
free

Nonlinear policy optimization in deep reinforcement learning: policy gradients for wide neural networks
Andrea Agazzi, Assistant Professor in the Mathematics Department at the University of Pisa, is a guest of Mauro Salazar, Assistant Professor in the Control Systems Technology group of the Department of Mechanical Engineering, TU/e.
Title | Nonlinear policy optimization in deep reinforcement learning: policy gradients for wide neural networks
In recent years, we have witnessed multiple groundbreaking results obtained using neural networks as flexible (nonlinear) parameterizations of large policy classes to solve difficult reinforcement learning tasks, e.g., AlphaGo, Dota 2, and self-driving cars. However, despite these successes, there exists a notable gap in providing theoretical explanations for the effectiveness of neural networks trained with (deep) reinforcement learning algorithms. In this presentation, I will first give a brief overview of the policy optimization problem in reinforcement learning, along with an introduction to the policy gradient algorithm, a prototypical solution approach. Then, I will discuss some limitations of this algorithm when paired with general nonlinear policy classes. Finally, I will discuss how these limitations are bypassed by wide neural networks under an appropriate scaling of parameters at initialization, resulting in the convergence of the policy gradient training dynamics towards a so-called “mean-field” limit. In particular, in this setting one can prove global optimality of the dynamics' fixed points despite the nonlinear and nonconvex characteristics of the risk function.
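To make the policy gradient idea mentioned above concrete, here is a minimal sketch of a REINFORCE-style update on a two-armed bandit with a softmax policy. The bandit, its rewards, the learning rate, and the iteration count are all illustrative assumptions, not taken from the talk; the talk concerns the far richer setting of wide neural network policies.

```python
# Minimal REINFORCE sketch on a two-armed bandit (assumed toy problem,
# not the setting analyzed in the talk).
import numpy as np

rng = np.random.default_rng(0)
true_rewards = np.array([1.0, 0.0])  # assumed: arm 0 pays 1, arm 1 pays 0
theta = np.zeros(2)                  # softmax policy parameters

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

lr = 0.1  # assumed learning rate
for _ in range(2000):
    probs = softmax(theta)
    a = rng.choice(2, p=probs)       # sample an action from the policy
    r = true_rewards[a]
    # For a softmax policy, grad of log pi(a | theta) = one_hot(a) - probs
    grad_log_pi = -probs
    grad_log_pi[a] += 1.0
    theta += lr * r * grad_log_pi    # gradient ascent on expected reward

print(softmax(theta))  # the policy should now strongly favor arm 0
```

The update follows the policy gradient theorem in its simplest form: weight the score function by the observed reward and ascend. With a nonlinear (neural network) policy class, the same update loses its convexity guarantees, which is precisely the difficulty the mean-field analysis in the talk addresses.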
Program
12:00 - 12:45 Lecture in Gemini South 0.05 (doors open at 11:45)
12:45 - 13:00 Q&A
13:00 Pizza lunch
Andrea Agazzi
Andrea Agazzi, Assistant Professor in the Mathematics Department at the University of Pisa, received his PhD in Theoretical Physics at the University of Geneva and was then hired as a Griffith Research Assistant Professor at Duke University. Before that, he obtained his BSc degree in physics at ETH Zurich and his MSc in theoretical physics at Imperial College London. His main research focus is applied probability theory, using techniques from statistical mechanics and stochastic analysis to gain insight into the (stochastic) behavior of complex dynamical models emerging in real-world applications. For example, he has worked on scaling limits of machine learning models seen as interacting particle systems, on the behavior of large networks of chemical reactions, focusing on the relations between their stochastic dynamics and their structure, and on stochastic approximations of complex fluid models.