Date
Tuesday May 21, 2024 from 12:00 PM to 1:30 PM

Location
GZ 0.05

Co-organizer
Mechanical Engineering

Price
free

Nonlinear policy optimization in deep reinforcement learning: policy gradients for wide neural networks
Andrea Agazzi, Assistant Professor in the Mathematics Department at the University of Pisa, is a guest of Mauro Salazar, Assistant Professor in the Control Systems Technology group of the Department of Mechanical Engineering, TU/e.
Title | Nonlinear policy optimization in deep reinforcement learning: policy gradients for wide neural networks
In recent years, we have witnessed multiple groundbreaking results obtained using neural networks as flexible (nonlinear) parameterizations of large policy classes to solve difficult reinforcement learning tasks, e.g., AlphaGo, Dota 2, and self-driving cars. However, despite these successes, there exists a notable gap in providing theoretical explanations for the effectiveness of neural networks trained with (deep) reinforcement learning algorithms. In this presentation, I will first give a brief overview of the policy optimization problem in reinforcement learning, along with an introduction to the policy gradient algorithm, a prototypical solution approach. Then, I will discuss some limitations of this algorithm when paired with general nonlinear policy classes. Finally, I will discuss how these limitations are bypassed by wide neural networks under an appropriate scaling of parameters at initialization, resulting in the convergence of the policy gradient training dynamics towards a so-called “mean-field” limit. In particular, in this setting one can prove global optimality of the dynamics' fixed points despite the nonlinear and nonconvex characteristics of the risk function.
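To make the policy gradient idea mentioned above concrete, here is a minimal sketch of a REINFORCE-style update on a two-armed bandit with a softmax policy. The bandit, its rewards, the learning rate, and the iteration count are all illustrative assumptions, not taken from the talk; the talk concerns the far richer setting of wide neural network policies.

```python
# Minimal REINFORCE sketch on a two-armed bandit (assumed toy problem,
# not the setting analyzed in the talk).
import numpy as np

rng = np.random.default_rng(0)
true_rewards = np.array([1.0, 0.0])  # assumed: arm 0 pays 1, arm 1 pays 0
theta = np.zeros(2)                  # softmax policy parameters

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

lr = 0.1  # assumed learning rate
for _ in range(2000):
    probs = softmax(theta)
    a = rng.choice(2, p=probs)       # sample an action from the policy
    r = true_rewards[a]
    # For a softmax policy, grad of log pi(a | theta) = one_hot(a) - probs
    grad_log_pi = -probs
    grad_log_pi[a] += 1.0
    theta += lr * r * grad_log_pi    # gradient ascent on expected reward

print(softmax(theta))  # the policy should now strongly favor arm 0
```

The update follows the policy gradient theorem in its simplest form: weight the score function by the observed reward and ascend. With a nonlinear (neural network) policy class, the same update loses its convexity guarantees, which is precisely the difficulty the mean-field analysis in the talk addresses.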
Program
12:00 - 12:45 Lecture in Gemini South 0.05 (doors open at 11:45)
12:45 - 13:00 Q&A
13:00 Pizza lunch
Andrea Agazzi
Andrea Agazzi, Assistant Professor in the Mathematics Department at the University of Pisa, received his PhD in Theoretical Physics at the University of Geneva and was then hired as a Griffith Research Assistant Professor at Duke University. Before that, he obtained his BSc degree in physics at ETH Zurich and his MSc in theoretical physics at Imperial College London. His main research focus is applied probability theory, using techniques from statistical mechanics and stochastic analysis to gain insight into the (stochastic) behavior of complex dynamical models emerging in real-world applications. For example, he has worked on scaling limits of machine learning models seen as interacting particle systems, on the behavior of large networks of chemical reactions, focusing on the relations between their stochastic dynamics and their structure, and on stochastic approximations of complex fluid models.