BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:Game theory\, distributional reinforcement learning\, control and 
 verification - Prof. Alessandro Abate\, Dr. Licio Romao\, Dr. Yulong Gao a
 nd Dr. Jiarui Gan. University of Oxford
DTSTART:20230607T100000Z
DTEND:20230607T113000Z
UID:TALK202237@talks.cam.ac.uk
CONTACT:Isaac Reid
DESCRIPTION:This week\, the MLG looks forward to welcoming four guest spea
 kers from Oxford.\n\n*Talk 1:*\n\n_Title:_ Formal Synthesis with Neural Te
 mplates\nSpeaker: Prof. Alessandro Abate (Dept. Computer Science\, Univ. o
 f Oxford\, UK)\n\n_Abstract:_ I shall present recent work on CEGIS\, a "co
 unterexample-guided inductive synthesis" framework for sound synthesis ta
 sks that are relevant for dynamical models\, control problems\, and softwa
 re programs. The inductive synthesis framework comprises the interaction o
 f two components\, a learner and a verifier. The learner trains a neural t
 emplate on finite samples. The verifier soundly validates the candidates t
 rained by the learner\, by means of calls to a satisfiability-modulo-theor
 ies (SMT) solver. Whenever the candidate is not valid\, SMT-generated coun
 ter-examples are pa
 ssed to the learner for further training.\n\n_Bio:_ Alessandro Abate is P
 rofessor of Verification and Control in the Department of Computer Science
  at the University of Oxford\, where he is also Deputy Head of Department.
  Earlier\, he did research at Stanford University and at SRI International
 \, and was an Assistant Professor at the Delft Center for Systems and Cont
 rol\, TU Delft. He received an MS/PhD from the University of Padova and UC
 Berkeley. His research interests lie in the formal verification and cont
 rol of stochastic hybrid systems\, and in their applications in cyber-phys
 ical systems\, particularly involving safety criticality and energy. He bl
 ends in techniques from machine learning and AI\, such as Bayesian inferen
 ce\, reinforcement learning\, and game theory.\n\n*Talk 2:*\n\n_Title:_
  Policy synthesis with guarantees\n\n_Speaker:_ Dr. Licio Romao (Dept. Co
 mputer Science\, Univ. of Oxford\, UK)\n\n_Abstract:_ In this talk\, I wil
 l present two techniques to perform feedback policy synthesis with guarant
 ees. First\, I will introduce a new concept of RL robustness and show how 
 to obtain the best robust policy within a class of sub-optimal solutions b
 y leveraging lexicographic optimisation. The proposed notion of robustness
  is motivated by the fact that\, at deployment\, the state of the system m
 ay not be precisely known due to measurement errors. In the second part of
  the talk\, I will present a new technique to derive abstractions of stoch
 astic dynamical systems. Our methodology is agnostic to the probability me
 asure that generates the noise and leads to an interval Markov Decision Pr
 ocess (iMDP) representation of the original dynamics\; the interval transi
 tion probability contains\, with high probability\, the true transition pr
 obability between states of the abstraction. The PAC guarantees of the pro
 posed framework are obtained due to a non-trivial connection with the scen
 ario approach theory\, a technique that has had tremendous success within 
 the control community.\n\n_Bio:_ Licio Romao is a postdoctoral research as
 sistant in the Department of Computer Science at the University of Oxford.
   He obtained his PhD in August 2021 from the Department of Engineering Sc
 ience\, and MSc and BSc from the University of Campinas (UNICAMP) and the 
 Federal University of Campina Grande (UFCG)\, respectively. His PhD thesis
 was awarded the Institution of Engineering and Technology’s (IET) Control and
  Automation Dissertation Prize 2021. His research combines techniques from
  formal verification\, control theory\, applied mathematics\, and machine 
 learning to enable the design of safer and more reliable feedback systems.
 \n\n_Relevant papers:_\n·       D. Jarne\, L. Romao\, L. Hammond\, M. Maz
 o Jr\, A. Abate. Observational Robustness and Invariances in Reinforcement L
 earning via Lexicographic Objectives. 2023. Link: https://licioromao.com/a
 ssets/papers/JRHMA23.pdf.\n·       T. Badings\, L. Romao\, A. Abate\, D. 
 Parker\, H. Poonawala\, M. Stoelinga\, N. Jansen. Robust Control for Dynami
 cal Systems with Non-Gaussian Noise via Formal Abstractions. Journal of Ar
 tificial Intelligence Research. 2023. Link: https://licioromao.com/assets/papers/
 BRAPPSJ23.pdf.\n·       T. Badings\, L. Romao\, A. Abate\, N. Jansen. Pro
 babilities are not enough: formal controller synthesis for stochastic dyna
 mical systems with epistemic uncertainty. AAAI Conference on Artificial In
 telligence\, 2023. Link: https://licioromao.com/assets/papers/BRAJ23a.pdf
 .\n\n\n*Talk 3:*\n\n_Title:_ Policy Evaluation in Distributional LQR\n\n_S
 peaker:_ Dr. Yulong Gao (Dept. Computer Science\, Univ. of Oxford\, UK)\n\
 n_Abstract:_ Distributional reinforcement learning (DRL) enhances the unde
 rstanding of the effects of the randomness in the environment by letting a
 gents learn the distribution of a random return\, rather than its expected
 value as in standard RL. At the same time\, a main challenge is that polic
 y evaluation in DRL typically relies on the representation of the return d
 istribution\, which needs to be carefully designed. In this talk\,
  I will discuss a special class of DRL problems that rely on discounted li
 near quadratic regulator (LQR) for control\, advocating for a new distribu
 tional approach to LQR\, which we call distributional LQR. Specifically\, 
 we provide a closed-form expression of the distribution of the random retu
 rn which\, remarkably\, is applicable to all exogenous disturbances on t
 he dynamics\, as long as they are independent and identically distributed 
 (i.i.d.). While the proposed exact return distribution consists of infinit
 ely many random variables\, we show that this distribution can be approxim
 ated by a finite number of random variables\, and the associated approxima
 tion error can be analytically bounded under mild assumptions. Using the a
 pproximate return distribution\, we propose a zeroth-order policy gradient
  algorithm for risk-averse LQR using the Conditional Value at Risk (CVaR) 
 as a measure of risk. Numerical experiments are provided to illustrate our
  theoretical results. (https://arxiv.org/abs/2303.13657)\n\n_Bio:_ Yulong 
 Gao is a postdoctoral researcher at the Department of Computer Science\, U
 niversity of Oxford.  He received the joint Ph.D. degree in Electrical Eng
 ineering in 2021 from KTH Royal Institute of Technology\, Sweden\, and Nan
 yang Technological University\, Singapore. Before moving to Oxford\, he wa
 s a Researcher at KTH from 2021 to 2022. He was the recipient of the VR I
 nternational Postdoc Grant from the Swedish Research Council. His researc
 h intere
 sts include automatic verification\, stochastic control and model predicti
 ve control with application to safety-critical systems.\n\n*Talk 4:*\n\n_T
 itle:_ Sequential information and mechanism design\n\n_Speaker:_ Dr. Jiaru
 i Gan (Dept. Computer Science\, Univ. of Oxford\, UK)\n\n_Abstract:_ Many 
 problems in game theory involve reasoning between multiple parties with as
 ymmetric access to information. This broad class of problems leads to many
  research questions about information and mechanism design\, with broad-ran
 ging applications from governance and public administration to e-commerce 
 and financial services. In particular\, there has been a recent surge of i
 nterest in exploring the more generalized sequential versions of these pro
 blems\, where players interact over multiple time steps in a changing envi
 ronment. In this talk\, I will present a framework of sequential principal
 -agent problems that is capable of modeling a wide range of information an
 d mechanism design problems. I will talk about our recent algorithmic resu
 lts on the computation and learning of optimal decision-making in this fra
 mework.\n\n_Bio:_ Jiarui Gan is a Departmental Lecturer at the Computer S
 cience Department\, University of Oxford\, working in the Artificial Intel
 ligence & Machine Learning research theme. Before this\, he was a postdoc
 toral researcher at the Max Planck Institute for Software Systems\, and he
  obtain
 ed his PhD from Oxford. Jiarui is broadly interested in algorithmic proble
 ms in game theory. His current focus is on sequential information and mech
 anism design problems. His recent work has been selected for an Outstandin
 g Paper Honorable Mention at the AAAI'22 conference.
LOCATION:Cambridge University Engineering Department\, CBL Seminar room BE
 4-38.
END:VEVENT
END:VCALENDAR
