'Off-Switch Games' and Corrigibility
- 👤 Speaker: Richard Ngo (University of Cambridge)
- 📅 Date & Time: Wednesday 01 November 2017, 17:00 - 18:30
- 📍 Venue: Cambridge University Engineering Department, CBL Seminar room BE4-38. For directions see http://learning.eng.cam.ac.uk/Public/Directions
Abstract
By default, an AI system will have an incentive to prevent humans from switching it off, or otherwise interfering in its operation, as this would prevent it from maximising its reward. An AI system is ‘corrigible’ if it has an incentive to accept human corrections. Inverse Reinforcement Learning (IRL) can help mitigate this problem in some cases, but there is disagreement as to whether IRL can guarantee corrigibility in all cases.
Papers:
- https://arxiv.org/abs/1611.08219
- https://intelligence.org/files/Corrigibility.pdf
- https://intelligence.org/2017/08/31/incorrigibility-in-cirl/
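The first paper above frames this as an 'off-switch game': a robot can act unilaterally, switch itself off, or defer to a human who only permits actions she values. The sketch below is an illustrative simulation of that comparison (not code from the talk); the Gaussian prior over the human's utility and all names are assumptions for the example.

```python
import random

# Hedged sketch of the off-switch game setup (Hadfield-Menell et al., 2016).
# The robot is uncertain about the human's utility U_a for its proposed action.
# Its options: act now (payoff U_a), switch off (payoff 0), or defer to a
# rational human who permits the action only when U_a > 0.

def expected_utilities(prior_samples):
    act = sum(prior_samples) / len(prior_samples)       # E[U_a]: act unilaterally
    off = 0.0                                           # switch off
    defer = sum(max(u, 0.0) for u in prior_samples) / len(prior_samples)
    # defer = E[max(U_a, 0)]: the human blocks negative-utility actions
    return act, off, defer

random.seed(0)
# Illustrative prior: the robot believes U_a ~ N(0, 1).
samples = [random.gauss(0.0, 1.0) for _ in range(100_000)]
act, off, defer = expected_utilities(samples)

# Against a rational overseer, deferring weakly dominates both alternatives:
# E[max(U_a, 0)] >= max(E[U_a], 0), so the robot has no incentive to
# disable its off-switch -- the core corrigibility argument of the paper.
assert defer >= max(act, off)
print(f"act: {act:.3f}, off: {off:.3f}, defer: {defer:.3f}")
```

Under these assumptions the deference option is weakly preferred whenever the robot is uncertain about the human's utility; the disagreement discussed in the talk concerns cases where the human is not rational or the robot's uncertainty is miscalibrated.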
Series: This talk is part of the Engineering Safe AI series.
Included in Lists
- Cambridge talks
- Cambridge University Engineering Department, CBL Seminar room BE4-38
- Chris Davis' list
- Engineering Safe AI
- Trust & Technology Initiative - interesting events
- yk449