'Off-Switch Games' and Corrigibility
- 👤 Speaker: Richard Ngo (University of Cambridge)
- 📅 Date & Time: Wednesday 01 November 2017, 17:00 - 18:30
- 📍 Venue: Cambridge University Engineering Department, CBL Seminar room BE4-38. For directions see http://learning.eng.cam.ac.uk/Public/Directions
Abstract
By default, an AI system will have an incentive to prevent humans from switching it off, or otherwise interfering in its operation, as this would prevent it from maximising its reward. An AI system is ‘corrigible’ if it has an incentive to accept human corrections. Inverse Reinforcement Learning (IRL) can help mitigate this problem in some cases, but there is disagreement as to whether IRL can guarantee corrigibility in all cases.
Papers:
- https://arxiv.org/abs/1611.08219
- https://intelligence.org/files/Corrigibility.pdf
- https://intelligence.org/2017/08/31/incorrigibility-in-cirl/
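The first paper above frames this as an 'off-switch game': a robot can act unilaterally, switch itself off, or defer to a human who only permits actions she values. The sketch below is an illustrative simulation of that comparison (not code from the talk); the Gaussian prior over the human's utility and all names are assumptions for the example.

```python
import random

# Hedged sketch of the off-switch game setup (Hadfield-Menell et al., 2016).
# The robot is uncertain about the human's utility U_a for its proposed action.
# Its options: act now (payoff U_a), switch off (payoff 0), or defer to a
# rational human who permits the action only when U_a > 0.

def expected_utilities(prior_samples):
    act = sum(prior_samples) / len(prior_samples)       # E[U_a]: act unilaterally
    off = 0.0                                           # switch off
    defer = sum(max(u, 0.0) for u in prior_samples) / len(prior_samples)
    # defer = E[max(U_a, 0)]: the human blocks negative-utility actions
    return act, off, defer

random.seed(0)
# Illustrative prior: the robot believes U_a ~ N(0, 1).
samples = [random.gauss(0.0, 1.0) for _ in range(100_000)]
act, off, defer = expected_utilities(samples)

# Against a rational overseer, deferring weakly dominates both alternatives:
# E[max(U_a, 0)] >= max(E[U_a], 0), so the robot has no incentive to
# disable its off-switch -- the core corrigibility argument of the paper.
assert defer >= max(act, off)
print(f"act: {act:.3f}, off: {off:.3f}, defer: {defer:.3f}")
```

Under these assumptions the deference option is weakly preferred whenever the robot is uncertain about the human's utility; the disagreement discussed in the talk concerns cases where the human is not rational or the robot's uncertainty is miscalibrated.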
Series: This talk is part of the Engineering Safe AI series.
Included in Lists
- Cambridge talks
- Cambridge University Engineering Department, CBL Seminar room BE4-38
- Chris Davis' list
- Engineering Safe AI
- Trust & Technology Initiative - interesting events
- yk449