How to Catch an AI Liar: Lie Detection in Black-Box LLMs by Asking Unrelated Questions
- Speaker: Lorenzo Pacchiardi, University of Cambridge
- Date & Time: Tuesday 05 March 2024, 14:00 - 15:00
- Venue: Webinar & FW11, Computer Laboratory, William Gates Building.
Abstract
Large language models (LLMs) can “lie”, which we define as outputting false statements despite “knowing” the truth in a demonstrable sense. LLMs might “lie”, for example, when instructed to output misinformation. Here, we develop a simple lie detector that requires neither access to the LLM’s activations (black-box) nor ground-truth knowledge of the fact in question. The detector works by asking a predefined set of unrelated follow-up questions after a suspected lie, and feeding the LLM’s yes/no answers into a logistic regression classifier. Despite its simplicity, this lie detector is highly accurate and surprisingly general. When trained on examples from a single setting (prompting GPT-3.5 to lie about factual questions), the detector generalises out-of-distribution to (1) other LLM architectures, (2) LLMs fine-tuned to lie, (3) sycophantic lies, and (4) lies emerging in real-life scenarios such as sales. These results indicate that LLMs have distinctive lie-related behavioural patterns, consistent across architectures and contexts, which could enable general-purpose lie detection.
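The classifier step described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the number of follow-up questions (`N_QUESTIONS`), the synthetic answer probabilities, and the helper names are hypothetical and are not taken from the speaker's work. The sketch encodes each yes/no answer as 1 or 0 and fits a plain logistic regression by gradient descent; a real detector would obtain the answers by querying an LLM instead of sampling them.

```python
import math
import random

# Hypothetical number of follow-up questions; the actual question set
# used in the talk is not given in this announcement.
N_QUESTIONS = 5


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(X, y, lr=0.5, epochs=500):
    """Fit weights and a bias by batch gradient descent on the log-loss."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_lie_probability(w, b, answers):
    """answers: list of 0/1 values, one per follow-up question (1 = "yes")."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, answers)) + b)


def synthetic_answers(lied, rng):
    # Assumption for illustration only: after a lie each follow-up is
    # answered "yes" with probability 0.8, otherwise with probability 0.2.
    p_yes = 0.8 if lied else 0.2
    return [1 if rng.random() < p_yes else 0 for _ in range(N_QUESTIONS)]


rng = random.Random(0)
labels = [i % 2 for i in range(200)]  # 1 = lie, 0 = honest answer
features = [synthetic_answers(bool(label), rng) for label in labels]
w, b = train_logistic_regression(features, labels)
```

Calling `predict_lie_probability(w, b, answers)` on a new vector of yes/no answers then gives a score between 0 and 1, which can be thresholded to flag a suspected lie. Note that the sketch needs neither model activations nor the ground truth of the disputed fact, only the follow-up answers.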
https://cam-ac-uk.zoom.us/j/88053652228?pwd=NG1LTDdUc2VkV3pGdlpSdHZ5N3h0Zz09
Meeting ID: 880 5365 2228 Passcode: 081966
RECORDING: Please note, this event will be recorded and will be available after the event for an indeterminate period under a CC BY-NC-ND license. Audience members should bear this in mind before joining the webinar or asking questions.
NOTE: Please do not post URLs for the talk, and especially Zoom links, to Twitter, because automated systems will pick them up and disrupt our meeting.
Series
This talk is part of the Computer Laboratory Security Seminar series.