BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:How to Catch an AI Liar: Lie Detection in Black-Box LLMs by Asking
  Unrelated Questions - Lorenzo Pacchiardi\, University of Cambridge
DTSTART:20240305T140000Z
DTEND:20240305T150000Z
UID:TALK212224@talks.cam.ac.uk
CONTACT:Hridoy Sankar Dutta
DESCRIPTION:Large language models (LLMs) can "lie"\, which we define as ou
 tputting false statements despite "knowing" the truth in a demonstrable se
 nse. LLMs might "lie"\, for example\, when instructed to output misinforma
 tion. Here\, we develop a simple lie detector that requires neither access
  to the LLM's activations (black-box) nor ground-truth knowledge of the fa
 ct in question. The detector works by asking a predefined set of unrelated
  follow-up questions after a suspected lie\, and feeding the LLM's yes/no 
 answers into a logistic regression classifier. Despite its simplicity\, th
 is lie detector is highly accurate and surprisingly general. When trained 
 on examples from a single setting -- prompting GPT-3.5 to lie about factua
 l questions -- the detector generalises out-of-distribution to (1) other L
 LM architectures\, (2) LLMs fine-tuned to lie\, (3) sycophantic lies\, and
  (4) lies emerging in real-life scenarios such as sales. These results ind
 icate that LLMs have distinctive lie-related behavioural patterns\, consis
 tent across architectures and contexts\, which could enable general-purpos
 e lie detection.\n\nhttps://cam-ac-uk.zoom.us/j/88053652228?pwd=NG1LTDdUc2
 VkV3pGdlpSdHZ5N3h0Zz09\n\nMeeting ID: 880 5365 2228\nPasscode: 081966\n\nR
 ECORDING: Please note\, this event will be recorded and will be available
  after the event for an indeterminate period under a CC BY-NC-ND license.
  Audience members should bear this in mind before joining the webinar or
  asking questions.\n\nNOTE: Please do not post URLs for the talk\,
  especially Zoom links\, to Twitter\, because automated systems will pick
  them up and disrupt our meeting.
LOCATION:Webinar & FW11\, Computer Laboratory\, William Gates Building.
END:VEVENT
END:VCALENDAR
