BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:Out-of-context reasoning/learning in LLMs and its safety implicati
 ons - Dmitrii Krasheninnikov\, Usman Anwar\, University of Cambridge
DTSTART:20250402T100000Z
DTEND:20250402T113000Z
UID:TALK229828@talks.cam.ac.uk
CONTACT:120952
DESCRIPTION:Beyond learning patterns within individual training datapoints
 \, Large Language Models (LLMs) can infer latent structures and relationsh
 ips by aggregating information scattered across different training samples
  through out-of-context reasoning (OOCR) [1\, 2]. We'll review key empiric
 al findings\, including Implicit Meta-Learning (models learning source rel
 iability implicitly and subsequently internalizing reliable-seeming data m
 ore strongly [1]) and Inductive OOCR (models inferring other latent struct
 ures from scattered data [3]). We'll explore potential mechanisms behind t
 hese phenomena [1\, 4]. Finally\, we'll discuss the significant AI safety 
 implications\, arguing that OOCR coupled with Situational Awareness [5] un
 derpins threats like Alignment Faking [6]\, potentially leading to persist
 ent misalignment resistant to standard alignment techniques.\n\n1. Krashen
 innikov et al.\, "Implicit meta-learning may lead language models to trust
  more reliable sources" https://arxiv.org/abs/2310.15047\n2. Berglund et a
 l.\, "Taken Out of Context: On Measuring Out-of-Context Reasoning in LLMs"
  https://arxiv.org/abs/2309.00667\n3. Treutlein et al.\, "Connecting the D
 ots: LLMs can Infer and Verbalize Latent Structure from Disparate Training
  Data" https://arxiv.org/abs/2406.14546\n4. Feng et al.\, "Extractive Stru
 ctures Learned in Pretraining Enable Generalization on Finetuned Facts" ht
 tps://arxiv.org/abs/2412.04614\n5. Laine et al.\, "Me\, Myself\, and AI: T
 he Situational Awareness Dataset (SAD) for LLMs" https://arxiv.org/abs/240
 7.04694\n6. Greenblatt et al.\, "Alignment faking in large language models
 " https://arxiv.org/abs/2412.14093
LOCATION:Cambridge University Engineering Department\, CBL Seminar room BE
 4-38
END:VEVENT
END:VCALENDAR