BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:Making and breaking tokenizers - Sander Land (Writer)
DTSTART:20251017T110000Z
DTEND:20251017T120000Z
UID:TALK236629@talks.cam.ac.uk
CONTACT:Suchir Salhan
DESCRIPTION:Despite massive investments in training large language models\
 , tokenizers remain a critical but often neglected component with weakness
 es that can cause wild hallucinations\, bypass safety guardrails\, and bre
 ak downstream applications. This talk will cover:\n\nOur recent research i
 n automatically detecting problematic 'glitch' tokens in any model\n\nFund
 amental issues with pretokenizers and their design\n\nNovel approaches to 
 encodings and pretokenization that address some of these problems.\n\n**Sp
 eaker Bio**\nSander Land is a researcher at Writer\, previously working at
  Cohere. He completed his PhD at the Department of Computer Science\, Uni
 versity of Oxford\, before undertaking a postdoc at Biomedical Engineering
 \, King's College London\, University of London. 
LOCATION:SS03 Hybrid (In-Person + Online). Google Meet: https://meet.goog
 le.com/yeu-pqce-rsn
END:VEVENT
END:VCALENDAR
