BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:When is Multilinguality a Curse? Language Modeling for 350 Languag
 es - Catherine Arnett and Tyler Chang (EleutherAI and UC San Diego)
DTSTART:20250606T140000Z
DTEND:20250606T150000Z
UID:TALK230017@talks.cam.ac.uk
CONTACT:Suchir Salhan
DESCRIPTION:NOTE THE UNUSUAL TIME FOR THIS SEMINAR\n\nLanguage models work
  well for a small number of languages. For the other languages\, the best 
 existing language model is likely multilingual\, still with the vast major
 ity of the training data coming from English and a few "priority" language
 s. We show that multilinguality often leads to worse performance across m
 any languages due to limited model capacity. We then train a suit
 e of over 1\,000 monolingual models for 350 languages\, finding that these
  models can outperform multilingual models over ten times their size. Howe
 ver\, multilinguality can also be a blessing: we train a small number of c
 ontrolled bilingual models in order to study how crosslingual transfer hap
 pens. We aim to better understand transfer learning so that multilinguali
 ty can be leveraged to improve language model performance for all languag
 es.
LOCATION:ONLINE ONLY. Here is the Zoom link: https://cam-ac-uk.zoom.us/j/4
 751389294?pwd=Z2ZOSDk0eG1wZldVWG1GVVhrTzFIZz09
END:VEVENT
END:VCALENDAR
