When is Multilinguality a Curse? Language Modeling for 350 Languages
- Speaker: Catherine Arnett and Tyler Chang (EleutherAI and UC San Diego)
- Date & Time: Friday 06 June 2025, 15:00 - 16:00
- Venue: ONLINE ONLY. Here is the Zoom link: https://cam-ac-uk.zoom.us/j/4751389294?pwd=Z2ZOSDk0eG1wZldVWG1GVVhrTzFIZz09
Abstract
NOTE THE UNUSUAL TIME FOR THIS SEMINAR
Language models work well for a small number of languages. For the other languages, the best existing language model is likely multilingual, still with the vast majority of the training data coming from English and a few “priority” languages. We show that in many cases, multilinguality leads to worse performance across many languages due to limited model capacity. We then train a suite of over 1,000 monolingual models for 350 languages, finding that these models can outperform multilingual models over ten times their size. However, multilinguality can also be a blessing: we train a small number of controlled bilingual models in order to study how crosslingual transfer happens. We aim to better understand transfer learning in order to better leverage multilinguality to improve language model performance for all languages.
Series: This talk is part of the NLIP Seminar Series.
Included in Lists
- All Talks (aka the CURE list)
- bld31
- Cambridge Centre for Data-Driven Discovery (C2D3)
- Cambridge Forum of Science and Humanities
- Cambridge Language Sciences
- Cambridge talks
- Chris Davis' list
- Computer Education Research
- Computing Education Research
- Department of Computer Science and Technology talks and seminars
- Graduate-Seminars
- Guy Emerson's list
- Interested Talks
- Language Sciences for Graduate Students
- ndk22's list
- NLIP Seminar Series
- ob366-ai4er
- PMRFPS's
- rp587
- School of Technology
- Simon Baker's List
- Trust & Technology Initiative - interesting events
- yk449