LLMs and Low-Resource Languages
- Speaker: Eneko Agirre, University of the Basque Country (UPV/EHU)
- Date & Time: Wednesday 27 November 2024, 16:00 - 17:00
- Venue: GR04, English Faculty Building, 9 West Road, Sidgwick Site and online https://cam-ac-uk.zoom.us/j/97599459216?pwd=QTRsOWZCOXRTREVnbTJBdXVpOXFvdz09
Abstract
Generative AI models are now multilingual, raising new questions about their relative performance across languages and local cultures, especially for communities with fewer speakers. In this talk I will explore some of those questions and the lessons we learned along the way. Is it possible to build high-performing LLMs for low-resource languages? We have built a high-performing open model for Basque accompanied by a fully reproducible end-to-end evaluation suite. Do LLMs think better in English than in the local language? Our experiments show that LLMs do not fully exploit their multilingual potential when prompted in non-English languages. Do LLMs know about local culture? We probed the complex interaction between language and global/local knowledge, showing for the first time that local knowledge is transferred from the low-resource to the high-resource language, a sign that prior findings may not hold when evaluated on local topics. The evaluation suite was recognised with a best resource paper award at ACL 2024.
Bio: Eneko Agirre is Full Professor of Informatics and Head of HiTZ Basque Center of Language Technology at the University of the Basque Country, UPV/EHU, in San Sebastian, Spain. He has been a visiting researcher or professor at New Mexico State, Melbourne, Southern California, Stanford and New York Universities. He has been active in Natural Language Processing and Computational Linguistics since his undergraduate days. He received the Spanish Informatics Research Award in 2021, and is one of the 74 fellows of the Association for Computational Linguistics (ACL). He was President of ACL's SIGLEX, a member of the editorial boards of Computational Linguistics and the Journal of Artificial Intelligence Research, and Action Editor for the Transactions of the ACL. He is co-founder of the Joint Conference on Lexical and Computational Semantics (*SEM). He is a recipient of three Google Research Awards and six best paper awards and nominations, most recently at ACL 2024. Dissertations under his supervision received best PhD awards from EurAI, the Spanish NLP society and the Spanish Informatics Scientific Association. He has over 200 publications across a wide range of NLP and AI topics, and has given more than 20 invited talks, mostly international.
Series
This talk is part of the Language Technology Lab Seminars series.
Included in Lists
- bld31
- Cambridge Centre for Data-Driven Discovery (C2D3)
- Cambridge Forum of Science and Humanities
- Cambridge Language Sciences
- Cambridge talks
- Chris Davis' list
- GR04, English Faculty Building, 9 West Road, Sidgwick Site
- GR04, English Faculty Building, 9 West Road, Sidgwick Site and online https://cam-ac-uk.zoom.us/j/97599459216?pwd=QTRsOWZCOXRTREVnbTJBdXVpOXFvdz09
- Guy Emerson's list
- Interested Talks
- Language Sciences for Graduate Students
- Language Technology Lab Seminars
- ndk22's list
- ob366-ai4er
- rp587
- Simon Baker's List
- Trust & Technology Initiative - interesting events
- yk449
Note: Ex-directory lists are not shown.