Leveraging non-expert semantic intuitions to support multilingual NLP
- Speaker: Olga Majewska, PhD student in Computational Linguistics, Faculty of Modern and Medieval Languages and Linguistics
- Date & Time: Thursday 14 November 2019, 13:00 - 14:30
- Venue: Room 326, Raised Faculty Building, MML, Sidgwick Avenue, Cambridge, CB3 9DA
Abstract
Recent advances in Natural Language Processing have greatly increased the capacity of automatic systems to understand human language. However, these systems rely on the availability of large quantities of data, and still struggle with many language-related tasks that humans perform intuitively every day. Making subtle meaning distinctions requires rich, fine-grained lexical-semantic and conceptual knowledge. Although automatic lexical acquisition systems promise to overcome the challenge of creating deep lexical resources manually from scratch, they depend on gold standard datasets for evaluation, and such datasets are still very scarce for most of the world's languages. Verbs pose a particular challenge for NLP systems due to their complex linguistic properties. Acting as sentence pivots, they encode crucial information about the structural and semantic relationships between the elements of the clause. This is why accurate, nuanced analysis and representation of their meaning is especially important if NLP systems are to get closer to human levels of language understanding.
Fast but reliable creation of semantic resources could boost and support multilingual NLP, eliminating the bottleneck of resource scarcity in the majority of the world’s languages. This project aims to facilitate this by developing a methodology designed to speed up the resource creation process and allow its unlimited extension to diverse languages. It explores methods for obtaining verb classifications that offer an alternative to manual lexicographic work by leveraging the semantic intuitions of non-expert native speakers. By examining humans’ complex, intuitive word similarity judgments in different languages and encoding them in computer-readable form, the study explores how meaning relations are organised in the semantic space in the brain. It also provides insights to support the further development of representation learning models and their ability to capture the fine-grained semantic distinctions present in the mental lexicon.
Series
This talk is part of the MEITS Multilingualism Seminars series.