Contextualized embeddings for lexical semantics
- Speaker: Katrin Erk, University of Texas at Austin
- Date & Time: Thursday 20 October 2022, 15:00 - 16:00
- Venue: https://cam-ac-uk.zoom.us/j/97599459216?pwd=QTRsOWZCOXRTREVnbTJBdXVpOXFvdz09
Abstract
Word embeddings are dense vector representations computed automatically from large amounts of text. From a lexical semantics perspective, we can view an embedding as a compact aggregate of many observed uses of a word by different speakers. Contextualized word embeddings are especially interesting for lexical semantics because they give us a potential window into garden-variety polysemy: polysemy that is entirely idiosyncratic, not regular. But there is not yet a standardized way to use contextualized embeddings for lexical semantics. I report on two studies we have been conducting. In the first, we tested the use of word token clusters on the task of type-level similarity. In the second, we are mapping word token embeddings to human-readable features. I also comment on a tendency of word embeddings, from count-based embeddings to the most recent contextualized ones, to pick up on what could be called traces of stories: text topics, judgments and sentiment, and cultural trends. I argue that this is actually an interesting signal and not a bug.
Series
This talk is part of the Language Technology Lab Seminars series.
Included in Lists
- bld31
- Cambridge Centre for Data-Driven Discovery (C2D3)
- Cambridge Forum of Science and Humanities
- Cambridge Language Sciences
- Cambridge talks
- Chris Davis' list
- Guy Emerson's list
- https://cam-ac-uk.zoom.us/j/97599459216?pwd=QTRsOWZCOXRTREVnbTJBdXVpOXFvdz09
- Interested Talks
- Language Sciences for Graduate Students
- Language Technology Lab Seminars
- ndk22's list
- ob366-ai4er
- rp587
- Simon Baker's List
- Trust & Technology Initiative - interesting events
- yk449
Note: Ex-directory lists are not shown.