Doubt thy models: rethinking hypothesis testing in NLP
- Speaker: Haim Dubossarsky (University of Cambridge)
- Date & Time: Friday 31 January 2020, 12:00-13:00
- Venue: SS03, Computer Laboratory
Abstract
Recent years have seen the rise of machine learning models in NLP research, which are applied, inter alia, to questions motivated by linguistic theory. Indeed, it has now become relatively easy to model and to test research problems. The ease with which models can be deployed comes at the risk of careless use, which may lead to unreliable findings and ultimately even hinder our ability to extend our knowledge. Such misuse may stem, for example, from unfamiliarity with the assumptions and hypotheses that are implicit in the models, or from inherent confounds that demand experimental controls. In this talk, I will focus on problems that are specific to linguistically motivated questions (e.g., semantic change), but also to classical NLP research more generally (e.g., polysemy resolution and representation), where word embeddings are the prominent ML models. Major problems include biases induced by word frequency, similarity estimation over noisy word vector representations, and the evaluation of models' performance in the absence of properly validated evaluation tasks. I will suggest ways to mitigate some of these problems, and share some ideas about performing valid scientific research in the age of all-too-easy modeling.
Series: This talk is part of the NLIP Seminar Series.