BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:Measuring Causal Effects of Data Statistics on Language Model Pred
 ictions - Yanai Elazar (Bar-Ilan University)
DTSTART:20220601T160000Z
DTEND:20220601T170000Z
UID:TALK175322@talks.cam.ac.uk
CONTACT:Michael Schlichtkrull
DESCRIPTION:Abstract: \n\nTraining data is one of the major reasons for t
 he success of state-of-the-art NLP models. But what exactly in the trainin
 g data causes a model to make a certain prediction? We seek to answer thi
 s research qu
 estion by formalizing it in a causal framework that provides a useful lang
 uage for investigating how training data influence predictions. Importantl
 y\, our causal framework bypasses the need to retrain expensive models and
  allows us to estimate causal effects based on observational data alone.  
 Addressing the problem of extracting factual knowledge from pretrained lan
 guage models (PLMs)\, we focus on simple data statistics: co-occurrence co
 unts\, and show that these statistics influence the predictions of PLMs. T
 his establishes a causal link between simple statistics from the training
  data (co-occurrence counts) and PLMs' behavior\, and shows that their lan
 guage understanding is limited. Our causal framework and our results demon
 strate the importance of categorizing and studying datasets used for model
  training and the benefits of causality in our field for understanding NLP
  models.\n\nBio:\n\nYanai Elazar is a fourth-year PhD student at Bar-Ilan 
 University\, working with Prof. Yoav Goldberg on NLP. His main interests i
 nvolve model interpretation\, analysis\, biases in datasets and models\, a
 nd commonsense reasoning. Yanai was awarded multiple scholarships\, includ
 ing the PBC fellowship for outstanding PhD candidates in Data Science\, an
 d the Google PhD Fellowship.
LOCATION:Computer Lab\, FW26
END:VEVENT
END:VCALENDAR
