BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:Challenges in evaluating natural language generation systems - Moh
 it Iyyer (University of Massachusetts Amherst)
DTSTART:20210611T120000Z
DTEND:20210611T130000Z
UID:TALK160510@talks.cam.ac.uk
CONTACT:Huiyuan Xie
DESCRIPTION:Join Zoom Meeting \nhttps://cl-cam-ac-uk.zoom.us/j/91900396241
 ?pwd=Wk5mcDYrUytkSElkMHB0T3NkNkRFQT09 \n\nMeeting ID: 919 0039 6241 \nPass
 code: 127570 \n\nRecent advances in neural language modeling have opened u
 p a variety of exciting new text generation applications. However\, evalua
 ting systems built for these tasks remains difficult. Most prior work reli
 es on a combination of automatic metrics such as BLEU (which are often uni
 nformative) and crowdsourced human evaluation (which are also usually unin
 formative\, especially when conducted without careful task design). In thi
 s talk\, I focus on two specific applications: (1) unsupervised sentence-l
 evel style transfer and (2) long-form question answering. I will go over o
 ur recent work on building models for these systems and then describe the 
 ensuing struggles to properly compare them to baselines. In both cases\, w
 e identify (and propose solutions for) issues with existing evaluations\, 
 including improper aggregation of multiple metrics\, missing control exper
 iments with simple baselines\, and high cognitive load placed on human eva
 luators. I'll conclude by briefly discussing our work on machine-in-the-lo
 op text generation systems\, in which both humans and machines participate
  in the generation process and reliable human evaluation becomes much
  more feasible.
LOCATION:Virtual (Zoom)
END:VEVENT
END:VCALENDAR
