BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:Scalable Structural Inductive Biases in Neural Language Models - A
 dhiguna Kuncoro\, DeepMind
DTSTART:20210225T110000Z
DTEND:20210225T120000Z
UID:TALK157678@talks.cam.ac.uk
CONTACT:Marinela Parovic
DESCRIPTION:Scalable language models like BERT and GPT-3 have achieved rem
 arkable success in various natural language understanding benchmarks\, inc
 luding on challenging benchmarks of structural competence. Does this succe
 ss mean that data scale and large models are all we need to fully comprehe
 nd natural language? Or can these scalable models instead still benefit fr
 om more explicit structural inductive biases?\n\nThis talk provides eviden
 ce for the latter: We improve the performance of LSTM and Transformer mode
 ls by augmenting them with structural inductive biases derived from an exp
 licitly hierarchical---albeit harder to scale---recurrent neural network g
 rammar (RNNG). I will begin with an overview of the proposed structure di
 stillation objective for autoregressive language modelling with LSTMs. I w
 ill then discuss an extension to the masked language modelling case\, by d
 istilling the approximate posterior distributions of the RNNG teacher\, wh
 ich culminates in structure-distilled BERT models that outperform the stan
 dard BERT model on a diverse suite of structured prediction tasks.\n\nAlto
 gether\, these findings demonstrate the benefits of syntactic biases\, eve
 n in scalable language models that learn from large amounts of data\, and 
 contribute to a better understanding of where syntactic biases are most he
 lpful in benchmarks of natural language understanding.\n
LOCATION:https://cam-ac-uk.zoom.us/j/97599459216?pwd=QTRsOWZCOXRTREVnbTJBd
 XVpOXFvdz09
END:VEVENT
END:VCALENDAR
