BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:The unreasonable effectiveness of mathematics in large scale deep 
 learning - Greg Yang\, Microsoft Research
DTSTART:20220706T100000Z
DTEND:20220706T113000Z
UID:TALK176330@talks.cam.ac.uk
CONTACT:James Allingham
DESCRIPTION:Recently\, the theory of infinite-width neural networks led to
  the first technology\, muTransfer\, for tuning enormous neural networks
  that are too expensive to train more than once. For example\, this allow
 ed us to tune the 6.7-billion-parameter version of GPT-3 using only 7% of
  its pretraining compute budget\, and\, with some asterisks\, we obtained
  performance comparable to the original GPT-3 model with twice the parame
 ter count. In this talk\, I will explain the core insight behind this the
 ory. In fact\, this is an instance of what I call the *Optimal Scaling Th
 esis*\, which connects infinite-size limits for general notions of “size
 ” to the optimal design of large models in practice\, illustrating a way
  for theory to reliably guide the future of AI. I’ll end with several co
 ncrete\, key mathematical research questions whose resolution will have a
 n incredible impact on how practitioners scale up their NNs.\n\nThere’s
  no required reading for the talk\, but folks can look at my homepage (ht
 tps://www.microsoft.com/en-us/research/people/gregyang/) for an overview
  of Tensor Programs.
LOCATION:Cambridge University Engineering Department\, CBL Seminar room BE
 4-38
END:VEVENT
END:VCALENDAR
