BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:The tradeoff governing efficient language model architectures - Sa
 bri Eyuboglu\, Stanford University
DTSTART:20240614T150000Z
DTEND:20240614T160000Z
UID:TALK213223@talks.cam.ac.uk
CONTACT:Richard Diehl Martinez
DESCRIPTION:Recent work has proposed alternative language model architectu
 res (e.g. RWKV\, Mamba\, Hyena) that are dramatically faster than attentio
 n (e.g. 25x higher throughput). However\, it’s unclear how switching to
  these new architectures might affect the behavior of language models when
  scaled up. In this talk\, we’ll discuss our recent work studying the fu
 ndamental tradeoffs that govern autoregressive language models. In particu
 lar\, we’ll focus on language model recall\, the ability to ground gener
 ations in information seen in context\, which is critical for in-context l
 earning and copying. We show with theory and experiments that all autoregr
 essive architectures obey a fundamental tradeoff: the less memory the mode
 l consumes during inference\, the worse it is at recall. This tradeoff mat
 ters because memory consumption dictates language model throughput in prac
 tice. We propose a simple architecture called Based that combines linear a
 nd sliding window attention. By varying Based's window size and linear atten
 tion feature dimension\, we can dial the model’s memory consumption and 
 traverse the Pareto frontier of the recall-memory tradeoff curve\, recover
 ing the full quality of attention on one end and the efficiency of the fas
 test attention alternatives on the other.\n\nBio:\n\nI'm a fourth-year CS
  PhD student in the Stanford Machine Learning Group advised by Chris Ré a
 nd James Zou. I am supported by the National Science Foundation GRFP.\n
 I like to develop a detailed understanding of how machine learning models wo
 rk and when they fail by exploring the unstructured data on which they are
  trained and formalizing sub-tasks with synthetics. Most recently\, I've b
 een working on understanding how neural network building blocks affect the
  quality and efficiency of foundation models. I also like to build tools t
 hat leverage large\, pre-trained models to facilitate the analysis and man
 agement of unstructured training and validation datasets. I'm motivated by
  challenges that arise when trying to apply machine learning in safety-cri
 tical settings like medicine and the sciences. Previously\, I was a machin
 e learning research intern at Flatiron Health. I completed my undergrad an
 d master's at Stanford\, where I worked with Jure Leskovec's SNAP Group an
 d the AIMI Center. 
LOCATION:Zoom link: https://cam-ac-uk.zoom.us/j/4751389294?pwd=Z2ZOSDk0eG1
 wZldVWG1GVVhrTzFIZz09
END:VEVENT
END:VCALENDAR
