BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:Linear Attention for Efficient Transformers - Isaac Reid (Universi
 ty of Cambridge)
DTSTART:20241030T110000Z
DTEND:20241030T123000Z
UID:TALK223987@talks.cam.ac.uk
CONTACT:Xianda Sun
DESCRIPTION:Attention may be all you need\, but that doesn't mean it comes
  cheap. The Achilles' heel of the wildly successful Transformer architectu
 re is its quadratic time- and space-complexity scaling with respect to the
  length of the input token sequence. A diverse taxonomy of methods has bee
 n proposed to remedy this bottleneck and recover linear complexity\, inclu
 ding making attention local\, sparse\, or low-rank. We will explore the resp
 ective strengths and weaknesses of these approaches\, discuss theoretical 
 guarantees (or the lack thereof)\, and consider possible directions for fu
 ture work.\n\nSuggested reading:\n# Attention is all you need (https://arx
 iv.org/abs/1706.03762). Seminal Transformers paper.\n# Transformers are RN
 Ns: Fast Autoregressive Transformers with Linear Attention (https://arxiv.
 org/abs/2006.16236). Among the first papers on low-rank attention.\n# Swin
  Transformer: Hierarchical Vision Transformer using Shifted Windows (https
 ://arxiv.org/abs/2103.14030). Popular example of local attention.\n# Big B
 ird: Transformers for Longer Sequences (https://arxiv.org/abs/2007.14062).
  Example of the benefits of using a combination of techniques. 
LOCATION:Cambridge University Engineering Department\, CBL Seminar room BE
 4-38.
END:VEVENT
END:VCALENDAR
