BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:State space models (as alternatives to Transformers) - Yashar Ahma
 dian\; Qingyun Chen
DTSTART:20241126T133000Z
DTEND:20241126T150000Z
UID:TALK224956@talks.cam.ac.uk
CONTACT:124819
DESCRIPTION:The transformer architecture underlies recent breakthroughs in
  AI. Famously\, LLMs such as the GPT series use this architecture\, and t
 ransformers have now become the standard across domains. Despite their i
 mpressive track record\, however\, transformers also have undesirable att
 ributes. While performance on most tasks increases with the length of the
  context window\, the memory and computational costs of transformers scal
 e adversely with context length (the cost of self-attention is quadratic
  in sequence length). Beyond the issue of computational cost\, there are
  also reasons to think transformers may have suboptimal inductive biases
  for certain sequential tasks. These considerations have motivated the s
 earch for alternative architectures. The most promising among these are
  the so-called State Space Models (SSMs)\, which are recurrent architect
 ures (a property that also brings them closer to biological plausibility
 ). We will start with a review of transformers and motivate their connec
 tion with SSMs by reviewing the relationship between the convolutional a
 nd state-space formulations of linear time-invariant systems. We will th
 en cover the papers below on two specific SSM architectures and (time al
 lowing) a theoretical study of learning dynamics in linear SSMs.\n1. htt
 ps://arxiv.org/abs/2312.00752 (Mamba: the currently popular SSM architec
 ture)\n2. https://arxiv.org/pdf/2410.01201 (minGRU\, a minimal version o
 f the GRU and a recently proposed bare-bones post-transformer architectu
 re)\n3. https://arxiv.org/pdf/2407.19115 (a theoretical study of gradien
 t-based learning in linear SSMs)
LOCATION:CBL Seminar Room\, Engineering Department\, 4th floor Baker build
 ing
END:VEVENT
END:VCALENDAR
