State space models (as alternatives to Transformers)
- Speaker: Yashar Ahmadian; Qingyun Chen
- Date & Time: Tuesday 26 November 2024, 13:30 - 15:00
- Venue: CBL Seminar Room, Engineering Department, 4th floor Baker building
Abstract
The transformer architecture underlies recent breakthroughs in AI. Famously, LLMs such as the GPT series use this architecture, and transformers have now become the standard across domains. Despite their impressive track record, however, transformers also have undesirable attributes. While performance on most tasks improves with the length of the context window, the memory requirement and computational cost of transformers scale poorly with context length (the cost of standard self-attention grows quadratically with the number of tokens). Beyond computational cost, there are also reasons to think transformers may have suboptimal inductive biases for certain sequential tasks. These considerations have motivated the search for alternative architectures. The most promising among these are so-called State Space Models (SSMs), which are recurrent architectures (a property that also brings them closer to biological plausibility). We will start with a review of transformers and motivate their connection with SSMs by reviewing the connection between the convolutional and state-space formulations of linear time-invariant systems. We will then cover the papers below on two specific SSM architectures and, time allowing, a theoretical study of learning dynamics in linear SSMs:
1. https://arxiv.org/abs/2312.00752 (Mamba: the currently popular SSM architecture)
2. https://arxiv.org/pdf/2410.01201 (minGRU: a minimal version of the GRU, a recently proposed bare-bones post-transformer architecture)
3. https://arxiv.org/pdf/2407.19115 (a theoretical study of gradient-based learning in linear SSMs)
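The convolutional vs state-space connection mentioned above can be illustrated with a minimal sketch (not the speakers' material). It assumes a discrete linear time-invariant SSM with scalar input and output, x_k = A x_{k-1} + B u_k, y_k = C x_k, starting from x_{-1} = 0. Unrolling the recurrence gives a causal convolution with kernel K_j = C A^j B, so the same model can be run step by step like an RNN (constant-size state per step) or as one convolution over the whole sequence (parallel over time). The function names and matrix values are hypothetical and chosen only for illustration.

```python
# Minimal sketch (illustrative, not from the talk): a discrete LTI state space model
#   x_k = A x_{k-1} + B u_k,   y_k = C x_k,   with x_{-1} = 0,
# evaluated two ways: as a recurrence and as a causal convolution with
# kernel K_j = C A^j B. Scalar input/output, N-dimensional state.

def matvec(A, v):
    """Multiply an N x N matrix (list of rows) by a length-N vector."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def dot(c, v):
    """Inner product of two length-N vectors."""
    return sum(a * b for a, b in zip(c, v))

def ssm_recurrent(A, B, C, u):
    """Run the recurrence step by step, keeping only the current state x."""
    x = [0.0] * len(B)
    ys = []
    for u_k in u:
        x = [ax + b * u_k for ax, b in zip(matvec(A, x), B)]
        ys.append(dot(C, x))
    return ys

def ssm_kernel(A, B, C, length):
    """Unroll the recurrence into the convolution kernel K_j = C A^j B."""
    kernel = []
    v = list(B)  # v holds A^j B at step j
    for _ in range(length):
        kernel.append(dot(C, v))
        v = matvec(A, v)
    return kernel

def ssm_convolutional(A, B, C, u):
    """Compute y_k = sum_{j=0}^{k} K_j u_{k-j} as a (naive) causal convolution."""
    K = ssm_kernel(A, B, C, len(u))
    return [sum(K[j] * u[k - j] for j in range(k + 1)) for k in range(len(u))]

if __name__ == "__main__":
    A = [[0.5, 0.1], [0.0, 0.9]]  # hypothetical stable state matrix
    B = [1.0, 0.5]
    C = [1.0, -1.0]
    u = [1.0, 0.0, 2.0, -1.0]
    print(ssm_recurrent(A, B, C, u))
    print(ssm_convolutional(A, B, C, u))  # agrees with the line above up to rounding
```

The convolution here is computed naively in O(L^2) time for clarity; practical LTI SSM layers typically compute it with FFTs instead. This equivalence relies on A, B and C being the same at every time step: Mamba makes some of these parameters input-dependent ("selective"), which gives up the convolutional form in favour of a scan over the recurrence.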
This talk is part of the Computational Neuroscience series.
Included in Lists
- All Talks (aka the CURE list)
- Biology
- bld31
- Cambridge Centre for Data-Driven Discovery (C2D3)
- Cambridge Neuroscience Seminars
- CamBridgeSens
- Cambridge talks
- CBL important
- CBL Seminar Room, Engineering Department, 4th floor Baker building
- Chris Davis' list
- Computational and Biological Learning Seminar Series
- Computational Neuroscience
- custom
- dh539
- Featured lists
- Guy Emerson's list
- Hanchen DaDaDash
- Inference Group Journal Clubs
- Inference Group Summary
- Information Engineering Division seminar list
- Interested Talks
- Life Science
- Life Science Interface Seminars
- Life Sciences
- ME Seminar
- my_list
- ndk22's list
- Neuroscience
- Neuroscience Seminars
- ob366-ai4er
- other talks
- Quantum Matter Journal Club
- Required lists for MLG
- rp587
- se456's list
- Stem Cells & Regenerative Medicine
- TQS Journal Clubs
- Trust & Technology Initiative - interesting events
- yk373's list
- yk449
Note: Ex-directory lists are not shown.