Machine Learning is Linear Algebra
- Speaker: Andrew Gordon Wilson - New York University
- Date & Time: Thursday 13 February 2025, 16:00 - 17:00
- Venue: https://cam-ac-uk.zoom.us/j/81897609356?pwd=HqbUQWnASjpBBZdaZo9r43M9Gj4N3Q.1
Abstract
I will talk about how modelling assumptions manifest themselves as algebraic structure in a variety of settings, including optimization, attention, and network parameters, and how we can algorithmically exploit that structure for better scaling laws with transformers. As part of this effort, I will present a unifying framework that enables searching among all linear operators expressible via an Einstein summation. This framework encompasses previously proposed structures, such as low-rank, Kronecker, Tensor-Train, and Monarch, along with many novel structures. We develop a taxonomy of all such operators based on their computational and algebraic properties, which provides insights into their compute-optimal scaling laws. Combining these insights with empirical evaluation, we identify a subset of structures that achieve better performance than dense layers as a function of training compute, which we then develop into a high-performance sparse mixture-of-experts layer.
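To make "linear operators expressible via an Einstein summation" a little more concrete, here is a minimal NumPy sketch of how two of the structures named above, low-rank and Kronecker, can be written as einsum contractions in place of a dense matrix-vector product. It is only an illustration under our own assumptions: the function names (`dense_matvec`, `low_rank_matvec`, `kronecker_matvec`) and shapes are hypothetical, and this is not the speaker's framework or code.

```python
import numpy as np

# Each function computes y = W x for a different factorization of W,
# written as einsum contractions. Names are hypothetical, for illustration only.

def dense_matvec(W, x):
    # Dense: W has d_out * d_in free parameters.
    return np.einsum("ij,j->i", W, x)

def low_rank_matvec(U, V, x):
    # Low-rank: W = U @ V with U of shape (d_out, r) and V of shape (r, d_in).
    # Contracting V with x first costs O(r * (d_in + d_out)) multiply-adds
    # instead of O(d_out * d_in).
    z = np.einsum("rj,j->r", V, x)
    return np.einsum("ir,r->i", U, z)

def kronecker_matvec(A, B, x):
    # Kronecker: W = kron(A, B) with A (a_out, a_in) and B (b_out, b_in).
    # Reshape x (row-major) into X of shape (a_in, b_in); then
    # Y[i, k] = sum_{j, l} A[i, j] * B[k, l] * X[j, l].
    X = x.reshape(A.shape[1], B.shape[1])
    T = np.einsum("kl,jl->jk", B, X)  # apply B along the second axis
    Y = np.einsum("ij,jk->ik", A, T)  # apply A along the first axis
    return Y.reshape(-1)

rng = np.random.default_rng(0)
d = 16
x = rng.standard_normal(d)

U, V = rng.standard_normal((d, 2)), rng.standard_normal((2, d))
A, B = rng.standard_normal((4, 4)), rng.standard_normal((4, 4))

print(np.allclose(low_rank_matvec(U, V, x), (U @ V) @ x))         # True
print(np.allclose(kronecker_matvec(A, B, x), np.kron(A, B) @ x))  # True
print("params:", d * d, U.size + V.size, A.size + B.size)         # params: 256 64 32
```

All three maps act on the same 16-dimensional input, but the structured ones use 64 and 32 parameters instead of 256; the talk's question is which such structures make the best use of training compute.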
This talk is part of the Cambridge Ellis Unit series.
Included in Lists
- bld31
- Cambridge Centre for Data-Driven Discovery (C2D3)
- Cambridge Ellis Unit
- Cambridge talks
- Chris Davis' list
- Hanchen DaDaDash
- https://cam-ac-uk.zoom.us/j/81897609356?pwd=HqbUQWnASjpBBZdaZo9r43M9Gj4N3Q.1
- Information Engineering Division seminar list
- Interested Talks
- ndk22's list
- ob366-ai4er
- rp587
- Trust & Technology Initiative - interesting events
- yk449