Machine Learning is Linear Algebra
- Speaker: Andrew Gordon Wilson - New York University (Website)
- Date & Time: Thursday 13 February 2025, 16:00 - 17:00
- Venue: https://cam-ac-uk.zoom.us/j/81897609356?pwd=HqbUQWnASjpBBZdaZo9r43M9Gj4N3Q.1
Abstract
I will talk about how modelling assumptions manifest themselves as algebraic structure in a variety of settings, including optimization, attention, and network parameters, and how we can algorithmically exploit that structure for better scaling laws with transformers. As part of this effort, I will present a unifying framework that enables searching among all linear operators expressible via an Einstein summation. This framework encompasses previously proposed structures, such as low-rank, Kronecker, Tensor-Train, and Monarch, along with many novel structures. We develop a taxonomy of all such operators based on their computational and algebraic properties, which provides insights into their compute-optimal scaling laws. Combining these insights with empirical evaluation, we identify a subset of structures that achieve better performance than dense layers as a function of training compute, which we then develop into a high-performance sparse mixture-of-experts layer.
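The following NumPy sketch is an illustration only and is not the framework presented in the talk. It shows how two of the structures named in the abstract, low-rank and Kronecker, can be written as Einstein summations, next to an ordinary dense layer. The sizes (16 inputs and outputs, rank 4, two 4x4 Kronecker factors), the variable names, and the row-major reshape convention are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out = 16, 16
x = rng.standard_normal(d_in)

# Dense layer: y = W x, an einsum contracting the shared input index j.
W = rng.standard_normal((d_out, d_in))
y_dense = np.einsum("ij,j->i", W, x)

# Low-rank layer (illustrative rank 4): W = U V, so y = U (V x).
rank = 4
U = rng.standard_normal((d_out, rank))
V = rng.standard_normal((rank, d_in))
y_lowrank = np.einsum("ir,rj,j->i", U, V, x)

# Kronecker layer: W = kron(A, B). Reshaping x row-major into a 4x4 grid X,
# y[a*4 + b] = sum_{c,d} A[a, c] * B[b, d] * X[c, d].
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))
X = x.reshape(4, 4)
y_kron = np.einsum("ac,bd,cd->ab", A, B, X).reshape(-1)

# Parameter counts for the three layers of the same 16 -> 16 shape.
n_dense = W.size             # 16 * 16
n_lowrank = U.size + V.size  # 2 * 16 * 4
n_kron = A.size + B.size     # 2 * 4 * 4
print(n_dense, n_lowrank, n_kron)
```

All three layers map 16 inputs to 16 outputs, but the dense layer stores 256 parameters, the rank-4 factorization 128, and the Kronecker product of two 4x4 factors only 32. The structured versions never materialize the full 16x16 matrix; the einsum contracts the factors directly against the input.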
Talks.cam link – https://talks.cam.ac.uk/talk/index/228163
Series
This talk is part of the Machine Learning is Linear Algebra series.