BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:Optimal Quantization for Matrix Multiplication - Or Ordentlich\, H
 ebrew University of Jerusalem
DTSTART:20251126T140000Z
DTEND:20251126T150000Z
UID:TALK234370@talks.cam.ac.uk
CONTACT:Dr Varun Jog
DESCRIPTION:The main building block of large language models is matrix mul
 tiplication\, which is often bottlenecked by the speed of loading these ma
 trices from memory. A possible solution is to trade accuracy for speed by 
 storing the matrices in low precision (“quantizing” them). In recent y
 ears\, a number of quantization algorithms with increasingly better perfo
 rmance have been proposed (e.g.\, SmoothQuant\, Brain compression\, GPT
 Q\, QuIP\, QuIP#\, QuaRot\, SpinQuant). In this work\, we prove an infor
 mation-theoretic lower bound on the achievable accuracy of computing a m
 atrix product as a function of the compression rate (number of bits per m
 atrix entry). We also construct a quantizer (based on nested lattices) t
 hat achieves this lower bound. Applying our nested lattice scheme to qua
 ntize the weights\, KV-cache\, and activations of Llama-3-8B to 4 bits y
 ields lower perplexity than state
 -of-the-art quantization schemes.\n\nBased on joint work with Yury Polyans
 kiy\, and with Semyon Savkin and Eitan Porat. References: ["1":https://arx
 iv.org/pdf/2410.13780]\, ["2":https://arxiv.org/pdf/2502.09720]\n\n*Bio*: 
 Or Ordentlich is an associate professor in the School of Computer Science 
 and Engineering at the Hebrew University of Jerusalem. His research focuse
 s on information theory and its application to modern problems in commun
 ication\, compression\, and data science. Or received the B.Sc. (cum lau
 de)\, M.Sc. (summa cum laude)\, and Ph.D. degrees from Tel Aviv Universi
 ty\, Israel\, in 2010\, 2011\, and 2016\, respectively\, all in electric
 al en
 gineering. During the years 2015-2017 he was a postdoctoral fellow in the 
 Laboratory for Information and Decision Systems at the Massachusetts Insti
 tute of Technology (MIT)\, and in the Department of Electrical and Compute
 r Engineering at Boston University. He has been serving as an associate ed
 itor for Signal Processing and Source Coding in the IEEE Transactions on I
 nformation Theory since 2021. His work on lattice covering received the F
 rontiers of Science Award in 2025.
LOCATION:MR5\, CMS Pavilion A
END:VEVENT
END:VCALENDAR
