BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:Adaptive Resource Allocation for Low-Latency LLM Serving in Dynami
 c Environments - Masayuki Usui and Shinya Takamaeda-Yamazaki (University
 of Tokyo)
DTSTART:20250901T130000Z
DTEND:20250901T140000Z
UID:TALK235348@talks.cam.ac.uk
CONTACT:Eiko Yoneki
DESCRIPTION:Abstract:\nLarge language models (LLMs) face significant chall
 enges in achieving low-latency inference. Techniques such as speculative d
 ecoding and chunked prefill can help reduce latency\, but their effectiven
 ess depends heavily on algorithmic parameters that are sensitive to fluctu
 ating system conditions. As a result\, static parameter settings often lea
 d to suboptimal performance under dynamic workloads. To address this issue
 \, we propose dynamic parameter optimization methods that adapt to evolvin
 g environments to maximize performance. In this talk\, we present the tech
 nical details of these methods along with initial evaluation results.\n \n
 Bio:\nMasayuki Usui received his bachelor's and master's degrees in comput
 er science from the University of Tokyo\, Japan. He is currently pursuing 
 a Ph.D. degree at the University of Tokyo. His research interests include 
 LLM inference serving and computer architecture.\n \nShinya Takamaeda-Yama
 zaki received his B.E.\, M.E.\, and D.E. degrees from the Tokyo Institute 
 of Technology\, Japan\, in 2009\, 2011\, and 2014\, respectively. Since 20
 19\, he has been an Associate Professor at the University of Tokyo\, Japan
 . In 2025\, he also became a Team Leader at RIKEN AIP\, Japan. His researc
 h interests include computer architecture\, hardware design technologies\,
  and machine learning systems.
LOCATION:Computer Lab\, SS03
END:VEVENT
END:VCALENDAR
