BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:Adaptive Resource Allocation for Low-Latency LLM Serving in Dynami
 c Environments - Masayuki Usui and Shinya Takamaeda-Yamazaki (University o
 f Tokyo)
DTSTART:20250901T130000Z
DTEND:20250901T140000Z
UID:TALK235336@talks.cam.ac.uk
CONTACT:Eiko Yoneki
DESCRIPTION:Large language models (LLMs) face significant challenges in ac
 hieving low-latency inference. Techniques such as speculative decoding and
  chunked prefill can help reduce latency\, but their effectiveness depends
  heavily on algorithmic parameters that are sensitive to fluctuating syste
 m conditions. As a result\, static parameter settings often lead to subopt
 imal performance under dynamic workloads. To address this issue\, we propo
 se dynamic parameter optimization methods that adapt to evolving environme
 nts to maximize performance. In this talk\, we present the technical detai
 ls of these methods along with initial evaluation results.\n\nBio:\nMasayu
 ki Usui received his bachelor's and master's degrees in computer science f
 rom the University of Tokyo\, Japan. He is currently pursuing a Ph.D. degr
 ee at the University of Tokyo. His research interests include LLM inferenc
 e serving and computer architecture.\n\nShinya Takamaeda-Yamazaki received
  his B.E.\, M.E.\, and D.E. degrees from the Tokyo Institute of Technology
 \, Japan\, in 2009\, 2011\, and 2014\, respectively. Since 2019\, he has b
 een an Associate Professor at the University of Tokyo\, Japan. In 2025\, h
 e also became a Team Leader at RIKEN AIP\, Japan. His research interests i
 nclude computer architecture\, hardware design technologies\, and machine 
 learning systems.\n
LOCATION:SS03
END:VEVENT
END:VCALENDAR
