Adaptive Resource Allocation for Low-Latency LLM Serving in Dynamic Environments
- Speaker: Masayuki Usui and Shinya Takamaeda-Yamazaki (University of Tokyo)
- Date & Time: Monday 01 September 2025, 14:00 - 15:00
- Venue: SS03
Abstract
Large language models (LLMs) face significant challenges in achieving low-latency inference. Techniques such as speculative decoding and chunked prefill can help reduce latency, but their effectiveness depends heavily on algorithmic parameters that are sensitive to fluctuating system conditions. As a result, static parameter settings often lead to suboptimal performance under dynamic workloads. To address this issue, we propose dynamic parameter optimization methods that adapt to evolving environments to maximize performance. In this talk, we present the technical details of these methods along with initial evaluation results.
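As one concrete illustration of the kind of adaptation the abstract describes, a serving system might tune the speculative-decoding lookahead (the number of draft tokens proposed per step) based on the observed acceptance rate, since over-speculating under unfavorable conditions wastes verification work and adds latency. The sketch below is purely illustrative and not the speakers' method; the function name, thresholds, and feedback loop are all assumptions.

```python
# Illustrative sketch (not the speakers' actual method): adapt the
# speculative-decoding lookahead K to a drifting draft-token acceptance
# rate. Thresholds and bounds here are hypothetical.

def adapt_lookahead(k, acceptance_rate, k_min=1, k_max=8):
    """Grow K while most draft tokens are accepted; shrink it when
    verification rejects them, since rejected drafts add latency."""
    if acceptance_rate > 0.8 and k < k_max:
        return k + 1
    if acceptance_rate < 0.4 and k > k_min:
        return k - 1
    return k

# Simulated feedback loop: acceptance degrades as load fluctuates.
k = 4
for rate in [0.9, 0.9, 0.3, 0.3, 0.3]:
    k = adapt_lookahead(k, rate)
print(k)  # lookahead has backed off from its peak
```

A real controller would of course use richer signals (queue depth, batch composition, prefill/decode mix) rather than a single acceptance rate, which is presumably part of what the talk addresses.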
Bio: Masayuki Usui received his bachelor’s and master’s degrees in computer science from the University of Tokyo, Japan. He is currently pursuing a Ph.D. degree at the University of Tokyo. His research interests include LLM inference serving and computer architecture.
Shinya Takamaeda-Yamazaki received his B.E., M.E., and D.E. degrees from the Tokyo Institute of Technology, Japan, in 2009, 2011, and 2014, respectively. Since 2019, he has been an Associate Professor at the University of Tokyo, Japan. In 2025, he also became a Team Leader at RIKEN AIP, Japan. His research interests include computer architecture, hardware design technologies, and machine learning systems.
Series: This talk is part of the Systems Research Talk Series (temporal).