
CodeScaler: Scaling Code LLM Training and Test-Time Inference via Execution-Free Reward Models


If you have a question about this talk, please contact Suchir Salhan.

In this talk, I will present CodeScaler, a novel framework designed to overcome the scalability bottlenecks of Reinforcement Learning from Verifiable Rewards (RLVR) in code generation. While traditional RLVR relies heavily on the availability of high-quality unit tests—which are often scarce or unreliable—CodeScaler introduces an execution-free reward model that scales both training and test-time inference. By leveraging carefully curated preference data, syntax-aware code extraction, and validity-preserving reward shaping, CodeScaler achieves significant performance gains, improving the Qwen3-8B-Base model by an average of +11.72 points across five benchmarks. Furthermore, CodeScaler functions as a highly efficient test-time scaling method, delivering performance comparable to execution-based approaches while reducing latency by 10×. I will discuss how this approach enables robust optimization on synthetic datasets without the need for test cases and its broader implications for enhancing reasoning capabilities in general domains.
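
For readers unfamiliar with execution-free test-time scaling, the sketch below illustrates the general idea in Python: sample several candidate programs and keep the one a learned reward model prefers, without running any unit tests. This is a minimal, hypothetical illustration and not the CodeScaler implementation; the functions generate_candidates and score_with_reward_model are placeholder stand-ins for an LLM sampler and a trained reward model.

    # Illustrative best-of-N reranking with an execution-free reward model.
    # No code is executed and no unit tests are required; the reward model
    # alone decides which candidate program to keep.
    from typing import Callable, List

    def best_of_n(
        prompt: str,
        generate_candidates: Callable[[str, int], List[str]],
        score_with_reward_model: Callable[[str, str], float],
        n: int = 16,
    ) -> str:
        """Sample n candidate programs and return the highest-scoring one."""
        candidates = generate_candidates(prompt, n)
        return max(candidates, key=lambda code: score_with_reward_model(prompt, code))

    if __name__ == "__main__":
        # Toy stand-ins so the sketch runs end to end; real systems would
        # call an LLM sampler and a learned reward model here.
        def generate_candidates(prompt: str, n: int) -> List[str]:
            return [f"def solve():\n    return {i}" for i in range(n)]

        def score_with_reward_model(prompt: str, code: str) -> float:
            return float(len(code))  # dummy score standing in for model output

        print(best_of_n("Write solve()", generate_candidates, score_with_reward_model, n=4))

Because the reranking step needs only a forward pass of the reward model per candidate rather than sandboxed execution of test suites, this style of selection is where the reported latency reduction over execution-based approaches comes from.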

This talk is part of the NLIP Seminar Series.
