
CodeScaler: Scaling Code LLM Training and Test-Time Inference via Execution-Free Reward Models


If you have a question about this talk, please contact Suchir Salhan.

In this talk, I will present CodeScaler, a novel framework designed to overcome the scalability bottlenecks of Reinforcement Learning from Verifiable Rewards (RLVR) in code generation. While traditional RLVR relies heavily on the availability of high-quality unit tests—which are often scarce or unreliable—CodeScaler introduces an execution-free reward model that scales both training and test-time inference. By leveraging carefully curated preference data, syntax-aware code extraction, and validity-preserving reward shaping, CodeScaler achieves significant performance gains, improving the Qwen3-8B-Base model by an average of +11.72 points across five benchmarks. Furthermore, CodeScaler functions as a highly efficient test-time scaling method, delivering performance comparable to execution-based approaches while reducing latency by 10×. I will discuss how this approach enables robust optimization on synthetic datasets without the need for test cases and its broader implications for enhancing reasoning capabilities in general domains.
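
For readers unfamiliar with execution-free test-time scaling, the sketch below illustrates the general idea in Python: sample several candidate programs and keep the one a learned reward model prefers, without running any unit tests. This is a minimal, hypothetical illustration and not the CodeScaler implementation; the functions generate_candidates and score_with_reward_model are placeholder stand-ins for an LLM sampler and a trained reward model.

    # Illustrative best-of-N reranking with an execution-free reward model.
    # No code is executed and no unit tests are required; the reward model
    # alone decides which candidate program to keep.
    from typing import Callable, List

    def best_of_n(
        prompt: str,
        generate_candidates: Callable[[str, int], List[str]],
        score_with_reward_model: Callable[[str, str], float],
        n: int = 16,
    ) -> str:
        """Sample n candidate programs and return the highest-scoring one."""
        candidates = generate_candidates(prompt, n)
        return max(candidates, key=lambda code: score_with_reward_model(prompt, code))

    if __name__ == "__main__":
        # Toy stand-ins so the sketch runs end to end; real systems would
        # call an LLM sampler and a learned reward model here.
        def generate_candidates(prompt: str, n: int) -> List[str]:
            return [f"def solve():\n    return {i}" for i in range(n)]

        def score_with_reward_model(prompt: str, code: str) -> float:
            return float(len(code))  # dummy score standing in for model output

        print(best_of_n("Write solve()", generate_candidates, score_with_reward_model, n=4))

Because the reranking step needs only a forward pass of the reward model per candidate rather than sandboxed execution of test suites, this style of selection is where the reported latency reduction over execution-based approaches comes from.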

This talk is part of the NLIP Seminar Series.
