BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:CodeScaler: Scaling Code LLM Training and Test-Time Inference via 
 Execution-Free Reward Models - Zhijiang Guo (HKUST (GZ) | HKUST)
DTSTART:20260417T110000Z
DTEND:20260417T120000Z
UID:TALK242725@talks.cam.ac.uk
CONTACT:Suchir Salhan
DESCRIPTION:In this talk\, I will present CodeScaler\, a novel framework
  designed to overcome the scalability bottlenecks of Reinforcement Learning
  from Verifiable Rewards (RLVR) in code generation. While traditional RLVR
  relies heavily on the availability of high-quality unit tests—which are
  often scarce or unreliable—CodeScaler introduces an execution-free rewa
 rd model that scales both training and test-time inference. By leveraging 
 carefully curated preference data\, syntax-aware code extraction\, and val
 idity-preserving reward shaping\, CodeScaler achieves significant performa
 nce gains\, improving the Qwen3-8B-Base model by an average of +11.72 poin
 ts across five benchmarks. Furthermore\, CodeScaler functions as a highly 
 efficient test-time scaling method\, delivering performance comparable to 
 execution-based approaches while reducing latency by 10x. I will d
 iscuss how this approach enables robust optimization on synthetic datasets
  without the need for test cases and its broader implications for enhancin
 g reasoning capabilities in general domains.
LOCATION:ONLINE ONLY. Here is the Google Meet link: https://meet.google.com/
 cru-hcuo-rhu
END:VEVENT
END:VCALENDAR
