BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:Robust Alignment of Large Language Models - Dr. Sangwoong Yoon (UC
 L)
DTSTART:20250523T110000Z
DTEND:20250523T120000Z
UID:TALK229930@talks.cam.ac.uk
CONTACT:Suchir Salhan
DESCRIPTION:The alignment of large language models (LLMs) can often be bri
 ttle when faced with the complexities of real-world deployment. In this ta
 lk\, I will share our investigations into two scenarios where special care
  is required to ensure robust alignment.\n\nThe first scenario is multi-ob
 jective
  alignment\, where balancing competing objectives is particularly challeng
 ing. Our recent work\, **Robust Multi-Objective Decoding (RMOD)\,** an inf
 erence-time alignment algorithm\, adaptively adjusts the weights of differ
 ent objectives during response generation to ensure none are neglected. RM
 OD provides principled robustness with minimal overhead\, consistently out
 performing existing methods across several alignment benchmarks.\n\nIn the
  second part of the talk\, I will address preference model misspecificatio
 n in self-play alignment. While self-play is a promising alignment approac
 h\, naive implementations are vulnerable to inaccuracies in the preference
  model. To address this\, our **Regularized Self-Play Policy Optimization 
 (RSPO)** framework offers a versatile and modular method for regularizing 
 the self-play alignment process. RSPO's ability to combine various regul
 arizers results in strong performance gains on multiple evaluation sets\, 
 such as AlpacaEval-2 and Arena-Hard.\n\nAs a bonus\, I will briefly introd
 uce our recent investigation into the robustness of **Mixture-of-Agent (Mo
 A)** systems\, a popular multi-agent paradigm. We show that even a single 
 malicious agent introduced into the mixture can nullify the benefits of th
 e entire system.\n
LOCATION:ONLINE ONLY. Here is the Zoom link: https://cam-ac-uk.zoom.us/j/4
 751389294?pwd=Z2ZOSDk0eG1wZldVWG1GVVhrTzFIZz09
END:VEVENT
END:VCALENDAR
