Robust Alignment of Large Language Models
- Speaker: Dr. Sangwoong Yoon (UCL)
- Date & Time: Friday 23 May 2025, 12:00 - 13:00
- Venue: ONLINE ONLY. Here is the Zoom link: https://cam-ac-uk.zoom.us/j/4751389294?pwd=Z2ZOSDk0eG1wZldVWG1GVVhrTzFIZz09
Abstract
The alignment of large language models (LLMs) can often be brittle when faced with the complexities of real-world deployment. In this talk, I will share our investigations into two scenarios where special care is required to ensure robust alignment.
The first scenario is multi-objective alignment, where balancing competing objectives is particularly challenging. Our recent work introduces Robust Multi-Objective Decoding (RMOD), an inference-time alignment algorithm that adaptively adjusts the weights of the different objectives during response generation so that none of them is neglected. RMOD provides principled robustness with minimal overhead and consistently outperforms existing methods across several alignment benchmarks.
In the second part of the talk, I will address preference model misspecification in self-play alignment. While self-play is a promising alignment approach, naive implementations are vulnerable to inaccuracies in the preference model. To address this, our Regularized Self-Play Policy Optimization (RSPO) framework offers a versatile and modular method for regularizing the self-play alignment process. RSPO's ability to combine various regularizers yields strong performance gains on multiple evaluation sets, such as AlpacaEval-2 and Arena-Hard.
As a bonus, I will briefly introduce our recent investigation into the robustness of Mixture-of-Agents (MoA) systems, a popular multi-agent paradigm. We show that introducing even a single malicious agent into the mixture can nullify the benefits of the entire system.
This talk is part of the NLIP Seminar Series.
Included in Lists
- All Talks (aka the CURE list)
- bld31
- Cambridge Centre for Data-Driven Discovery (C2D3)
- Cambridge Forum of Science and Humanities
- Cambridge Language Sciences
- Cambridge talks
- Chris Davis' list
- Computer Education Research
- Computing Education Research
- Department of Computer Science and Technology talks and seminars
- Graduate-Seminars
- Guy Emerson's list
- Interested Talks
- Language Sciences for Graduate Students
- ndk22's list
- NLIP Seminar Series
- ob366-ai4er
- PMRFPS's
- rp587
- School of Technology
- Simon Baker's List
- Trust & Technology Initiative - interesting events
- yk449