Explanations as a Catalyst: Leveraging Large Language Models to Embrace Human Label Variation
- 👤 Speaker: Beiduo Chen
- 📅 Date & Time: Friday 10 October 2025, 11:00 - 12:00
- 📍 Venue: GR03, English Faculty Building, 9 West Road, Sidgwick Site and online https://cam-ac-uk.zoom.us/j/97599459216?pwd=QTRsOWZCOXRTREVnbTJBdXVpOXFvdz09
Abstract
Human label variation (HLV), the phenomenon where multiple annotators provide different yet valid labels for the same data, is a rich source of information that is often dismissed as noise. Capturing this variation is crucial for building robust NLP systems, but doing so is typically resource-intensive. This talk presents a series of studies on how Large Language Models (LLMs) can serve as a catalyst to embrace and model HLV, moving from scalable approximation to a deeper analysis of the reasoning process itself.
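For readers unfamiliar with the setup, here is a minimal sketch (not from the talk) of the core idea behind HLV: per-annotator labels are kept as a full judgment distribution rather than collapsed to a single majority vote. The label set and annotation counts below are hypothetical.

```python
from collections import Counter

def human_judgment_distribution(annotations, label_set):
    """Turn a list of per-annotator labels into a normalized distribution.

    Instead of collapsing to the majority label, the full distribution
    is kept so that label variation is preserved as signal.
    """
    counts = Counter(annotations)
    total = len(annotations)
    return {label: counts[label] / total for label in label_set}

# Hypothetical NLI-style example: 10 annotators, 3 labels.
labels = ["entailment", "neutral", "contradiction"]
annotations = ["entailment"] * 6 + ["neutral"] * 3 + ["contradiction"]
print(human_judgment_distribution(annotations, labels))
# {'entailment': 0.6, 'neutral': 0.3, 'contradiction': 0.1}
```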
First, I will discuss how LLMs can approximate full Human Judgment Distributions (HJDs) from just a few human-provided explanations. Our work shows that this explanation-based approach significantly improves alignment with human judgments. This investigation also reveals the limitations of traditional, instance-level distribution metrics and highlights the importance of complementing them with global-level measures to more effectively evaluate alignment.
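The abstract does not name the specific metrics used; as a rough, hypothetical illustration of the instance-level vs. global-level distinction, the sketch below uses total variation distance averaged per instance and a single rank correlation pooled across the whole dataset. The arrays and metric choices are assumptions for illustration only.

```python
import numpy as np
from scipy.stats import spearmanr

def instance_level_score(human, model):
    """Average per-instance distance (here: total variation distance).

    `human` and `model` are (n_instances, n_labels) probability arrays.
    """
    return 0.5 * np.abs(human - model).sum(axis=1).mean()

def global_level_score(human, model):
    """Single rank correlation over all (instance, label) probabilities
    pooled across the dataset, capturing corpus-level alignment."""
    rho, _ = spearmanr(human.ravel(), model.ravel())
    return rho

# Hypothetical 3-instance, 3-label example.
human = np.array([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.1, 0.1, 0.8]])
model = np.array([[0.5, 0.4, 0.1], [0.3, 0.4, 0.3], [0.2, 0.2, 0.6]])
print(instance_level_score(human, model))
print(global_level_score(human, model))
```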
Building on this, the second part of the talk addresses the high cost of collecting human explanations by asking: can LLM-generated explanations serve as a viable proxy? We demonstrate that when guided by a few human labels, explanations generated by LLMs are indeed effective proxies, achieving comparable performance to human-written ones in approximating HJDs. This finding opens up a scalable and efficient pathway for modeling HLV, especially for datasets where human explanations are not available.
Finally, I will shift from post-hoc explanation (justifying a given answer) to a forward-reasoning paradigm. I will introduce CoT2EL, a novel pipeline that extracts explanation-label pairs directly from an LLM's Chain-of-Thought (CoT) process before a final answer is selected. This method allows us to analyze the model's reasoning across multiple plausible options. To better assess these nuanced judgments, I will also present a new rank-based evaluation framework that prioritizes the ordering of answers over exact distributional scores, showing a stronger alignment with human decision-making.
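As a hedged illustration of what a rank-based evaluation might look like (the actual framework presented in the talk may differ), the sketch below scores agreement between the label orderings induced by human and model distributions rather than their exact probabilities; all data shown are hypothetical.

```python
import numpy as np
from scipy.stats import spearmanr

def per_instance_rank_agreement(human, model):
    """Spearman correlation between the label rankings induced by the
    human and model distributions, averaged over instances.

    This rewards getting the ordering of plausible answers right,
    even when the exact probabilities differ.
    """
    scores = []
    for h, m in zip(human, model):
        rho, _ = spearmanr(h, m)
        scores.append(rho)
    return float(np.mean(scores))

# Hypothetical example: the model's ordering matches the human ordering
# on the first instance but not on the second.
human = np.array([[0.6, 0.3, 0.1], [0.2, 0.7, 0.1]])
model = np.array([[0.5, 0.4, 0.1], [0.6, 0.2, 0.2]])
print(per_instance_rank_agreement(human, model))
```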
Bio
Beiduo Chen is a PhD student at the MaiNLP lab at LMU Munich, supervised by Prof. Barbara Plank. He is also a member of the European Laboratory for Learning and Intelligent Systems (ELLIS) PhD Program, co-supervised by Prof. Anna Korhonen at the University of Cambridge. He received his Master's and Bachelor's degrees from the University of Science and Technology of China. His research focuses on human-centered NLP, with a special emphasis on the uncertainty, trustworthiness, and evaluation of Large Language Models. He has published several papers in top-tier NLP conferences, including ACL and EMNLP.
Series
This talk is part of the Language Technology Lab Seminars series.
Included in Lists
- bld31
- Cambridge Centre for Data-Driven Discovery (C2D3)
- Cambridge Forum of Science and Humanities
- Cambridge Language Sciences
- Cambridge talks
- Chris Davis' list
- GR03, English Faculty Building, 9 West Road, Sidgwick Site and online https://cam-ac-uk.zoom.us/j/97599459216?pwd=QTRsOWZCOXRTREVnbTJBdXVpOXFvdz09
- Guy Emerson's list
- Interested Talks
- Language Sciences for Graduate Students
- Language Technology Lab Seminars
- ndk22's list
- ob366-ai4er
- rp587
- Simon Baker's List
- Trust & Technology Initiative - interesting events
- yk449