
Memorization as a Feature, Not a Bug


If you have a question about this talk, please contact Suchir Salhan.

Memorization in LLMs has long been perceived as undesirable, associated with privacy risks, copyright concerns, and wasted capacity. In this talk, I argue for a complementary perspective: memorization is an intrinsic property of LLMs that can be leveraged to build a better LLM ecosystem. I first present two frameworks for rigorously studying counterfactual memorization in a training run. I then demonstrate how memorization dynamics can be exploited to establish model and text provenance. Together, these results suggest a shift in focus: rather than trying to suppress memorization, we should aim to understand and harness it. Doing so opens new avenues for provenance, for tracing downstream impacts, and for policies around intellectual property and integrity in the LLM ecosystem.
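For context, counterfactual memorization is commonly defined in the literature (e.g. Zhang et al., 2023, "Counterfactual Memorization in Neural Language Models") as the change in a model's expected performance on a training example when that example is held out of training. As a rough sketch, with S a sampled training set, f_S a model trained on S, and M a per-example performance metric (this notation is assumed here for illustration, not taken from the talk):

% notation (S, f_S, M) assumed for illustration; standard definition from the literature
\mathrm{mem}(x) \;=\; \mathbb{E}_{S \ni x}\bigl[ M(f_S, x) \bigr] \;-\; \mathbb{E}_{S \not\ni x}\bigl[ M(f_S, x) \bigr]

Examples with high scores are those the model handles well only if it has actually seen them, which is what makes the quantity a natural starting point for the provenance questions raised in the abstract.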

Speaker Bio: Jing Huang is a PhD candidate in the Stanford NLP Group, advised by Prof. Christopher Potts and Dr. Diyi Yang. Jing's research focuses on understanding what makes neural network models generalize well by studying the causal mechanisms that connect model behaviors, internal representations, and training data.

This talk is part of the NLIP Seminar Series.
