
AI Metrics: Theoretical Foundations, Design, and Selection of Evaluation Metrics Based on Ground Truth


If you have a question about this talk, please contact Suchir Salhan.

In this talk (based on a book draft, see this link), I propose a unified formal framework for ground-truth-based evaluation metrics and task characterization, grounded in measurement theory. Building on this foundation, I analyze the formal properties of existing metrics and organize them into families according to task characteristics. The book covers a wide range of discriminative tasks, including classification, ranking, clustering, and sequence labelling, among others, as well as text generation. It also provides practical guidance for selecting appropriate metrics depending on the evaluation scenario, together with a unified software framework that implements metrics across multiple tasks. Finally, the book extends evaluation beyond effectiveness to additional dimensions of AI quality, such as harmfulness, bias and fairness, explainability, and the assessment of cognitive capabilities.

Bio: Enrique Amigó is an Assistant Professor at the National University of Distance Education (UNED, Spain) and a member of UNED’s Natural Language Processing and Information Retrieval group. His main research interests include evaluation metrics, document similarity, representation, and the connections between Information Access, Measurement, Information Theory, and cognitive science. He has received more than 3,000 citations, in most cases as first author. He has also participated in numerous research projects at the regional, national, and international levels.

This talk is part of the NLIP Seminar Series.


© 2006-2025 Talks.cam, University of Cambridge.