Capability-oriented Evaluation in AI: From IRT to Measurement Layouts
- Speaker: Prof Jose Hernandez-Orallo
- Date & Time: Tuesday 21 November 2023, 16:00-17:30
- Venue: S3.04, Simon Sainsbury Centre, Cambridge Judge Business School
Abstract
With the advent of general-purpose AI systems such as large language models, AI evaluation is finally shifting from reporting aggregate performance on benchmarks to extracting capabilities through better-designed measurement experiments, in a way that should resemble the theory and practice of psychological measurement. I will illustrate some past applications of Factor Analysis and Item Response Theory (IRT) to AI evaluation. In these psychometric approaches, estimating capabilities has an advantage over measuring performance: capabilities aim to be independent of the task distribution. However, the parameters and factors of these models still depend heavily on the underlying population of AI systems, which is more arbitrary and more changeable than human or animal populations. To address this issue, we need a more cognitive, intrinsic approach that identifies task demands and maps them to the capabilities that can meet them. From this perspective, I will present a new approach called ‘measurement layouts’: generalised (non-linear) Hierarchical Bayesian Networks that infer the latent capabilities of a single AI system from its observed performance and the task demands, and then predict its performance on new tasks. Measurement layouts help explain why an individual AI system fails and anticipate its performance on future tasks. At the end of the talk, I will invite attendees to an open discussion on how measurement layouts compare with other novel approaches, such as Assessors (performance models trained on test data), and with more traditional approaches, such as Structural Equation Modelling (when applied to individuals).
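As a rough illustration of inferring one system's latent capabilities from task demands and then predicting its performance on new tasks, the Python sketch below fits a toy two-dimensional model. Everything in it is an assumption made for illustration: the logistic link, the `slope` of 2.0, the non-compensatory product (a task succeeds only if every demand is met), the grid-search maximum-likelihood fit, the helper names (`p_success`, `fit_capabilities`) and the invented task data. It is not the speaker's measurement-layout implementation, which the abstract describes as a generalised hierarchical Bayesian network. A Bayesian version would place priors on the capabilities and return a posterior rather than a single point estimate.

```python
import math
from itertools import product

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def p_success(capabilities, demands, slope=2.0):
    """Probability that a system with these capabilities solves a task with these demands.

    Assumed non-compensatory link: every demand must be met, so the
    per-dimension logistic probabilities are multiplied together.
    """
    p = 1.0
    for c, d in zip(capabilities, demands):
        p *= sigmoid(slope * (c - d))
    return p

def log_likelihood(capabilities, tasks, slope=2.0):
    """Bernoulli log-likelihood of observed successes/failures for ONE system."""
    ll = 0.0
    for demands, success in tasks:
        p = p_success(capabilities, demands, slope)
        p = min(max(p, 1e-9), 1.0 - 1e-9)  # clamp to avoid log(0)
        ll += math.log(p) if success else math.log(1.0 - p)
    return ll

def fit_capabilities(tasks, n_dims, grid=None, slope=2.0):
    """Point estimate of latent capabilities by exhaustive grid search (hypothetical helper)."""
    if grid is None:
        grid = [x / 10 for x in range(-30, 31)]  # candidate values -3.0 .. 3.0
    best, best_ll = None, -math.inf
    for caps in product(grid, repeat=n_dims):
        ll = log_likelihood(caps, tasks, slope)
        if ll > best_ll:
            best, best_ll = caps, ll
    return best

# Invented observations for a single system: ((demand_1, demand_2), solved?)
tasks = [
    ((0.0, -1.0), True),
    ((1.0, -1.0), True),
    ((2.0, -0.5), True),
    ((0.5, 0.0), True),
    ((0.0, 1.0), False),
    ((1.0, 1.5), False),
    ((-1.0, 2.0), False),
    ((2.5, 0.5), False),
]

caps = fit_capabilities(tasks, n_dims=2)
print("estimated capabilities:", caps)
# Predict performance on an unseen task from its demands alone.
print("P(success) on new task (1.5, 0.2):", round(p_success(caps, (1.5, 0.2)), 3))
```

The product link is one simple way to make the model non-linear: a high capability on one dimension cannot compensate for a low one on another, so a failure can be traced to the specific demand that was not met.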
Series: This talk is part of the Cambridge Psychometrics Centre Seminars series.
Included in Lists
- Biology
- Cambridge Neuroscience Seminars
- Cambridge Psychometrics Centre Seminars
- Cambridge talks
- Chris Davis' list
- Department of Psychiatry talks stream
- dh539
- Featured lists
- Life Science
- Life Sciences
- Neuroscience
- Neuroscience Seminars
- Psychology talks and events
- S3.04, Simon Sainsbury Centre, Cambridge Judge Business School
- Stem Cells & Regenerative Medicine
- Yishu's list