BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:Capability-oriented Evaluation in AI: From IRT to Measurement Layo
 uts - Prof Jose Hernandez-Orallo
DTSTART:20231121T160000Z
DTEND:20231121T173000Z
UID:TALK208381@talks.cam.ac.uk
CONTACT:Luning Sun
DESCRIPTION:With the advent of general-purpose systems in AI\, such as lar
 ge language models\, their evaluation is finally transitioning from the re
 porting of aggregate performance on some benchmarks to the extraction of c
 apabilities in better-designed measurement experiments\, in a way that s
 hould resemble the theory and practice of psychological measurement. I wil
 l illustrate some examples where Factor Analysis and Item Response Theory 
 have been applied to AI evaluation in the past. In these psychometric appr
 oaches\, estimating capabilities has an advantage over measuring
  performance in that capabilities aim to be independent of the task
  distribution. However\, the parameters and factors in these models are
  still highly dependent on t
 he underlying population of AI systems\, which are more arbitrary and chan
 ging than human or animal populations. To address this issue\, we need a m
 ore cognitive\, intrinsic approach\, identifying task demands and mapping 
 the capabilities that can meet these demands. Under this perspective\, I w
 ill present a new approach referred to as 'measurement layouts'\, generali
 sed (non-linear) Hierarchical Bayesian Networks that can infer the latent 
 capabilities of a single AI system from observed performance and task dema
 nds\, and then predict performance for new tasks. Measurement layouts
  help explain what makes an individual AI system fail and anticipate its
  performance on future tasks. At the end of the talk\, I'll invite
  attendees to an open discussion on how measurement layouts compare to
  othe
 r novel approaches such as Assessors (performance models trained on test d
 ata) and more traditional approaches such as Structural Equation Modelling
  (if used for individuals).
LOCATION:S3.04\, Simon Sainsbury Centre\, Cambridge Judge Business School
END:VEVENT
END:VCALENDAR
