BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:Does AI help humans make better decisions? A statistical evaluatio
 n framework for experimental and observational studies. - Kosuke Imai (Har
 vard University)
DTSTART:20250516T130000Z
DTEND:20250516T140000Z
UID:TALK231520@talks.cam.ac.uk
CONTACT:Qingyuan Zhao
DESCRIPTION:The use of Artificial Intelligence (AI)\, or more generally da
 ta-driven algorithms\, has become ubiquitous in today's society. Yet\, in 
 many cases and especially when stakes are high\, humans still make final d
 ecisions. The critical question\, therefore\, is whether AI helps humans m
 ake better decisions compared to a human-alone or AI-alone system. We intr
 oduce a new methodological framework to empirically answer this question w
 ith a minimal set of assumptions. We measure a decision maker's ability to
  make correct decisions using standard classification metrics based on the
  baseline potential outcome. We consider a single-blinded and unconfounded
  treatment assignment\, where the provision of AI-generated recommendation
 s is assumed to be randomized across cases with humans making final decisi
 ons. Under this study design\, we show how to compare the performance of t
 hree alternative decision-making systems --- human-alone\, human-with-AI\,
  and AI-alone. Importantly\, the AI-alone system includes any individualiz
 ed treatment assignment\, including those that are not used in the origina
 l study. We also show when AI recommendations should be provided to a huma
 n decision maker\, and when one should follow such recommendations. We app
 ly the proposed methodology to our own randomized controlled trial evaluat
 ing a pretrial risk assessment instrument. We find that the risk assessmen
 t recommendations do not improve the classification accuracy of a judge's 
 decision to impose cash bail. Furthermore\, we find that replacing a human
  judge with algorithms --- the risk assessment score and a large language 
 model in particular --- leads to worse classification performance.
LOCATION:MR12\, Centre for Mathematical Sciences
END:VEVENT
END:VCALENDAR