BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:Discussion on Causal Representation Learning with Generative Artif
 icial Intelligence: Application to Texts as Treatments - Martina Scauda (U
 niversity of Cambridge)
DTSTART:20250307T153000Z
DTEND:20250307T170000Z
UID:TALK229162@talks.cam.ac.uk
CONTACT:Martina Scauda
DESCRIPTION:See preprint by Kosuke Imai and Kentaro Nakamura at: https://
 arxiv.org/abs/2410.00903\n\nIn this paper\, we demonstrate how to enhance 
 the validity of causal inference with unstructured high-dimensional treatm
 ents like texts\, by leveraging the power of generative Artificial Intelli
 gence. Specifically\, we propose to use a deep generative model such as la
 rge language models (LLMs) to efficiently generate treatments and use thei
 r internal representation for subsequent causal effect estimation. We show
  that the knowledge of this true internal representation helps disentangle
  the treatment features of interest\, such as specific sentiments and cert
 ain topics\, from other possibly unknown confounding features. Unlike the 
 existing methods\, our proposed approach eliminates the need to learn caus
 al representation from the data and hence produces more accurate and effic
 ient estimates. We formally establish the conditions required for the nonp
 arametric identification of the average treatment effect\, propose an esti
 mation strategy that avoids the violation of the overlap assumption\, and 
 derive the asymptotic properties of the proposed estimator through the app
 lication of double machine learning. Finally\, using an instrumental varia
 bles approach\, we extend the proposed methodology to settings in which th
 e treatment feature is based on human perception rather than assumed to be
  fixed given the treatment object. The proposed methodology is al
 so applicable to text reuse where an LLM is used to regenerate the existin
 g texts. We conduct simulation and empirical studies\, using the generated
  text data from an open-source LLM\, Llama 3\, to illustrate the advantage
 s of our estimator over the state-of-the-art causal representation learnin
 g algorithms.
LOCATION:MR12\, Centre for Mathematical Sciences\, Wilberforce Road\, Cam
 bridge
END:VEVENT
END:VCALENDAR
