BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:Explaining Neural Networks: Post-hoc and Natural Language Explanat
 ions - Oana Camburu\, University of Oxford
DTSTART:20191025T100000Z
DTEND:20191025T110000Z
UID:TALK133543@talks.cam.ac.uk
CONTACT:Robert Peharz
DESCRIPTION:In this talk\, we discuss two paradigms of explainability in n
 eural networks: post-hoc explanations and neural networks generating natur
 al language explanations for their own decisions. For the first paradigm\,
  we present two issues with existing post-hoc explanatory methods. The first
  issue is that two prevalent perspectives on explanations—feature-additi
 vity and feature-selection—lead to fundamentally different instance-wise
  explanations. In the literature\, explainers from different perspectives 
 are currently being directly compared\, despite their distinct explanation
  goals. The second issue is that current post-hoc explainers have only bee
 n thoroughly validated on simple models\, such as linear regression\, and\
 , when applied to real-world neural networks\, explainers are commonly eva
 luated under the assumption that the learned models behave reasonably. How
 ever\, neural networks often rely on unreasonable correlations\, even when
  producing correct decisions. We introduce a verification framework for ex
 planatory methods under the feature-selection perspective. Our framework i
 s\, to our knowledge\, the first evaluation test based on a non-trivial re
 al-world neural network for which we are able to provide guarantees on its
  inner workings. We show several failure modes of current explainers\, suc
 h as LIME\, SHAP and L2X. (based on https://arxiv.org/abs/1910.02065) For 
 the paradigm of neural networks that explain their own decisions in natura
 l language\, we introduce a large dataset of human-annotated explanations 
 for the ground-truth relations of SNLI\, which we call e-SNLI. The corpus 
 contains 570K instances and is\, to our knowledge\, the largest dataset o
 f free-form natural language explanations. We present a series of models t
 rained on e-SNLI. (based on https://papers.nips.cc/paper/8163-e-snli-natur
 al-language-inference-with-natural-language-explanations.pdf) We then show
  that this class of models is prone to outputting inconsistent explan
 ations\, such as "A dog is an animal" and "A dog is not an animal"\, which
  are likely to decrease users' trust in these systems. To detect such inco
 nsistencies\, we introduce a simple but effective adversarial framework fo
 r generating a complete target sequence\, a scenario that has not been add
 ressed so far. Finally\, we apply our framework to the best model trained 
 on e-SNLI\, and we show that this model is capable of generating a signifi
 cant number of inconsistencies. (based on https://arxiv.org/abs/1910.03065
 )
LOCATION:Engineering Department\, CBL Room BE-438
END:VEVENT
END:VCALENDAR
