BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:On the Consistency of Supervised Learning with Missing Values - Ju
 lie Josse\, École Polytechnique
DTSTART:20190510T150000Z
DTEND:20190510T160000Z
UID:TALK115933@talks.cam.ac.uk
CONTACT:Dr Sergio Bacallado
DESCRIPTION:In many application settings\, the data have missing features 
 which make data analysis challenging. An abundant literature addresses mis
 sing data in an inferential framework: estimating parameters and their var
 iance from incomplete tables. Here\, we consider supervised-learning setti
 ngs: predicting a target when missing values appear in both training and t
 esting data. We show the consistency of two approaches in prediction. A st
 riking result is that the widely-used method of imputing with the mean pri
 or to learning is consistent when missing values are not informative. This
  contrasts with inferential settings where mean imputation is criticized f
 or distorting the distribution of the data. That such a simple approach ca
 n be consistent is important in practice. We further analyze decision tree
 s. These can naturally tackle empirical risk minimization with missing val
 ues\, due to their ability to handle the half-discrete nature of incomplet
 e variables. After comparing different missing-value strategies in trees\,
  both theoretically and empirically\, we recommend using the "missing inco
 rporated in attribute" method as it can handle both non-informative and in
 formative missing values.
LOCATION:MR12
END:VEVENT
END:VCALENDAR
