BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:High-dimensional variable selection when features are sparse - Jac
 ob Bien (University of Southern California)
DTSTART:20180426T100000Z
DTEND:20180426T110000Z
UID:TALK104332@talks.cam.ac.uk
CONTACT:INI IT
DESCRIPTION:It is common in modern prediction problems for many predictor 
 variables to be counts of rarely occurring events. This leads to design ma
 trices in which a large number of columns are highly sparse. The challenge
  posed by such "rare features" has received little attention despite its p
 revalence in diverse areas\, ranging from biology (e.g.\, rare species) to
  natural language processing (e.g.\, rare words). We show\, both theoretic
 ally and empirically\, that not explicitly accounting for the rareness of 
 features can greatly reduce the effectiveness of an analysis. We next prop
 ose a framework for aggregating rare features into denser features in a fl
 exible manner that creates better predictors of the response. An applicati
 on to online hotel reviews demonstrates the gain in accuracy achievable by
  proper treatment of rare words. This is joint work with Xiaohan Yan.
LOCATION:Seminar Room 2\, Newton Institute
END:VEVENT
END:VCALENDAR
