BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:Multi-Label Learning with Millions of Categories - Manik Varma (Mi
 crosoft Research India)
DTSTART:20120924T100000Z
DTEND:20120924T110000Z
UID:TALK39299@talks.cam.ac.uk
CONTACT:Zoubin Ghahramani
DESCRIPTION:Our objective is to build an algorithm for classifying a data 
 point into a set of labels when the output space contains millions of cate
 gories. This is a relatively novel setting in supervised learning and brin
 gs forth interesting challenges such as efficient training and prediction\
 , learning from only positively labeled data with missing and incorrect la
 bels and handling label correlations. We propose a random forest based sol
 ution for jointly tackling these issues. We develop a novel extension of r
 andom forests for multi-label classification which can learn from positive
  data alone and can scale to large data sets. We generate real valued beli
 efs indicating the state of labels and adapt our classifier to train on th
 ese belief vectors so as to compensate for missing and noisy labels. In ad
 dition\, we modify the random forest cost function to avoid overfitting in
  high dimensional feature spaces and learn short\, balanced trees. Finally
 \, we write highly efficient training routines which let us train on probl
 ems with more than a hundred million data points\, sparse feature vectors
  of over a million dimensions and over ten million categories. Extensive
  experiments reveal that our proposed solution is not only significantly
  better
  than other multi-label classification algorithms but also more than 10%
  better than the state-of-the-art NLP based techniques for suggesting bid 
 phrases for online search advertisers.
LOCATION:Engineering Department\, CBL Room BE-438
END:VEVENT
END:VCALENDAR