BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:Statistical Optimality of Stochastic Gradient Descent on Hard Lear
 ning Problems through Multiple Passes - Francis Bach (INRIA Paris - Rocque
 ncourt\; ENS - Paris)
DTSTART:20180628T100000Z
DTEND:20180628T104500Z
UID:TALK107485@talks.cam.ac.uk
CONTACT:INI IT
DESCRIPTION:We consider stochastic gradient descent (SGD) for least-square
 s regression with potentially several passes over the data. While severa
 l passes have been widely reported to perform better in practice in term
 s of predictive performance on unseen data\, the existing theoretical anal
 ysis of SGD suggests that a single pass is statistically optimal. While th
 is is true for low-dimensional easy problems\, we show that for hard probl
 ems\, multiple passes lead to statistically optimal predictions while a si
 ngle pass does not\; we also show that in these hard models\, the opti
 mal number of passes over the data increases with the sample size. In orde
 r to define the notion of hardness and show that our predictive performanc
 e is optimal\, we consider potentially infinite-dimensional models and not
 ions typically associated with kernel methods\, namely\, the decay of th
 e eigenvalues of the covariance matrix of the features and the complexit
 y of the optimal predictor as measured through the covariance matrix. We i
 llustrate our results on synthetic experiments with non-linear kernel meth
 ods and on a classical benchmark with a linear model. (Joint work with Lou
 cas Pillaud-Vivien and Alessandro Rudi)
LOCATION:Seminar Room 1\, Newton Institute
END:VEVENT
END:VCALENDAR
