BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:Herding or a '3rd way to learn' - Simon Lacoste-Julien and Ferenc 
 Huszar
DTSTART:20101021T130000Z
DTEND:20101021T143000Z
UID:TALK26203@talks.cam.ac.uk
CONTACT:Shakir Mohamed
DESCRIPTION:In this RCC\, we will cover a learning approach proposed qui
 te recently (2009) by Max Welling and peculiarly called "Herding". Even
  though this is brand-new research\, it has interesting links with stan
 dard machine learning concepts\, which we will review. We will also spl
 it the RCC into two parts\, showing the evolution typical in research f
 rom the original idea (in the first paper) to a more modern viewpoint (
 in the latest paper).\n\n"Herding" is a novel approach to lear
 ning which seems a bit peculiar at first sight but has interesting propert
 ies and actually good empirical performance. It can be seen as a '3rd way 
 to learn': the first one being traditional frequentist point estimates of 
 parameters and the second being the Bayesian posterior over parameters. Herdin
 g lies somewhat in between: rather than maintaining samples over parameter
 s as in Bayesian learning\, it navigates the parameter space in a determin
 istic (but almost chaotic) way that does not converge to any particular
  point estimate. During this exploration\, it produces pseudosamples th
 at can be used much as one would use samples from an MCMC method *aft
 er* learning. The two main advantages of herding are that:\n# it is comput
 ationally cheaper than traditional learning of Markov random fields\, s
 ince it combines learning and inference in one deterministic step\; \n#
  the resulting pseudosamples exhibit strong anticorrelations\, thereby
  providing a more unifo
 rm coverage of the space -- this means that the number of (pseudo)samples 
 needed to approximate relevant integrals is substantially smaller than in 
 the case of random sampling (namely\, it has O(1/T) convergence vs. O(1/sqrt(
 T)) for iid sampling).\n\nIn the first half of the RCC\, Simon will cover 
 the original herding paper:\n* _Herding Dynamical Weights to Learn_\, Max 
 Welling\, ICML 2009 - "pdf":http://www.cs.mcgill.ca/~icml2009/papers/447.p
 df <br>\nwhich introduced herding as an algorithm for learning/sampling in
  the context of Markov random fields (and exhibits property 1 mentioned
  above). The first part therefore concentrates on links with previous a
 pproaches to learning in Markov random fields. <br>\n[Note that if you
  are interested enough in the topic to read this paper very carefully\,
  there i
 s a correction in the proof of recurrence "here":http://www.ics.uci.edu/~w
 elling/publications/papers/Correction%20to%20Recurrence%20Proof%20for%20He
 rding.pdf .]\n\nIn the second half of the RCC\, Ferenc will concentrate on
  the most recent paper:\n* _Super-Samples from Kernel Herding_\, Yutian Ch
 en\, Max Welling\, Alex Smola\, UAI 2010 - "pdf":event.cwi.nl/uai2010/p
 apers/UAI2010_0238.pdf <br>\nwhich introduces kernel herding and presen
 ts a more modern viewpoint on herding in the context of approximating d
 istributions (and where property 2 mentioned above is exploited). Herdi
 ng h
 ere can be understood as a greedy optimisation for approximating a probabi
 lity distribution with the empirical distribution of pseudosamples in such
  a way that certain nonlinear moments of the original distribution are pre
 served. Connections to message passing\, Monte Carlo\, and other relevant 
 methods will be discussed.
LOCATION:Engineering Department\, CBL Room 438
END:VEVENT
END:VCALENDAR
