BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:Coresets for scalable Bayesian logistic regression - Tamara Broder
 ick (Massachusetts Institute of Technology)
DTSTART:20170704T123000Z
DTEND:20170704T131500Z
UID:TALK73142@talks.cam.ac.uk
CONTACT:INI IT
DESCRIPTION:<span>Co-authors: Jonathan H. Huggins (MIT)\, Trevor Campbell
  (MIT)<br></span><span><br>The use of Bayesian methods in large-s
 cale data settings is attractive because of the rich hierarchical models\,
  uncertainty quantification\, and prior specification they provide. Howeve
 r\, standard Bayesian inference algorithms are computationally expensive\,
  so their direct application to large datasets can be difficult or infeasi
 ble. Rather than modify existing algorithms\, we exploit the insight that
  data is often redundant through a pre-processing step. In particular\
 , we construct a weighted subset of the data (called a coreset) that is mu
 ch smaller than the original dataset. We then input this small coreset to 
 existing posterior inference algorithms without modification. To demonstra
 te the feasibility of this approach\, we develop an efficient coreset cons
 truction algorithm for Bayesian logistic regression models. We provide the
 oretical guarantees on the size and approximation quality of the coreset -
 - both for fixed\, known datasets\, and in expectation for a wide class of
  data generative models. Our approach permits efficient construction of t
 he coreset in both streaming and parallel settings\, with minimal addition
 al effort. We demonstrate the efficacy of our approach on a number of synt
 hetic and real-world datasets\, and find that\, in practice\, the size of 
 the coreset is independent of the original dataset size.</span>
LOCATION:Seminar Room 1\, Newton Institute
END:VEVENT
END:VCALENDAR
