BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:Data sketching for cardinality and entropy estimation - Dr Ioana C
 osma\, Statistical Laboratory\, University of Cambridge
DTSTART:20111109T141500Z
DTEND:20111109T150000Z
UID:TALK33968@talks.cam.ac.uk
CONTACT:Rachel Fogg
DESCRIPTION:Streaming data is ubiquitous in a wide range of areas from eng
 ineering and information technology\, finance\, and commerce\, to atmosphe
 ric physics\, and earth sciences. The online approximation of properties o
 f data streams is of great interest\, but this approximation process is hi
 ndered by the sheer size of the data and the speed at which it is generate
 d.\nData stream algorithms typically allow only one pass over the data\, a
 nd maintain sub-linear\nrepresentations of the data from which target prop
 erties can be inferred with high efficiency.\nIn this talk we consider the
  online approximation of two important characterizations of data streams: 
 cardinality and empirical Shannon entropy. We assume that the number of di
 stinct elements observed in the stream is prohibitively large\, so that th
 e vector of cumulative quantities cannot be stored on main computer memory
  for fast and efficient access.\nWe focus on two techniques that use pseud
 o-random variates to form low-dimensional data sketches (using hashing and
  random projections)\, and derive estimators of the cardinality and empiri
 cal entropy. We discuss various properties of our estimators such as relat
 ive asymptotic efficiency\, recursive computability\, and error and comple
 xity bounds. Finally\, we present results on simulated data and seismic me
 asurements from a volcano.\n\nReferences:\n1. Peter Clifford and Ioana A. 
 Cosma (2011) “A statistical analysis of probabilistic counting\nalgorith
 ms” (to appear in the Scandinavian Journal of Statistics\, preprint on a
 rXiv:0801.3552).\n2. Peter Clifford and Ioana A. Cosma (2009)“A simple s
 ketching algorithm for entropy\nestimation” (in preparation\, preprint o
 n arXiv:0908.3961).
LOCATION:LR4\, Engineering\, Department of
END:VEVENT
END:VCALENDAR
