Frequency and cardinality recovery from sketched data: a novel approach bridging Bayesian and frequentist views
- Speaker: Stefano Favaro (University of Turin)
- Date & Time: Friday 03 November 2023, 14:00 - 15:00
- Venue: MR12, Centre for Mathematical Sciences
Abstract
We study how to recover the frequency of a symbol in a large discrete data set, using only a (lossy) compressed representation, or sketch, of those data obtained via random hashing. This is a classical problem at the crossroads of computer science and information theory, with various algorithms available, such as the count-min sketch. However, these algorithms often assume that the data are fixed, leading to overly conservative and potentially inaccurate estimates when dealing with randomly sampled data. In this talk, we treat the sketched data as a random sample from an unknown distribution, and we introduce novel estimators that improve upon existing approaches. Our method combines Bayesian nonparametric and classical (frequentist) perspectives, addressing the limitations of each to provide a principled and practical solution. Additionally, we extend our method to the related but distinct problem of cardinality recovery, which consists of estimating the total number of distinct objects in the data set. We validate our method on synthetic and real data, comparing its performance to state-of-the-art alternatives.
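For readers unfamiliar with the baseline named in the abstract, the following is a minimal sketch of the classical count-min sketch, not the estimators presented in the talk. The class name `CountMinSketch`, the `width` and `depth` parameters, the use of salted BLAKE2b as the random hash family, and the toy data are all assumptions made for illustration. The sketch shows why the classical estimate is conservative: hash collisions can only inflate a counter, so the reported frequency is an upper bound on the true one.

```python
import hashlib
from collections import Counter


class CountMinSketch:
    """Illustrative count-min sketch (hypothetical helper, not from the talk)."""

    def __init__(self, width=64, depth=4):
        self.width = width    # number of buckets per hash function
        self.depth = depth    # number of hash functions (rows)
        self.table = [[0] * width for _ in range(depth)]

    def _bucket(self, row, symbol):
        # Salting with the row index gives one hash function per row.
        payload = f"{row}:{symbol}".encode("utf-8")
        digest = hashlib.blake2b(payload, digest_size=8).digest()
        return int.from_bytes(digest, "big") % self.width

    def add(self, symbol, count=1):
        # Each symbol increments exactly one bucket in every row.
        for row in range(self.depth):
            self.table[row][self._bucket(row, symbol)] += count

    def estimate(self, symbol):
        # Collisions only add to a bucket, so the minimum over rows
        # never falls below the true frequency.
        return min(
            self.table[row][self._bucket(row, symbol)]
            for row in range(self.depth)
        )


# Toy data: two frequent symbols plus many rare ones (assumed for illustration).
data = ["a"] * 50 + ["b"] * 30 + [f"x{i}" for i in range(200)]
true_counts = Counter(data)

sketch = CountMinSketch(width=64, depth=4)
for symbol in data:
    sketch.add(symbol)

for symbol in ("a", "b", "x0"):
    print(symbol, "true:", true_counts[symbol], "estimate:", sketch.estimate(symbol))
```

The estimate treats the data as fixed and returns only an upper bound; the abstract's point is that modelling the data as a random sample can yield less conservative estimates from the same bucket counts.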
Series
This talk is part of the Statistics series.