BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:Statistical Investigations into the Unseen: Missing Mass for Marko
 v Samples and Natural Distribution Estimation - Prof. Andrew Thangaraj\, I
 ndian Institute of Technology Madras
DTSTART:20251110T140000Z
DTEND:20251110T150000Z
UID:TALK240091@talks.cam.ac.uk
CONTACT:Prof. Ramji Venkataramanan
DESCRIPTION:Suppose we observe a sequence of samples from a very large alp
 habet and the number of samples is comparable to or smaller than the
  alphabet size. Several letters from the alphabet will be unseen or
  missing in the o
 bserved samples. What can be inferred about the distribution's probability
  mass on the missing letters? The sum of the probability masses on all mi
 ssing letters is called missing mass\, and the classical Good-Turing (GT) 
 estimator is minimax optimal over all distributions and alphabet sizes whe
 n the samples are iid. However\, when the samples are Markovian sequences\
 , the GT estimator fails. In this talk\, we will introduce a windowed vers
 ion of the GT estimator and show that\, when the window size is sufficien
 tly larger than the mixing time\, the windowed GT estimator is nearly mini
 max optimal. Going beyond missing mass\, we will present the generalizatio
 n to higher-order missing mass and missing g-mass\, which can potentially 
 quantify the distance of the missing part of the distribution from unifor
 mity. We will conclude with some extensions of these results to the distri
 bution's probability mass on sparsely observed letters and potential impac
 t on distribution estimation.
LOCATION:Cambridge University Engineering Department\, JDB Seminar Room
END:VEVENT
END:VCALENDAR
