BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:An introduction to counts-of-counts data  - Simon Tavaré PhD Herb
 ert and Florence Irving Director Irving Institute for Cancer Dynamics & P
 rofessor\, Departments of Statistics and Biological Sciences Columbia Uni
 versity
DTSTART:20221012T130000Z
DTEND:20221012T140000Z
UID:TALK178910@talks.cam.ac.uk
CONTACT:Samantha Noel
DESCRIPTION:Counts-of-counts data arise in many areas of biology and medic
 ine\, and have been studied by statisticians since the 1940s. One of the f
 irst examples\, discussed by R. A. Fisher and collaborators in 1943 [1]\, 
 concerns estimation of the number of unobserved species based on summary c
 ounts of the number of species observed once\, twice\, … in a sample of 
 specimens. The data are summarized by the numbers C1\, C2\, … of species
  represented once\, twice\, … in a sample of size N = C1 + 2 C2 + 3 C3 +
  … containing S = C1 + C2 + … species\; the vector C = (C1\, C2\, …)
  gives the counts-of-counts. Other examples include the frequencies of
  the distinct alleles in a human genetics sample\, the counts of distinct 
 variants of the SARS-CoV-2 S protein obtained from consensus sequencing ex
 periments\, counts of sizes of components in certain combinatorial structu
 res [2]\, and counts of the numbers of SNVs arising in one cell\, two cell
 s\, … in a cancer sequencing experiment.\n\nIn this talk I will outline
  some of the stochastic models used to describe the distribution of C\,
  and some of the inferential issues that arise in estimating the
  parameters of these models. I will touch on the celebrated Ewens
  Sampling Formula [3] an
 d Fisher’s multiple sampling problem concerning the variance expected be
 tween values of S in samples taken from the same population [4]. Var
 iants of birth-death-immigration processes can be used\, for example
  when differ
 ent variants grow at different rates. The classical Yule process with immi
 gration can be used to derive some of the combinatorial results in a simpl
 e way\, through a probabilistic trick known as embedding.\n\nReferences\n\
 n[1] Fisher RA\, Corbet AS & Williams CB. J Animal Ecology\, 12\, 1943\n[2
 ] Arratia R\, Barbour AD & Tavaré S. Logarithmic Combinatorial Structures
 \, EMS\, 2002\n[3] Ewens WJ. Theoret Popul Biol\, 3\, 1972\n[4] Da Silva P
 \, Jamshidpey A\, McCullagh P & Tavaré S. Bernoulli\, in press\, 2022
LOCATION:CMS\, Meeting Room 15
END:VEVENT
END:VCALENDAR
