An introduction to counts-of-counts data
- 👤 Speaker: Simon Tavaré, PhD, Herbert and Florence Irving Director, Irving Institute for Cancer Dynamics, and Professor, Departments of Statistics and Biological Sciences, Columbia University
- 📅 Date & Time: Wednesday 12 October 2022, 14:00 - 15:00
- 📍 Venue: CMS, Meeting Room 15
Abstract
Counts-of-counts data arise in many areas of biology and medicine, and have been studied by statisticians since the 1940s. One of the first examples, discussed by R. A. Fisher and collaborators in 1943 [1], concerns estimation of the number of unobserved species based on summary counts of the number of species observed once, twice, … in a sample of specimens. The data are summarized by the numbers C1, C2, … of species represented once, twice, … in a sample of size N = C1 + 2 C2 + 3 C3 + … containing S = C1 + C2 + … species; the vector C = (C1, C2, …) gives the counts-of-counts. Other examples include the frequencies of the distinct alleles in a human genetics sample, the counts of distinct variants of the SARS-CoV-2 S protein obtained from consensus sequencing experiments, counts of sizes of components in certain combinatorial structures [2], and counts of the numbers of SNVs arising in one cell, two cells, … in a cancer sequencing experiment.
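As an illustration of the definitions above, the following minimal Python sketch tabulates the counts-of-counts for a toy sample and recovers N and S from them. The function name `counts_of_counts` and the species labels are hypothetical and chosen only for this example.

```python
from collections import Counter

def counts_of_counts(sample):
    """Return {j: C_j}, where C_j is the number of species seen exactly j times.

    `sample` is any iterable of species labels, one per specimen
    (a hypothetical input format, assumed for illustration).
    """
    abundance = Counter(sample)               # species label -> number of specimens
    return dict(Counter(abundance.values()))  # j -> C_j

# Toy sample of 8 specimens from 5 species (made-up labels).
sample = ["a", "b", "a", "c", "d", "a", "b", "e"]
c = counts_of_counts(sample)

n = sum(j * cj for j, cj in c.items())  # N = C1 + 2 C2 + 3 C3 + ...
s = sum(c.values())                     # S = C1 + C2 + ...

print(sorted(c.items()), n, s)
```

Here species "a" appears three times, "b" twice, and "c", "d", "e" once each, so C1 = 3, C2 = 1, C3 = 1, giving N = 8 specimens and S = 5 species.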
In this talk I will outline some of the stochastic models used to model the distribution of C, and some of the inferential issues that come from estimating the parameters of these models. I will touch on the celebrated Ewens Sampling Formula [3] and Fisher’s multiple sampling problem concerning the variance expected between values of S in samples taken from the same population [4]. Variants of birth-death-immigration processes can be used, for example when different variants grow at different rates. The classical Yule process with immigration can be used to derive some of the combinatorial results in a simple way, through a probabilistic trick known as embedding.
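For readers unfamiliar with the Ewens Sampling Formula, its standard form gives the probability of a counts-of-counts vector c = (c1, …, cn) in a sample of size n as n! / (θ(θ+1)…(θ+n−1)) × ∏ (θ/j)^cj / cj!. The sketch below evaluates it exactly with rational arithmetic. The parameter name `theta` and the function name `ewens_probability` are assumptions made for this example and do not come from the abstract.

```python
from fractions import Fraction
from math import factorial

def ewens_probability(c, theta):
    """Probability of counts-of-counts c = {j: C_j} under the Ewens Sampling Formula.

    `theta` is the positive parameter of the formula (name assumed here);
    pass an int, Fraction, or numeric string to keep the arithmetic exact.
    """
    n = sum(j * cj for j, cj in c.items())  # sample size N = sum_j j * C_j
    theta = Fraction(theta)

    # Rising factorial theta (theta + 1) ... (theta + n - 1).
    rising = Fraction(1)
    for i in range(n):
        rising *= theta + i

    prob = Fraction(factorial(n)) / rising
    for j, cj in c.items():
        prob *= (theta / j) ** cj / factorial(cj)
    return prob

# All counts-of-counts vectors for a sample of size 3.
partitions_of_3 = [{3: 1}, {1: 1, 2: 1}, {1: 3}]
total = sum(ewens_probability(c, 2) for c in partitions_of_3)
print(total)
```

Summing over every possible vector for a fixed sample size should give 1, which is a quick sanity check on the implementation. With `theta = 1` the formula reduces to the distribution of cycle counts of a uniformly random permutation.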
References
[1] Fisher RA, Corbet AS & Williams CB. J Animal Ecology, 12, 1943
[2] Arratia R, Barbour AD & Tavaré S. Logarithmic Combinatorial Structures, EMS, 2002
[3] Ewens WJ. Theoret Popul Biol, 3, 1972
[4] Da Silva P, Jamshidpey A, McCullagh P & Tavaré S. Bernoulli, in press, 2022
Series
This talk is part of the Computational and Systems Biology Seminar Series 2023 - 24 series.