BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:Non-asymptotic control of a kernel 2-sample test - Perrine Lacroix
  (ENS Lyon)
DTSTART:20240119T140000Z
DTEND:20240119T150000Z
UID:TALK209539@talks.cam.ac.uk
CONTACT:Dr Sergio Bacallado
DESCRIPTION:We are interested in statistical tests to evaluate the hypothe
 sis H₀: {P = Q} against its alternative H₁: {P ≠ Q}. Our data are mu
 ltivariate\, high-dimensional and exhibit strong dependencies between vari
 ables. We propose a comparison test of two distributions based on kernel m
 ethods: our data are first transformed via a well-chosen feature map and l
 ive in a reproducing kernel Hilbert space (RKHS). Our kernel test statisti
 c is the equivalent of Hotelling's T² comparison test for finite-dimen
 sional multivariate data\, and is equal to the mean embeddings difference 
 (MMD) renormalized by a well-chosen covariance operator.\n\nClassically\, 
 these non-parametric tests are either calibrated asymptotically\, or via t
 est aggregation techniques. Here\, we propose to calibrate the test at a g
 iven fixed sample size by obtaining non-asymptotic bounds on our test stat
 istic. For this\, a regularization is required to approximate the covarian
 ce operator via its empirical estimator. Unlike the approaches of Harchaou
 i et al. (2007) or Hagrass et al. (2023) using L_2 regularizations\, we pr
 opose spectral truncation. This method fixes the unknown number T of eigen
 functions to reconstruct the covariance operator and provides the addition
 al advantage of data visualization.\nCurrently\, at a fixed T\, the test s
 tatistic\, called the truncated kernel Fisher Discriminant Ratio (KFDA_T)\
 , provides a test whose asymptotic calibration is known (Ozier-Lafontaine 
 et al.\, 2023). In this talk\, I will present how to theoretically and non
 -asymptotically bound the p-value of the test associated with the KFDA_T. 
 This bound is a first step in defining a good calibration of the hyperpara
 meter T.\n\nIn applications\, this statistical question is essential in th
 e field of genomics\, where the two groups are composed of single-cell RNA
 -seq data. The goal is to detect distinct or similar biological behaviour 
 between the groups.\n\nJoint work with Bertrand Michel (Université de Nan
 tes\, France)\, Franck Picard (ENS de Lyon\, France) and Vincent Rivoirard
  (Paris-Dauphine\, France).
LOCATION:MR12\, Centre for Mathematical Sciences
END:VEVENT
END:VCALENDAR