BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:Bayesian Semi-supervised Multicategory Classification under Nonpar
 anormality - Subhashis Ghoshal (North Carolina State University)
DTSTART:20250730T130000Z
DTEND:20250730T140000Z
UID:TALK234787@talks.cam.ac.uk
DESCRIPTION:Semi-supervised learning is a machine learning technique that 
 combines supervised and unsupervised learning by utilizing both labeled an
 d unlabeled data to train statistical models for classification and regres
 sion tasks. This paper addresses the problem of semi-supervised binary cla
 ssification\, assuming that the underlying data has only a few observation
 s labeled in each class. Some methods have been developed for semi-supervi
 sed classification outside the Bayesian domain\, but most works in the Bay
 esian domain utilize Gaussian mixture models. However\, the assumption tha
 t the subpopulations are Gaussian may not be realistic in some situations.
  We generalize the data-generating process to the nonparanormality setting
 : the observations result from an unknown component-wise monotone increasi
 ng transformation applied to a hidden layer of multivariate normal latent 
 variables. We assign a prior distribution to the transformation functions 
 using B-splines\, which naturally maintain monotonicity and satisfy the re
 quired identifiability constraints. We use a Gibbs sampler to coordinate d
 raws from the posterior distribution of four objects: the missing labels\,
  the coefficients of the B-spline expansions of the transformation functio
 ns\, the parameters of the multivariate normal distributions of the compon
 ent populations\, and the population mixing proportions. The posterior dra
 ws of these objects use the Bayes formula for categories\, Hamiltonian Mon
 te Carlo\, normal-normal conjugacy\, and beta-binomial conjugacy\, respect
 ively. Using a low-density separation assumption\, we tune the number o
 f terms in the B-spline expansions. We evaluate the performance of the pro
 posed method based on extensive simulated data. We conclude that the propo
 sed method yields low classification error rates\, even when the nonparano
 rmality assumption is violated\, and outperforms many state-of-the-art sem
 i-supervised machine learning techniques. The method performs well on seve
 ral benchmark binary classification datasets.
LOCATION:Seminar Room 2\, Newton Institute
END:VEVENT
END:VCALENDAR
