BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:Learning Privately in High Dimensions - Prof. Marco Mondelli\, Ins
 titute of Science and Technology Austria
DTSTART:20250528T130000Z
DTEND:20250528T140000Z
UID:TALK224392@talks.cam.ac.uk
CONTACT:Prof. Ramji Venkataramanan
DESCRIPTION:Deep learning models memorize training samples and\, as such\,
  they are vulnerable to various attacks aimed at retrieving information abo
 ut the training dataset. The goal of the talk is to quantify this phenom
 enon\, as well as the corresponding defenses in terms of differentially pr
 ivate algorithms\, through the lens of high-dimensional regression. \n\nTh
 e first part of the talk considers empirical risk minimization\, focusing 
 on the memorization of spurious features that are uncorrelated with the le
 arning task. We relate such memorization to two separate terms: (i) the st
 ability of the model with respect to individual training samples\, and (ii
 ) the feature alignment between the spurious feature and the full sample. 
 This shows that memorization weakens as the generalization capability incr
 eases and\, through the precise analysis of the feature alignment\, we des
 cribe the role of the model and of its activation function. We then discus
 s spurious correlations between non-predictive features and the associated
  labels in the training data. We provide a statistical characterization of
  how such correlations are learnt in high-dimensional regression\, unveili
 ng the role of the data covariance\, the regularization strength and the o
 ver-parameterization. \n\nThe second part of the talk considers differenti
 ally private gradient descent\, a popular algorithm with provable guarante
 es on the privacy of the training data. While understanding its performanc
 e cost with respect to standard gradient descent has received remarkable a
 ttention\, existing bounds on the excess population risk degrade with over
 -parameterization. This leaves practitioners without clear guidance\, lead
 ing some to reduce the effective number of trainable parameters to improve
  performance\, while others use larger models to achieve better results th
 rough scale. We show that\, for any sufficiently over-parameterized random
  features model\, privacy can be obtained for free\, i.e.\, the excess pop
 ulation risk is negligible not only when the privacy parameter \\epsilon h
 as constant order\, but also in the strongly private setting \\epsilon = o
 (1). This challenges the common wisdom that over-parameterization inherent
 ly hinders performance in private learning.\n\n*Bio*: Marco Mondelli recei
 ved the B.S. and M.S. degrees in Telecommunications Engineering from the U
 niversity of Pisa\, Italy\, in 2010 and 2012\, respectively. In 2016\, he o
 btained his Ph.D. degree in Computer and Communication Sciences at EPFL. I
 n 2017-2019\, he was a Postdoctoral Scholar in the Department of Electrica
 l Engineering at Stanford University. In 2018\, he was also a Research Fel
 low with the Simons Institute for the Theory of Computing\, for the progra
 m on “Foundations of Data Science”. He has been a faculty member at th
 e Institute of Science and Technology Austria (ISTA) since 2019\, first as
  an Assistant Professor and\, since 2025\, as a Professor. His research in
 terests include data science\, machine learning\, high-dimensional statist
 ics\, information theory\, and coding theory. He is the recipient of a num
 ber of fellowships and awards\, including the Jack K. Wolf ISIT Student Pa
 per Award in 2015\, the STOC Best Paper Award in 2016\, the EPFL Doctorate
  Award in 2018\, the Simons-Berkeley Research Fellowship in 2018\, the Lop
 ez-Loreta Prize in 2019\, the Information Theory Society Best Paper Award 
 in 2021 and the ERC Starting Grant in 2024.\n
LOCATION:MR5\, CMS Pavilion A
END:VEVENT
END:VCALENDAR
