
Learning shallow neural networks in high dimensions: SGD dynamics and scaling laws


If you have a question about this talk, please contact Fernando Ruiz Mazo.

Abstract: We study the sample and time complexity of online stochastic gradient descent (SGD) for learning a two-layer neural network with M orthogonal neurons on isotropic Gaussian data. We focus on the challenging "extensive-width" regime M ≫ 1 and allow for a large condition number in the second-layer parameters, covering the power-law scaling a_m = m^{-β} as a special case. We characterize the SGD dynamics for training a student two-layer neural network and identify sharp transition times for the recovery of each signal direction. In the power-law setting, our analysis shows that although the learning of each individual teacher neuron exhibits an abrupt phase transition, the juxtaposition of these emergent learning curves across different timescales produces a smooth scaling law in the cumulative objective.
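The setting in the abstract can be illustrated numerically. The sketch below is a hypothetical minimal example, not the speaker's code: it trains a two-layer ReLU student with online SGD against a planted teacher whose M neurons are orthonormal and whose second-layer weights follow the power law a_m = m^{-β}. All hyperparameters (d, M, β, learning rate, step count) are illustrative choices, and the second layer is held fixed at the teacher's values for simplicity.

```python
# Hypothetical sketch of the setup described in the abstract (assumed
# hyperparameters throughout): online SGD for a two-layer ReLU student
# learning a teacher with M orthonormal neurons and power-law weights.
import numpy as np

rng = np.random.default_rng(0)
d, M, beta = 64, 4, 1.0            # input dim, teacher width, power-law exponent
W_star = np.eye(d)[:M]             # M orthonormal teacher neurons (rows)
a = np.arange(1, M + 1) ** -beta   # second-layer weights a_m = m^{-beta}

def f(W, x):
    """Two-layer ReLU network with second layer fixed to a."""
    return a @ np.maximum(W @ x, 0.0)

def mse(W, X):
    """Mean squared error against the teacher on a batch X (rows = samples)."""
    return np.mean([(f(W, x) - f(W_star, x)) ** 2 for x in X])

W = rng.normal(size=(M, d)) / np.sqrt(d)   # student initialization
X_test = rng.normal(size=(512, d))         # held-out Gaussian test batch
init_loss = mse(W, X_test)

lr = 0.01
for _ in range(20000):                     # one fresh Gaussian sample per step
    x = rng.normal(size=d)
    pre = W @ x
    err = a @ np.maximum(pre, 0.0) - f(W_star, x)
    W -= lr * np.outer(err * a * (pre > 0), x)   # gradient of 0.5 * err**2 in W

final_loss = mse(W, X_test)

# Overlap of each teacher direction with its best-aligned student neuron:
# values near 1 indicate that the signal direction has been recovered.
Wn = W / np.linalg.norm(W, axis=1, keepdims=True)
overlaps = np.abs(Wn @ W_star.T).max(axis=0)
print(init_loss, final_loss, overlaps)
```

Tracking the per-direction overlaps over time is one way to observe the sharp, staggered transition times the abstract describes: neurons with larger a_m tend to be recovered earlier, while the test loss, which aggregates all directions, decays more smoothly.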

This talk is co-hosted by the Computer Laboratory AI Research Group and the Informed-AI Hub.

This talk is part of the Machine learning theory series.


© 2006-2025 Talks.cam, University of Cambridge.