BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//EN
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:Where neural scaling laws come from: a model-based theory of data 
 structure - Francesco Cagnetta\, Marie Skłodowska-Curie Fellow at SISSA\,
  Trieste
DTSTART:20260210T140000Z
DTEND:20260210T150000Z
UID:TALK244285@talks.cam.ac.uk
CONTACT:Sven Krippendorf
DESCRIPTION:Neural scaling laws reveal strikingly robust power-law relatio
 nships between the performance of language models and the amount of traini
 ng data. Yet\, a principled explanation of where the scaling exponent come
 s from\, in terms of measurable properties of real data rather than solva
 ble surrogates that neglect representation-learning effects\, has remaine
 d elusive. In this talk\, I introduce a model-based perspective on data st
 ructure grounded in random hierarchies: analytically tractable generative 
 models designed to capture the hierarchical and compositional structure of
  natural language while retaining explicit control over important learning
 -related statistics. I will then present new work that\, building on this 
 framework\, ties the scaling exponent observed in autoregressive language 
 modelling to two fundamental\, empirically accessible statistics of text: 
 (i) how correlations between two tokens decay with their separation t\, an
 d (ii) how the conditional entropy of the next token decreases as a functi
 on of context length n. The core message is that the representation-learni
 ng mechanism we identified by studying how deep learning methods learn ran
 dom hierarchies provides the missing link from these descriptive statistic
 s to quantitative predictions\, as it yields a concrete formula for the sc
 aling exponent in terms of the joint behaviour of these curves. The result
 ing prediction matches observed scaling remarkably well for modern neural 
 architectures trained on large text corpora. This provides\, to our knowle
 dge\, the first theory of neural scaling that depends only on intrinsic pr
 operties of the data and remains predictive in the regime of contemporary 
 language modelling.
LOCATION:DAMTP\, room MR4
END:VEVENT
END:VCALENDAR
