BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:Scalable Non-Markovian Language Modelling - Ehsan Shareghi
DTSTART:20180503T100000Z
DTEND:20180503T110000Z
UID:TALK105205@talks.cam.ac.uk
CONTACT:Dimitri Kartsaklis
DESCRIPTION:Markov models are a popular means of modeling the underlying
  structure of natural language\, which is naturally represented as sequ
 ences and trees. The locality assumption made in low-order Markov model
 s such as n-gram language models is limiting\, because if the data gene
 ration process exhibits long-range dependencies\, modeling the distribu
 tion well requires consideration of long-range context. On the other ha
 nd\, higher-order Markov\, or non-Markovian (infinite-order Markov)\, m
 odels exhibit computational and statistical challenges during learning
  and inference. In particular\, in the large-data setting their exponen
 tial number of parameters often leads to estimation and sampler mixing
  issues\, while representing the structure of the model\, sufficient st
 atistics\, or sampler states can quickly become computationally ineffic
 ient and impractical. \n\nWe propose a framework based on compressed da
 ta structures which keeps the memory usage of the modeling\, learning\,
  and inference steps independent of the order of the model. Our approac
 h scales nicely with the order of the Markov model and the data size\,
  and is highly competitive with the state-of-the-art in terms of memory
  and runtime\, while allowing us to develop Bayesian and non-Bayesian s
 moothing techniques. Using our compressed framework to represent the mo
 dels\, we explore its scalability in two non-Markovian language modelin
 g settings\, using large-scale data and infinite context. \n\nFirst\, w
 e model the Kneser-Ney family of language models and illustrate that ou
 r approach is several orders of magnitude more memory efficient than th
 e state-of-the-art in both training and testing\, while remaining highl
 y competitive in terms of the run-times of both phases. When memory is
  a limiting factor at query time\, our approach is orders of magnitude
  faster than the state-of-the-art. We then turn to Hierarchical Nonpara
 metric Bayesian language modeling and develop an efficient sampling mec
 hanism which allows us to prevent the sampler mixing issues common in l
 arge Bayesian models. More precisely\, compared with the previous state
 -of-the-art hierarchical Bayesian language model\, our experimental res
 ults illustrate that our model can be built on 100x larger datasets\, w
 hile being several orders of magnitude smaller\, fast for training and
  inference\, and outperforming the perplexity of the state-of-the-art M
 odified Kneser-Ney LM by up to 15%.
LOCATION:Boardroom\, Faculty of English\, West Road
END:VEVENT
END:VCALENDAR
