BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:Modeling Science: Topic models of Scientific Journals and Other La
 rge Document Collections - David Blei\, Computer Science\, Princeton Unive
 rsity
DTSTART:20080123T140000Z
DTEND:20080123T150000Z
UID:TALK9623@talks.cam.ac.uk
CONTACT:Zoubin Ghahramani
DESCRIPTION:A surge of recent research in machine learning and statistics 
 has\ndeveloped new techniques for finding patterns of words in document\nc
 ollections using hierarchical probabilistic models.  These models are\ncal
 led "topic models" because the word patterns often reflect the\nunderlying
  topics that permeate the documents\; however\, topic models\nalso
  naturally
  apply to data such as images and biological sequences.\n\nAfter reviewing
  the basics of topic modeling\, I will describe two\nrelated lines of rese
 arch in this field\, which extend the current\nstate of the art.\n\nFirst\
 , while previous topic models have assumed that the corpus is\nstatic\, ma
 ny document collections actually change over time:\nscientific articles\, 
 emails\, and search queries reflect evolving\ncontent\, and it is importan
 t to model the corresponding evolution of\nthe underlying topics.  For exa
 mple\, an article about biology in 1885\nwill exhibit significantly differ
 ent word frequencies than one in\n2005.  I will describe probabilistic mod
 els designed to capture the\ndynamics of topics as they evolve over time.\
 n\nSecond\, previous models have assumed that the occurrences of different
  latent topics are independent.  In many document\ncollections\, the p
 resence of a topic may be correlated with the\npresence of another.  For e
 xample\, a document about sports is more\nlikely to also be about health t
 han about international finance.  I will\ndescribe a probabilistic topic
  model w
 hich can capture such\ncorrelations between the hidden topics.\n\nIn addit
 ion to giving quantitative\, predictive models of a corpus\,\ntopic models
  provide a qualitative window into the structure of a\nlarge document coll
 ection.  This perspective allows a user to explore\na corpus in a topic-gu
 ided fashion.  We demonstrate the capabilities\nof these new models on the
  archives of the journal Science\, founded in\n1880 by Thomas Edison.  Our
  models are built on noisy text from\nJSTOR\, an online scholarly journal
  archive\; the text was produced by an optical\ncharacter recognition
  engine run over the original bound journals.\n\n
 (joint work with J. Lafferty)\n
LOCATION:Engineering Department - LR3
END:VEVENT
END:VCALENDAR
