Learning Common Grammar from Multilingual Corpus / Online Multiscale Dynamic Topic Models
- π€ Speaker: Dr. Tomoharu Iwata (NTT)
- π Date & Time: Wednesday 21 July 2010, 11:00 - 12:00
- π Venue: Engineering Department, CBL Room 438
Abstract
I will give the following two talks.
“Learning Common Grammar from Multilingual Corpus”
We propose a corpus-based probabilistic framework to extract hidden common syntax across languages from non-parallel multilingual corpora in an unsupervised fashion. For this purpose, we assume a generative model for multilingual corpora, where each sentence is generated from a language dependent probabilistic context-free grammar (PCFG), and these PCF Gs are generated from a prior grammar that is common across languages. We also develop a variational method for efficient inference. Experiments on a non-parallel multilingual corpus of eleven languages demonstrate the feasibility of the proposed method.
“Online Multiscale Dynamic Topic Models”
We propose an online topic model for sequentially analyzing the time evolution of topics in document collections. Topics naturally evolve with multiple timescales. For example, some words may be used consistently over one hundred years, while other words emerge and disappear over periods of a few days. Thus, in the proposed model, current topic-specific distributions over words are assumed to be generated based on the multiscale word distributions of the previous epoch. Considering both the long-timescale dependency as well as the short-timescale dependency yields a more robust model. We derive efficient online inference procedures based on a stochastic EM algorithm, in which the model is sequentially updated using newly obtained data.
Series This talk is part of the Machine Learning @ CUED series.
Included in Lists
- All Talks (aka the CURE list)
- Biology
- bld31
- Cambridge Centre for Data-Driven Discovery (C2D3)
- Cambridge Forum of Science and Humanities
- Cambridge Language Sciences
- Cambridge Neuroscience Seminars
- Cambridge talks
- CBL important
- Chris Davis' list
- Creating transparent intact animal organs for high-resolution 3D deep-tissue imaging
- dh539
- dh539
- Engineering Department, CBL Room 438
- Featured lists
- Guy Emerson's list
- Hanchen DaDaDash
- Inference Group Summary
- Information Engineering Division seminar list
- Interested Talks
- Joint Machine Learning Seminars
- Life Science
- Life Sciences
- Machine Learning @ CUED
- Machine Learning Summary
- ML
- ndk22's list
- Neuroscience
- Neuroscience Seminars
- Neuroscience Seminars
- ob366-ai4er
- Required lists for MLG
- rp587
- Seminar
- Simon Baker's List
- Stem Cells & Regenerative Medicine
- Trust & Technology Initiative - interesting events
- yk373's list
- yk449
Note: Ex-directory lists are not shown.
![[Talks.cam]](/static/images/talkslogosmall.gif)


Wednesday 21 July 2010, 11:00-12:00