BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:A Probabilistic Framework for Modeling Cross-Lingual Semantic Simi
 larity (out of and in Context) Based on Latent Cross-Lingual Concepts - Iv
 an Vulic\, KU Leuven
DTSTART:20150206T120000Z
DTEND:20150206T130000Z
UID:TALK55334@talks.cam.ac.uk
CONTACT:Tamara Polajnar
DESCRIPTION:With the ongoing growth of the World Wide Web and its omni
 presence in today's increasingly connected world\, users tend to aban
 don English as the lingua franca of the global network\, since more an
 d more content becomes available in their native languages. In additi
 on\, given the rapid development of online resources such as Wikipedi
 a\, the blogosphere\, and online news portals\, users have simultaneo
 usly generated a huge volume of multilingual text resources. There is a p
 ressing need for tools that can induce knowledge from these user-gene
 rated multilingual text resources and effectively accomplish cross-li
 ngual text processing automatically or with minimal human interventio
 n.\n\nIn this talk we address cross-lingual semantic similarity\, the t
 ask of detecting which words (or\, more generally\, text units) expre
 ss similar semantic concepts and convey similar meanings across langu
 ages. Models of cross-lingual similarity are typically used to automa
 tically induce bilingual lexicons and have found numerous application
 s in information retrieval (IR)\, statistical machine translation (SM
 T)\, and other natural language processing (NLP) tasks.\n\nResearch i
 nto corpus-based cross-lingual models of distributional similarity ha
 s focused on building context-insensitive models of cross-lingual sim
 ilarity that typically rely on external resources such as readily ava
 ilable bilingual lexicons or parallel data to bridge the lexical chas
 m between two languages. In this talk we follow a completely new rese
 arch path and present a probabilistic approach to modeling cross-ling
 ual semantic similarity (out of and in context) that is fully data-dr
 iven\, as it does not rely on any resources besides a (non-parallel) m
 ultilingual corpus. The framework relies on the idea of projecting wo
 rds and sets of words into a shared latent semantic space spanned by l
 anguage-pair-independent latent cross-lingual semantic concepts (e.g.\, c
 ross-lingual topics obtained by a multilingual topic model). These la
 tent concepts are induced from a comparable corpus without any additi
 onal lexical resources. Word meaning is represented as a probability d
 istribution over the latent cross-lingual concepts\, and a change in m
 eaning is represented as a change in the distribution over these late
 nt concepts. The first part of this talk provides a crash course on m
 ultilingual text mining models\, with an emphasis on the multilingual t
 opic modeling approach. These models are used to induce the latent cr
 oss-lingual concepts from multilingual data. In the second part of th
 e talk\, we present a systematic overview of the context-insensitive m
 odels of cross-lingual similarity built upon the paradigm of latent c
 ross-lingual concepts. We compare these models on the task of bilingu
 al lexicon extraction (BLE). The final part of the talk presents an e
 xtension of the probabilistic framework towards context-aware models o
 f cross-lingual similarity. We describe new similarity models that mo
 dulate the isolated out-of-context word representations with contextu
 al knowledge and report our findings on the task of word translation i
 n context.
LOCATION:FW26\, Computer Laboratory
END:VEVENT
END:VCALENDAR
