BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:Tensor product representations for RNNs / Revisiting post-processi
 ng for word embeddings - Shuai Tang\, University of California\, San Diego
DTSTART:20191017T100000Z
DTEND:20191017T110000Z
UID:TALK132262@talks.cam.ac.uk
CONTACT:Edoardo Maria Ponti
DESCRIPTION:1/ Tensor product representations for RNNs\n\nWidely used recur
 rent units\, including Long Short-Term Memory (LSTM) and the Gated Recurre
 nt Unit (GRU)\, perform well on natural language tasks\, but their abilit
 y to learn structured representations remains questionable. Exploiting red
 uced Tensor Product Representations (TPRs) --- distributed representation
 s of symbolic structure in which vector-embedded symbols are bound to vect
 or-embedded structural positions --- we propose the TPRU\, a simple recurr
 ent unit that\, at each time step\, explicitly executes structural-role bi
 nding and unbinding operations to incorporate structural information int
 o learning. A gradient analysis of the proposed TPRU supports our model de
 sign\, and its performance on multiple datasets demonstrates the effective
 ness of our design choices. Furthermore\, observations from a linguistical
 ly grounded study demonstrate the interpretability of the TPRU.\n\n\n
 2/ Revisiting post-processing for word embeddings\n\nWord embeddings learn
 t from large corpora have been adopted in various natural language process
 ing applications and serve as general input representations for learning s
 ystems. Recently\, a series of post-processing methods have been propose
 d to boost the performance of word embeddings on similarity comparison an
 d analogy retrieval tasks\, and some have been adapted to compose sentenc
 e representations. The general hypothesis behind these methods is that mak
 ing the embedding space more isotropic allows the similarity between word
 s to be better expressed. We view these methods as shrinking the covarianc
 e/Gram matrix\, estimated from the learnt word vectors\, towards a scale
 d identity matrix. By optimising an objective on the semi-Riemannian manif
 old with Centralised Kernel Alignment (CKA)\, we search for the optimal sh
 rinkage parameter and provide a post-processing method that smooths the sp
 ectrum of learnt word vectors\, yielding improved performance on downstrea
 m tasks.
LOCATION:Board room\, Faculty of English\, 9 West Rd (Sidgwick Site)
END:VEVENT
END:VCALENDAR
