BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:Representation Learning for Text Retrieval: Learning and Pretraini
 ng Strategies for Dense Retrieval - Chenyan Xiong (Microsoft Research)
DTSTART:20210311T160000Z
DTEND:20210311T170000Z
UID:TALK157564@talks.cam.ac.uk
CONTACT:James Thorne
DESCRIPTION:Join Zoom Meeting\nhttps://cl-cam-ac-uk.zoom.us/j/95119479973?
 pwd=RGFYZndIVVhDWEtySy8wV3VTZlpnZz09\n\nMeeting ID: 951 1947 9973\nPasscod
 e: 602575\n\nText retrieval is one of the predominant tasks in language
 technology. It is an end application in itself\, powering search engines
 for billions of users\, and it can also serve as the first-stage
 retrieval component for other language systems: question answering\,
 information extraction\, etc. Since the 1970s\, text retrieval has been
 done by matching queries and documents in a sparse\, bag-of-words
 space\, e.g.\, using BM25. We used to joke that every year saw techniques
 that improved BM25 by 10%\, yet decades later our research is still
 working on that 10% improvement over BM25.\nDense retrieval offers a
 unique opportunity to overcome the limitations of bag-of-words sparse
 retrieval. With pretrained language models\, we can now encode queries
 and documents into one embedding space and conduct reasonable
 first-stage retrieval purely using embedding similarities. In this
 talk\, I will first recap recent progress in dense retrieval. I will
 then present our upcoming ICLR 2021 paper (ANCE) on better training of
 dense retrieval models with approximate nearest neighbor contrastive
 learning. The obstacles in dense retrieval training led us to question
 the alignment between pretrained language models and the needs of dense
 retrieval. In the last part of the talk\, I will present our ongoing
 work (Seed-Encoder) on designing pretraining strategies dedicated to
 dense retrieval.
LOCATION:Virtual (Zoom)
END:VEVENT
END:VCALENDAR
