Representation Learning for Text Retrieval: Learning and Pretraining Strategies for Dense Retrieval
- Speaker: Chenyan Xiong (Microsoft Research)
- Date & Time: Thursday 11 March 2021, 16:00 - 17:00
- Venue: Virtual (Zoom)
Abstract
Join Zoom Meeting https://cl-cam-ac-uk.zoom.us/j/95119479973?pwd=RGFYZndIVVhDWEtySy8wV3VTZlpnZz09
Meeting ID: 951 1947 9973 Passcode: 602575
Text retrieval is one of the most predominant tasks for language technologies. It is an end application in itself, powering search engines for billions of users. It can also serve as a first-stage retrieval component for other language systems, such as Question Answering and Information Extraction. Text retrieval has been done by matching queries and documents in the sparse, bag-of-words space, e.g., using BM25, since the 1970s. We joked that every year we saw techniques that improved BM25 by 10%, but decades later we are still working on a 10% improvement over BM25 in our research. Dense retrieval provides a unique opportunity to overcome the limitations of bag-of-words-based sparse retrieval. With pretrained language models, we can now encode the query and documents into one embedding space and conduct reasonable first-stage retrieval purely using embedding similarities. In this talk, I will first recap recent progress in dense retrieval, then present our upcoming ICLR 2021 paper (ANCE) on better training of dense retrieval with approximate nearest neighbor contrastive learning. The obstacles in dense retrieval training led us to question the alignment between pretrained language models and the needs of dense retrieval. In the last part of this talk, I will present our ongoing work (Seed-Encoder) on designing pretraining strategies dedicated to dense retrieval.
Series
This talk is part of the NLIP Seminar Series.