BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:From Sparse Modeling to Sparse Communication - André F. T. Martin
 s
DTSTART:20201126T110000Z
DTEND:20201126T120000Z
UID:TALK154402@talks.cam.ac.uk
CONTACT:Marinela Parovic
DESCRIPTION:Sparse modeling is an important\, decades-old area in machine 
 learning which aims to select and discover the relevant features that shou
 ld be included in a model. In this talk I will describe how this toolbox c
 an be extended and adapted for facilitating sparse communication in neural
  networks. The building block is a family of sparse transformations called
  alpha-entmax\, a drop-in replacement for softmax. Entmax transformations 
 are differentiable and (unlike softmax) they can return sparse probability
  distributions\, useful to select relevant input features.\n\nIn the first
  part\, I will illustrate the use of alpha-entmax in attention mechanisms.
  These sparse transformations and their structured and continuous variants
  have been applied with success to machine translation\, natural language 
 inference\, visual question answering\, and other tasks. I will show how l
 earning the alpha parameter can lead to "adaptively sparse transformers\,"
  where each attention head learns to choose between focused or spread-out 
 behavior. I will proceed to describe a framework for model prediction exp
 lainability as a sparse communication problem between an explainer and a l
 ayperson\, which takes advantage of the selection capabilities of sparse a
 ttention. If time permits\, I will show how this framework can be extended
  to continuous domains to obtain sparse densities\, illustrating with an a
 pplication in visual question answering where "continuous attention" selec
 ts elliptical regions in the image.\n\nIn the second part\, I will show ho
 w sparse transformations can also be used as a replacement for the cross-e
 ntropy loss\, via the family of entmax losses. This leads to sparse sequen
 ce-to-sequence models\, where beam search can be exact\, and to language m
 odels that are natively sparse\, eliminating the need for top-k and nucleu
 s sampling. I will show applications in morphological tasks\, machine tran
 slation\, and text generation.\n\nThis work was funded by the DeepSPIN ERC
  project (https://deep-spin.github.io).
LOCATION:https://teams.microsoft.com/l/meetup-join/19%3ameeting_YmQyY2ViND
 gtZDE1MC00MzZhLWFjZGItOWFmMjM2OTI1ZDQy%40thread.v2/0?context=%7b%22Tid%22%
 3a%2249a50445-bdfa-4b79-ade3-547b4f3986e9%22%2c%22Oid%22%3a%2230bfe2fc-889
 6-487c-84f2-f4b8875a60b2%22%7d
END:VEVENT
END:VCALENDAR
