From Sparse Modeling to Sparse Communication
- Speaker: André F. T. Martins
- Date & Time: Thursday 26 November 2020, 11:00 - 12:00
- Venue: https://teams.microsoft.com/l/meetup-join/19%3ameeting_YmQyY2ViNDgtZDE1MC00MzZhLWFjZGItOWFmMjM2OTI1ZDQy%40thread.v2/0?context=%7b%22Tid%22%3a%2249a50445-bdfa-4b79-ade3-547b4f3986e9%22%2c%22Oid%22%3a%2230bfe2fc-8896-487c-84f2-f4b8875a60b2%22%7d
Abstract
Sparse modeling is an important, decades-old area of machine learning that aims to select and discover the relevant features to include in a model. In this talk I will describe how this toolbox can be extended and adapted to facilitate sparse communication in neural networks. The building block is a family of sparse transformations called alpha-entmax, a drop-in replacement for softmax. Entmax transformations are differentiable and, unlike softmax, can return sparse probability distributions, which makes them useful for selecting relevant input features.
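As a concrete illustration of the abstract's contrast with softmax, here is a minimal NumPy sketch of sparsemax, the alpha = 2 member of the alpha-entmax family (the sorting-based threshold algorithm of Martins & Astudillo, 2016). The function names and the example logits are illustrative, not from the talk:

```python
import numpy as np

def sparsemax(z):
    """Sparsemax (alpha-entmax with alpha = 2): Euclidean projection of the
    logits z onto the probability simplex. Unlike softmax, the output can
    contain exact zeros."""
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]                 # logits in descending order
    cumsum = np.cumsum(z_sorted)
    k = np.arange(1, z_sorted.size + 1)
    k_max = k[1 + k * z_sorted > cumsum][-1]    # size of the support
    tau = (cumsum[k_max - 1] - 1) / k_max       # threshold
    return np.maximum(z - tau, 0.0)

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

z = np.array([2.0, 1.0, -1.0])
print(softmax(z))    # every entry strictly positive
print(sparsemax(z))  # -> [1. 0. 0.]: low-scoring entries are exactly zero
```

Both outputs are valid probability distributions; the difference is that sparsemax truncates low-scoring entries to exactly zero rather than assigning them vanishingly small mass.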
In the first part, I will illustrate the use of alpha-entmax in attention mechanisms. These sparse transformations and their structured and continuous variants have been applied with success to machine translation, natural language inference, visual question answering, and other tasks. I will show how learning the alpha parameter can lead to “adaptively sparse transformers,” where each attention head learns to choose between focused or spread-out behavior. I will proceed to describe a framework for model prediction explainability as a sparse communication problem between an explainer and a layperson, which takes advantage of the selection capabilities of sparse attention. If time permits, I will show how this framework can be extended to continuous domains to obtain sparse densities, illustrating with an application in visual question answering where “continuous attention” selects elliptical regions in the image.
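To make the attention use case concrete, the following sketch swaps sparsemax in for softmax inside standard scaled dot-product attention, so that irrelevant keys receive exactly zero weight. This is a simplified single-head illustration under my own naming, not the speaker's implementation:

```python
import numpy as np

def sparsemax(z):
    # alpha-entmax with alpha = 2 (Martins & Astudillo, 2016)
    z_sorted = np.sort(z)[::-1]
    cumsum = np.cumsum(z_sorted)
    k = np.arange(1, z_sorted.size + 1)
    k_max = k[1 + k * z_sorted > cumsum][-1]
    tau = (cumsum[k_max - 1] - 1) / k_max
    return np.maximum(z - tau, 0.0)

def sparse_attention(Q, K, V):
    """Scaled dot-product attention with sparsemax weights: each query
    attends to a (possibly small) subset of keys, and the rest get
    exactly zero attention."""
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    weights = np.apply_along_axis(sparsemax, 1, scores)  # row-wise sparsemax
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4))   # 3 queries
K = rng.normal(size=(5, 4))   # 5 keys
V = rng.normal(size=(5, 4))   # 5 values
out, w = sparse_attention(Q, K, V)
```

In the adaptively sparse transformers mentioned above, alpha itself is learned per attention head, interpolating between softmax-like (spread-out) and sparsemax-like (focused) behavior; this sketch fixes alpha = 2 for simplicity.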
In the second part, I will show how sparse transformations can also be used as a replacement for the cross-entropy loss, via the family of entmax losses. This leads to sparse sequence-to-sequence models, where beam search can be exact, and to language models that are natively sparse, eliminating the need for top-k and nucleus sampling. I will show applications in morphological tasks, machine translation, and text generation.
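The claim that natively sparse language models remove the need for top-k and nucleus sampling can be sketched as follows: once the output distribution already assigns exact zeros to implausible tokens, plain ancestral sampling never draws them, so no truncation heuristic is required. The logits below are made up for illustration:

```python
import numpy as np

def sparsemax(z):
    # alpha-entmax with alpha = 2; can assign exact zeros
    z_sorted = np.sort(z)[::-1]
    cumsum = np.cumsum(z_sorted)
    k = np.arange(1, z_sorted.size + 1)
    k_max = k[1 + k * z_sorted > cumsum][-1]
    tau = (cumsum[k_max - 1] - 1) / k_max
    return np.maximum(z - tau, 0.0)

# Hypothetical next-token logits from a language model head.
logits = np.array([3.0, 2.5, 0.0, -1.0, -2.0])
p = sparsemax(logits)   # -> [0.75, 0.25, 0., 0., 0.]

# The tail already has exactly zero probability, so sampling directly
# from p can never produce an implausible token -- no top-k or nucleus
# truncation step is needed.
rng = np.random.default_rng(0)
samples = rng.choice(p.size, size=1000, p=p)
```

With softmax, by contrast, every token keeps strictly positive probability, which is precisely why truncation heuristics like top-k and nucleus sampling were introduced.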
This work was funded by the DeepSPIN ERC project (https://deep-spin.github.io).
Series: This talk is part of the Language Technology Lab Seminars series.
Included in Lists
- bld31
- Cambridge Centre for Data-Driven Discovery (C2D3)
- Cambridge Forum of Science and Humanities
- Cambridge Language Sciences
- Cambridge talks
- Chris Davis' list
- Guy Emerson's list
- Interested Talks
- Language Sciences for Graduate Students
- Language Technology Lab Seminars
- ndk22's list
- ob366-ai4er
- rp587
- Simon Baker's List
- Trust & Technology Initiative - interesting events
- yk449