Learning generalizable models on large-scale multi-modal data
- Speaker: Yutian Chen (DeepMind)
- Date & Time: Wednesday 18 October 2023, 14:00-15:00
- Venue: Maxwell Centre
Abstract
The abundant spectrum of multi-modal data offers a significant opportunity to extend the training of foundational models beyond text alone. In this talk, I will introduce two lines of work that leverage large-scale models, trained on Internet-scale multi-modal datasets, to achieve strong generalization performance. The first trains an audio-visual model on a large dataset of YouTube videos to enable automatic video translation and dubbing; the model learns correspondences between audio and visual features and uses this knowledge to translate videos from one language to another. The second trains a multi-modal, multi-task, multi-embodiment generalist policy on a massive collection of simulated control tasks together with vision, language, and robotics data; the model learns to perform a variety of tasks, including controlling a robot arm, playing games, and translating text. Both lines of work illustrate a potential future trajectory for foundational models, highlighting the transformative power of integrating multi-modal inputs and outputs.
Series: This talk is part of the Data Intensive Science Seminar Series.