VGGT & Compositional 3D Generation
- 👤 Speaker: Jianyuan Wang and Minghao Chen
- 📅 Date & Time: Thursday 07 August 2025, 11:00 - 12:00
- 📍 Venue: Cambridge University Engineering Department, JDB Seminar Room
Abstract
Talk 1: “VGGT: Visual Geometry Grounded Transformer” by Jianyuan Wang
Abstract: We present VGGT, a feed-forward neural network that directly infers all key 3D attributes of a scene, including camera parameters, point maps, depth maps, and 3D point tracks, from one, a few, or hundreds of its views. This approach is a step forward in 3D computer vision, where models have typically been constrained to and specialized for single tasks. It is also simple and efficient, reconstructing images in under one second, while still outperforming alternatives that rely on post-processing with visual geometry optimization techniques. The network achieves state-of-the-art results on multiple 3D tasks, including camera parameter estimation, multi-view depth estimation, dense point cloud reconstruction, and point tracking. We also show that using pretrained VGGT as a feature backbone significantly enhances downstream tasks, such as non-rigid point tracking and feed-forward novel view synthesis.
Bio: Jianyuan Wang is a joint PhD student at Meta AI Research and the Visual Geometry Group (VGG), University of Oxford, currently in his third year. His research focuses on 3D understanding, particularly the reconstruction of 3D scenes from images, spanning PoseDiffusion, VGGSfM, and VGGT. His work has been recognized with several honors, including the CVPR 2025 Best Paper Award.
============
Talk 2: “Compositional 3D Generation” by Minghao Chen
Abstract: Recent breakthroughs in 3D content creation have brought remarkable progress in generating high-quality shapes. However, real-world applications often require objects composed of editable and reusable parts, which poses new challenges for traditional generation methods. In this talk, we introduce two approaches that address this problem. PartGen is a method for compositional 3D generation and reconstruction that takes input from various sources, including text, images, and 3D scans. It predicts consistent part segmentations across multiple views, completes each part in 2D, and lifts them to 3D. This design takes full advantage of recent advances in text-to-image models. In contrast, AutoPartGen is an autoregressive pipeline that generates objects part by part in a latent 3D space. It conditions each prediction on previously generated parts and on optional inputs such as masks, images, and 3D shapes, enabling precise geometry generation. The model can operate automatically and extends to more complex tasks such as scene and city generation.
Bio: Minghao Chen is a third-year DPhil student in the Visual Geometry Group at the University of Oxford, supervised by Professor Andrea Vedaldi and Dr. Iro Laina. His research focuses on 2D and 3D generative models as well as 3D scene understanding. He recently received the CVPR 2025 Best Paper Award.
Series: This talk is part of the Computer Vision Seminars series.