BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:VGGT &amp\; Compositional 3D Generation - Jianyuan Wang and Mingha
 o Chen
DTSTART:20250807T100000Z
DTEND:20250807T110000Z
UID:TALK234910@talks.cam.ac.uk
CONTACT:Elliott Wu
DESCRIPTION:Talk 1: "VGGT: Visual Geometry Grounded Transformer" by Jian
 yuan Wang\n\nAbstract: We present VGGT\, a feed-forward neural network tha
 t directly infers all key 3D attributes of a scene\, including camera para
 meters\, point maps\, depth maps\, and 3D point tracks\, from one\, a few\
 , or hundreds of its views. This approach is a step forward in 3D computer
  vision\, where models have typically been constrained to and specialized 
 for single tasks. It is also simple and efficient\, reconstructing images 
 in under one second\, and still outperforming alternatives even without p
 ost-processing with visual geometry optimization techniques. The netw
 ork achieves state-of-the-art results in multiple 3D tasks\, including cam
 era parameter estimation\, multi-view depth estimation\, dense point cloud
  reconstruction\, and point tracking. We also show that using pretrained V
 GGT as a feature backbone significantly enhances downstream tasks\, such a
 s non-rigid point tracking and feed-forward novel view synthesis.\n\nBio: 
 Jianyuan Wang is a joint PhD student at Meta AI Research and the Visual Ge
 ometry Group (VGG)\, University of Oxford\, currently in his third year. H
 is research focuses on 3D understanding\, particularly the reconstruction 
 of 3D scenes from images\, spanning PoseDiffusion\, VGGSfM\, and VGGT. Hi
 s work has been recognized with several honors\, including the CVPR 2025 B
 est Paper Award.\n\n============\n\nTalk 2: "Compositional 3D Generatio
 n" by Mingh
 ao Chen\n\nAbstract: Recent breakthroughs in 3D content creation have brou
 ght remarkable progress in generating high-quality shapes. However\, real-
 world applications often require objects composed of editable and reusable
  parts\, which poses new challenges for traditional generation methods. In
  this talk\, we introduce two approaches that address this problem. PartGe
 n is a method for compositional 3D generation and reconstruction that take
 s input from various sources\, including text\, images\, and 3D scans. It 
 predicts consistent part segmentations across multiple views\, completes e
 ach part in 2D\, and lifts it to 3D. This design takes full advantage of
  recent advances in text-to-image models. In contrast\, AutoPartGen is an 
 autoregressive pipeline that generates objects part by part in a latent 3D
  space. It conditions each prediction on previously generated parts and op
 tional inputs such as masks\, images\, and 3D shapes\, enabling precise geome
 try generation. The model can operate automatically and extend to more com
 plex tasks such as scene and city generation.\n\nBio: Minghao Chen is a th
 ird-year DPhil student in the Visual Geometry Group at the University of O
 xford\, supervised by Professor Andrea Vedaldi and Dr. Iro Laina. His rese
 arch focuses on 2D and 3D generative models as well as 3D scene understand
 ing. He has recently received the CVPR 2025 Best Paper Award.
LOCATION:Cambridge University Engineering Department\, JDB Seminar Room
END:VEVENT
END:VCALENDAR
