Towards 4D Foundational Models for Dynamic World Reconstruction
- Speaker: Zeren Jiang, University of Oxford
- Date & Time: Monday 16 March 2026, 15:00 - 16:00
- Venue: Lecture Room 3B, Department of Engineering (Trumpington Street)
Abstract
In this talk, I will present a line of work on diffusion-based models for 4D reconstruction and tracking of dynamic scenes. I will begin with Geo4D and Track4D, which explore how video diffusion priors can be leveraged to recover the dynamic 4D structure of the world from video observations. These models demonstrate how generative priors learned from large-scale video data can significantly improve the reconstruction and tracking of complex dynamic scenes. Next, I will focus on Mesh4D, an object-centric reconstruction framework that models dynamic objects as temporally coherent meshes. By leveraging diffusion models, Mesh4D can plausibly infer and hallucinate unseen regions of objects during motion. To further stabilize training and improve reconstruction quality, we incorporate skeleton priors as privileged knowledge within the diffusion reconstruction pipeline. Finally, I will introduce Syn4D, a large-scale synthetic dataset designed to support a wide range of 4D vision tasks. Syn4D enables research on geometry-aware novel view synthesis, 4D reconstruction and tracking, and human pose estimation, providing a scalable platform for training and evaluating future 4D models. Together, these works represent steps toward 4D foundational models capable of reconstructing the dynamic physical world.
Bio: Zeren Jiang is a DPhil student in the Visual Geometry Group (VGG) at the University of Oxford. He received a dual degree in Software Engineering and Mathematics & Applied Mathematics from Beihang University in Beijing, and later earned an MSc in Computer Science with distinction from ETH Zurich. His research lies at the intersection of computer vision and computer graphics, with the goal of building systems that can perceive and understand the dynamic physical world in real time, and ultimately learn generative models capable of creating immersive and physically plausible virtual environments. Zeren has published five peer-reviewed papers as first or co-first author at top-tier venues. His work has received the Best Paper Award at ACM Multimedia 2021 and the Best Video Award at IJCAI 2021.
Zoom link: https://cam-ac-uk.zoom.us/j/84730633222?pwd=C4HZnh8F5ONlVJEa77asYXZ6WCNYD6.1
Series
This talk is part of the Computer Vision Seminars series.