Toward Generalizable and Intelligent Visual Reasoning Models

If you have a question about this talk, please contact Elliott Wu.

Visual reasoning models have made remarkable progress in recent years, yet they are still not widely deployed in critical real-world settings, where data is scarce, tasks are multi-step, and outputs must be inspectable and verifiable. To address this gap, I propose building multimodal reasoning models with structural priors that can robustly perceive, interpret, and interact with the physical world under human-specified instructions. In this talk, I will cover a spectrum of modeling paradigms and environments: (1) neuro-symbolic models, where hybrid explicit-implicit representations provide efficiency and generalization by design in structured settings; (2) foundation model-distilled frameworks, which externalize prior knowledge to structure vision-language models’ reasoning process in open-ended domains; and (3) structure-induction frameworks, which use interpretable representational bottlenecks to uncover patterns in complex, unlabeled visual data. I will conclude by outlining a path toward vision-language models that can generalize across diverse sensing modalities and conduct intelligent decision-making in the real world.

Bio: Joy Hsu is a PhD candidate in Computer Science at Stanford University, advised by Prof. Jiajun Wu. Her research focuses on making visual reasoning models reliable in real-world settings under sensing, data, and compute constraints. She develops multimodal reasoning models with structural priors that enable systems to perceive, interpret, and interact intelligently with the physical world across diverse, data-scarce domains. She is a recipient of the Knight-Hennessy Fellowship and the NSF Fellowship, and was awarded third place in the Amazon Robotics PhD competition and named a Rising Star in AI in 2025.

Zoom: https://cam-ac-uk.zoom.us/j/85290977324?pwd=C3KItZS8d2XaVyUsb88HKuV5wWFrYV.1

This talk is part of the Computer Vision Seminars series.

This talk is included in these lists:

LT6, Department of Engineering

Note that ex-directory lists are not shown.

Speaker: Joy Hsu, Stanford University
Date & Time: Monday 23 March 2026, 14:00 - 15:00
Venue: LT6, Department of Engineering
