Surgical data using LLMs
- Speaker: Hugo Georgenthum
- Date & Time: Friday 21 March 2025, 17:00 - 17:45
- Venue: Lecture Theatre 2, Computer Laboratory, William Gates Building
Abstract
The automatic summarization of surgical videos is crucial for improving procedural documentation, surgical training, and post-operative analysis. This thesis presents a new method at the intersection of artificial intelligence and medicine, aiming to develop machine-learning models with real-world applications in surgery. To this end, we propose a multi-modal approach that generates video summaries by building on recent advances in both computer vision and large language models. The model processes surgical videos in three key steps. First, after the video is divided into clips, visual features are extracted at the frame level with vision transformers; the goal is to detect the tools, organs, tissues, and actions performed by the surgeon. These visual features are then translated into frame captions using large language models. Second, at the video level, the emphasis shifts to temporal features, which are obtained with a ViViT-based encoder that takes as input both the clips and the frame captions extracted earlier. As with the frame captions, the temporal features are converted into clip captions, which capture the overall context of each clip. Third, the clip descriptions are combined into a surgical report by an LLM specifically designed for this task. We train and evaluate our model on the CholecT50 dataset, leveraging its frame-level instrument and action annotations across 50 laparoscopic videos. Experimental results demonstrate that our method produces coherent and contextually meaningful summaries, with 96% precision for tool detection and a BERTScore of 0.74 for temporal context extraction. This research contributes to the development of AI-assisted tools for surgical reporting and analysis.
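The three steps described in the abstract can be pictured as a small hierarchical pipeline. The sketch below is only an illustration of that control flow under stated assumptions: the names `split_into_clips`, `summarize_video`, `caption_frame`, `caption_clip`, and `write_report` are hypothetical and do not come from the talk, frames are represented as plain strings, and the three callables stand in for the vision transformer plus LLM, the ViViT-based encoder plus LLM, and the report-writing LLM respectively.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Frame = str  # assumption: a stand-in for an image; a real system would use pixel arrays


@dataclass
class ClipSummary:
    frame_captions: List[str]  # one caption per frame (step 1)
    clip_caption: str          # one caption per clip (step 2)


def split_into_clips(frames: Sequence[Frame], clip_len: int) -> List[List[Frame]]:
    """Divide the video into consecutive, non-overlapping clips."""
    if clip_len <= 0:
        raise ValueError("clip_len must be positive")
    return [list(frames[i:i + clip_len]) for i in range(0, len(frames), clip_len)]


def summarize_video(
    frames: Sequence[Frame],
    clip_len: int,
    caption_frame: Callable[[Frame], str],
    caption_clip: Callable[[List[Frame], List[str]], str],
    write_report: Callable[[List[str]], str],
) -> str:
    """Run the frame-level, clip-level, and report-level stages in order."""
    summaries: List[ClipSummary] = []
    for clip in split_into_clips(frames, clip_len):
        # Step 1: frame-level captions (hypothetical vision model + LLM).
        frame_caps = [caption_frame(frame) for frame in clip]
        # Step 2: clip-level caption from the clip and its frame captions.
        clip_cap = caption_clip(clip, frame_caps)
        summaries.append(ClipSummary(frame_caps, clip_cap))
    # Step 3: combine the clip captions into a single report.
    return write_report([s.clip_caption for s in summaries])


if __name__ == "__main__":
    # Toy stand-ins so the sketch runs without any model.
    report = summarize_video(
        frames=["f0", "f1", "f2", "f3", "f4"],
        clip_len=2,
        caption_frame=lambda f: f"cap({f})",
        caption_clip=lambda clip, caps: " + ".join(caps),
        write_report=lambda caps: " | ".join(caps),
    )
    print(report)
```

Passing the models in as callables keeps the sketch independent of any particular library; the actual architecture and training procedure are those presented in the talk, not this outline.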
Series
This talk is part of the Foundation AI series.