BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:Surgical data using LLMs - Hugo Georgenthum
DTSTART:20250321T170000Z
DTEND:20250321T174500Z
UID:TALK229627@talks.cam.ac.uk
CONTACT:Pietro Lio
DESCRIPTION:The automatic summarization of surgical videos is crucial for im
 proving procedural documentation\, surgical training\, and post-operative a
 nalysis. This thesis presents a new method at the intersection of artificia
 l intelligence and medicine\, seeking to develop innovative machine-learnin
 g models with real-world applications in surgery. To this end\, we propos
 e a multi-modal approach that generates video summaries by benefiting fro
 m the latest improvements in both computer vision and large language mode
 ls. The model processes surgical videos in three key steps. After dividin
 g the video into clips\, the first step extracts visual features by treat
 ing the clips at the frame level with vision transformers\, with the goa
 l of detecting the tools\, organs\, tissues\, and actions performed by t
 he surgeon. These visual features are then translated into frame caption
 s using large language models. The second step\, at the video level\, fo
 cuses on temporal features\, which are obtained with a ViViT-based encod
 er that takes as input both the clips and the frame captions extracted e
 arlier. Analogously to the frame captions\, the temporal features are co
 nverted into clip captions\, which capture the overall context of each c
 lip. The last step combines the clip descriptions into a surgical repor
 t with an LLM specifically designed for this task. We train and evaluat
 e our model on the CholecT50 dataset\, leveraging instrument and actio
 n frame annotations across 50 laparoscopic videos. Experimental result
 s demonstrate that our method produces coherent and contextually meani
 ngful summaries\, with 96% precision for tool detection and a BERTScor
 e of 0.74 for temporal context extraction. This research contributes t
 o the development of AI-assisted tools for surgical reporting and anal
 ysis.
LOCATION:Lecture Theatre 2\, Computer Laboratory\, William Gates Building
END:VEVENT
END:VCALENDAR
