Vision-Language Modeling in 3D Medical Imaging (VLM3D) Workshops

ICCV 2025 – Full-Day Workshop

Date & Venue: ICCV 2025, Honolulu, Hawaii
Format: invited talks · paper track (ICCV proceedings) · live benchmark reveal & poster session

➜ Submit your paper


Why VLM3D?

VLM3D is the first ICCV workshop devoted entirely to vision-language methods for volumetric (3D) medical data. Our goal is to create a forum where computer-vision, NLP, and clinical-AI researchers can:

  • share state-of-the-art techniques for 3D report generation, abnormality reasoning, and generative modelling;
  • discuss open problems—efficient volumetric representation learning, clinical grounding, and trustworthy evaluation;
  • build new collaborations that accelerate translation of multimodal AI into radiology practice.

Confirmed Speakers

Prof. Bjoern Menze is a professor at the University of Zurich and a leading expert in biomedical image analysis. Formerly a W3 professor at TU Munich, he has held positions at Inria, ETH Zurich, MIT, and Harvard. His work has earned awards such as the MICCAI Best Paper and Young Scientist Impact Award. Recently, he has pioneered 3D vision-language modeling in medical imaging with large-scale efforts like CT-RATE, CT-CLIP, and GenerateCT.

Prof. Björn Ommer is a professor at Ludwig Maximilian University of Munich, where he leads the Computer Vision & Learning Group. He is renowned for his work in generative AI, notably as a co-developer of Stable Diffusion, a widely used open-source text-to-image model. His research encompasses semantic image understanding, visual synthesis, and explainable AI. Ommer's contributions have been recognized with the 2024 German AI Prize and the Eduard Rhein Foundation Technology Prize. His work has applications in various domains, including medical imaging, where it aids in the automated analysis of medical image data.

Dr. Daguang Xu leads healthcare AI research at NVIDIA’s AI-Infra group, focusing on 3D medical imaging, EHR mining, and vision-language modeling. He has co-led major open-source projects like MONAI and developed influential 3D models such as UNETR and MAISI. With 90+ peer-reviewed papers and ~50 patents, his work bridges cutting-edge 3D AI and clinical impact.

Prof. Akshay Chaudhari is an Assistant Professor of Radiology and Biomedical Data Science at Stanford University and Interim Division Chief of Integrative Biomedical Imaging Informatics. He leads the Machine Intelligence in Medical Imaging (MIMI) group, developing multimodal foundation models and physics-guided AI techniques that transform both image acquisition and analysis across vision, language, and EHR data. He co-founded Cognita, a company building next-generation multimodal AI systems to deliver fast, trustworthy diagnostics for radiology workflows.

Prof. Pranav Rajpurkar is an Associate Professor at Harvard University’s Department of Biomedical Informatics. He designs algorithms and curates datasets to advance trustworthy, clinician-level AI across medical imaging, clinical text, and electronic health records. He co-founded a2z Radiology AI, a company developing comprehensive diagnostic-imaging systems that serve as an AI safety net for radiologists. Rajpurkar also co-hosts The AI Health Podcast, edits the Doctor Penguin AI Health Newsletter, and teaches the “AI for Medicine” Coursera series and the AI for Healthcare Bootcamp.

Dr. Fernando Pérez-García is a Senior Research Engineer at Microsoft Research Health Futures. His work focuses on vision–language foundation models for healthcare and their translation to clinical practice. Prior to joining Microsoft, he was at the Centre for Neuroimaging at the Paris Brain Institute, building histological and MRI brain atlases for deep brain stimulation. He then moved on to UCL and King’s College London for his PhD in Medical Imaging, where he investigated the potential of AI to improve the treatment of epilepsy, developing open-source software tools, such as TorchIO, in the meantime.


Call for Papers

We welcome novel research on multimodal or language-grounded analysis of 3D medical data, including, but not limited to, model architectures, self-supervised learning, evaluation, and clinical translation.

  • Submission: 4–8 pages in ICCV style; single-blind review
  • Proceedings: accepted papers published in the official ICCV 2025 Workshop volume
  • Review: handled by the organising committee

Key Dates (all deadlines at 23:59 EST)

  • 26 May 2025: Submission opens
  • 06 Jul 2025: Submission deadline
  • 10 Jul 2025: Notification
  • 20 Jul 2025: Camera-ready deadline

➜ Submit your paper


MICCAI 2025 – Full-Day Challenge Workshop

Details will be announced soon.