VLM3D Challenge
Vision-language modelling in 3D medical imaging
Welcome to the VLM3D Challenge!
Challenge Finals and Presentations → MICCAI 2025
Workshop → ICCV 2025
Submit your paper to our ICCV workshop!


Challenge Tasks
Task 1: Radiology Report Generation
- Participants build vision‑language models that translate a complete 3‑D chest CT scan into a free‑text radiology report covering findings and impression. Performance is assessed with standard NLG scores (BLEU, METEOR, ROUGE‑L) plus the clinically aware CRG metric. The task uses the CT‑RATE dataset, split into public train/validation sets and hidden internal and external test sets.
Task 2: Multi-Abnormality Classification
- Given a volumetric chest CT, algorithms must output a binary vector of length 18 indicating the presence or absence of common thoracic conditions (e.g., pleural effusion, lung nodule). Evaluation on hidden test cohorts combines AUROC, F1, Precision, Recall, and Accuracy, and teams are ranked by points earned from permutation‑test wins.
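As a rough illustration of the threshold-based metrics named above, the sketch below computes macro-averaged Precision, Recall, F1, and Accuracy from binary label matrices using NumPy. The function name `multilabel_metrics`, the macro averaging across labels, and the zero-denominator convention are assumptions made for this example; the challenge's official evaluation code may differ. AUROC is omitted because it requires continuous scores rather than binary outputs.

```python
import numpy as np


def multilabel_metrics(y_true, y_pred):
    """Macro-averaged metrics for binary label matrices.

    y_true, y_pred: arrays of shape (n_scans, n_labels) with entries 0 or 1.
    Hypothetical helper for illustration, not the official evaluation.
    """
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)

    # Per-label confusion counts (summed over scans).
    tp = (y_true & y_pred).sum(axis=0).astype(float)
    fp = (~y_true & y_pred).sum(axis=0).astype(float)
    fn = (y_true & ~y_pred).sum(axis=0).astype(float)
    tn = (~y_true & ~y_pred).sum(axis=0).astype(float)

    def safe_div(num, den):
        # Assumed convention: a ratio with a zero denominator counts as 0.
        return np.divide(num, den, out=np.zeros_like(num), where=den > 0)

    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    f1 = safe_div(2 * tp, 2 * tp + fp + fn)
    accuracy = (tp + tn) / y_true.shape[0]

    # Average each metric over labels (macro averaging).
    return {
        "precision": float(precision.mean()),
        "recall": float(recall.mean()),
        "f1": float(f1.mean()),
        "accuracy": float(accuracy.mean()),
    }
```

For the challenge, each row would hold the 18 labels for one scan; the helper works for any number of labels.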
Task 3: Self-Supervised Multi-Abnormality Localization
- Without voxel‑level labels during training, systems must localize five key pathologies (pericardial effusion, pleural effusion, consolidation, ground glass opacity, and lung nodule) by producing 3‑D heat‑maps. These are scored on Dice, IoU, Hausdorff‑95, and Sensitivity against expert masks for 2,000 hidden test scans.
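The two overlap metrics mentioned above can be sketched as follows for a single pathology. This is a minimal example, assuming the heat-map is binarized at a fixed threshold of 0.5 and that two empty masks score 1.0; both choices, and the function name `dice_and_iou`, are assumptions for illustration rather than the challenge's documented procedure.

```python
import numpy as np


def dice_and_iou(heatmap, mask, threshold=0.5):
    """Dice and IoU between a thresholded 3-D heat-map and a binary mask.

    The threshold value is an assumption made for this sketch.
    """
    pred = np.asarray(heatmap) >= threshold
    target = np.asarray(mask).astype(bool)

    intersection = np.logical_and(pred, target).sum()
    pred_size = pred.sum()
    target_size = target.sum()
    union = pred_size + target_size - intersection

    if union == 0:
        # Assumed convention: both volumes empty counts as a perfect match.
        return 1.0, 1.0

    dice = 2.0 * intersection / (pred_size + target_size)
    iou = intersection / union
    return float(dice), float(iou)


# Toy volume: the prediction covers the 8-voxel target plus 4 extra voxels.
mask = np.zeros((4, 4, 4))
mask[1:3, 1:3, 1:3] = 1
heatmap = np.zeros((4, 4, 4))
heatmap[1:3, 1:3, 1:4] = 0.9

dice, iou = dice_and_iou(heatmap, mask)
print(f"Dice={dice:.3f}, IoU={iou:.3f}")  # Dice=0.800, IoU=0.667
```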
Task 4: Text-Conditional CT Generation
- Participants synthesize anatomically plausible 3‑D chest CT volumes directly from radiology text prompts, aiming for high visual fidelity, realistic Hounsfield unit distributions, and tight semantic alignment with the prompt. Success is measured with CT‑adapted generative metrics (FVD_I3D, FVD_CT‑Net, CT‑CLIP, FID), and teams are ranked via the same permutation‑based point system as in the other tasks.
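The exact ranking procedure is not described here, so the following is only one plausible reading of a permutation-based point system: each team earns a point for every other team it beats in a one-sided paired sign-flip permutation test on per-case scores. The function names, the significance level of 0.05, the number of permutations, and the sign-flip scheme are all assumptions for illustration. The sketch also assumes higher scores are better; a lower-is-better metric would need to be negated first.

```python
import numpy as np


def paired_permutation_pvalue(scores_a, scores_b, n_permutations=10000, seed=0):
    """One-sided p-value that team A's mean per-case score exceeds team B's.

    Uses random sign flips of the paired differences (an assumed scheme).
    """
    rng = np.random.default_rng(seed)
    diff = np.asarray(scores_a, dtype=float) - np.asarray(scores_b, dtype=float)
    observed = diff.mean()

    # Each row is one random relabeling of which team produced which score.
    signs = rng.choice([-1.0, 1.0], size=(n_permutations, diff.size))
    permuted = (signs * diff).mean(axis=1)

    # Add-one correction keeps the estimated p-value strictly positive.
    return float((np.sum(permuted >= observed) + 1) / (n_permutations + 1))


def count_wins(team_scores, alpha=0.05):
    """Award one point per significant pairwise win (hypothetical scheme)."""
    points = {name: 0 for name in team_scores}
    for a, scores_a in team_scores.items():
        for b, scores_b in team_scores.items():
            if a != b and paired_permutation_pvalue(scores_a, scores_b) < alpha:
                points[a] += 1
    return points
```

With per-case scores for two teams, `count_wins({"A": a_scores, "B": b_scores})` returns a dictionary of points per team, where `a_scores` and `b_scores` are equal-length sequences evaluated on the same test cases.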