Getting Started with VLM3D Challenge

CT-RATE Dataset Download Guide

Welcome to the VLM3D Challenge!
All tasks depend on CT-RATE — the first large-scale multimodal chest-CT dataset pairing 3-D volumes with free-text reports, multi-abnormality labels, and rich metadata.


1 · Prerequisites

Requirement      | Purpose                       | Install Command
Python ≥ 3.8     | Execution environment         | n/a (verify with python --version)
huggingface_hub  | Authentication + API download | pip install --upgrade huggingface_hub
datasets         | Easy split loading (optional) | pip install --upgrade datasets
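
To confirm these prerequisites before going further, a small check script can help. The sketch below uses only the standard library; the helper names `installed_version` and `check_environment` are illustrative and not part of any CT-RATE tooling.

```python
import sys
from importlib import metadata

REQUIRED = ["huggingface_hub"]   # needed for authentication and downloads
OPTIONAL = ["datasets"]          # only needed for CSV-based split loading


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_environment(min_python=(3, 8)):
    """Collect readable problems; an empty list means the requirements are met."""
    problems = []
    if sys.version_info < min_python:
        problems.append("Python %d.%d or newer is required" % min_python)
    for package in REQUIRED:
        if installed_version(package) is None:
            problems.append("missing required package: " + package)
    return problems


if __name__ == "__main__":
    for package in REQUIRED + OPTIONAL:
        print(package, installed_version(package) or "not installed")
    for problem in check_environment():
        print("PROBLEM:", problem)
```

Run it once after installation; if it prints no PROBLEM lines, the environment should be ready for the steps that follow.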

2 · Request Access to CT-RATE

  1. Open https://huggingface.co/datasets/ibrahimhamamci/CT-RATE
  2. Click “Access repository”
  3. Provide Name, Institution, Email and accept the Terms & Conditions
  4. Approval is usually instant

License — CC-BY-NC-SA 4.0 (non-commercial research only).
Cite the dataset in any publication (see Citation).


3 · Authenticate Once

huggingface-cli login
# paste your HF access token (Settings ▸ Access Tokens ▸ New Token)
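
On machines without an interactive terminal (cluster jobs, CI), the same login can be done from Python. This is a minimal sketch, assuming the token has been exported in the `HF_TOKEN` environment variable; `resolve_token` and `login_from_env` are illustrative helper names, not part of the challenge tooling.

```python
import os


def resolve_token(env=None):
    """Read the access token from HF_TOKEN and fail early if it is absent."""
    env = os.environ if env is None else env
    token = (env.get("HF_TOKEN") or "").strip()
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set; create a token under Settings > Access Tokens"
        )
    return token


def login_from_env():
    """Store the token for later downloads and return the account name."""
    # Imported lazily so resolve_token works even without huggingface_hub.
    from huggingface_hub import login, whoami

    login(token=resolve_token())
    return whoami()["name"]
```

Calling `login_from_env()` once per machine should be enough, since the token is cached locally in the same way as with the CLI.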

4 · Download the Data

4.1 Download Script for Volumes (recommended)

We maintain a restart-safe helper script at https://github.com/sezginerr/example_download_script that automates authentication, parallel downloads, resumption, and integrity checks. Follow the instructions in that repository’s README to fetch the exact splits you need (train_fixed/, valid_fixed/, etc.). Cloning the repository manually with Git-LFS is not recommended because of the dataset's size.
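
For a quick look at a single scan before running the full helper script, one volume can be fetched with `hf_hub_download`. The sketch below rests on an assumption that is not stated in this guide: that volumes are stored in nested folders such as `dataset/train_fixed/train_1/train_1_a/train_1_a_1.nii.gz`. Check the file browser on the dataset page before relying on it. `volume_subfolder` and `fetch_volume` are hypothetical helper names.

```python
def volume_subfolder(volume_name, split_dir="train_fixed", root="dataset"):
    """Build the ASSUMED nested folder for a name like 'train_1_a_1.nii.gz'."""
    parts = volume_name.split("_")
    if len(parts) < 4:
        raise ValueError("unexpected volume name: " + volume_name)
    patient = "_".join(parts[:2])   # e.g. 'train_1'
    scan = "_".join(parts[:3])      # e.g. 'train_1_a'
    return "/".join([root, split_dir, patient, scan])


def fetch_volume(volume_name, split_dir="train_fixed", local_dir="CT-RATE"):
    """Download one volume from the gated dataset repo and return its local path."""
    # Lazy import: only needed when a download is actually requested.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(
        repo_id="ibrahimhamamci/CT-RATE",
        repo_type="dataset",
        subfolder=volume_subfolder(volume_name, split_dir),
        filename=volume_name,
        local_dir=local_dir,
    )
```

This is only meant for spot checks; for whole splits, the helper script above remains the better route because it handles parallelism and resumption.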

4.2 Programmatic Access for CSV-Based Splits

If you only need textual labels, reports, or metadata (no 3-D volumes):

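from datasets import load_dataset

# Requires prior authentication (step 3), because the dataset is gated.
# Passing a list of splits returns a list of Dataset objects in the same
# order, so unpack it instead of indexing by split name.
train_labels, valid_labels = load_dataset(
    "ibrahimhamamci/CT-RATE",
    name="labels",                 # labels | reports | metadata
    split=["train", "validation"]
)
print(train_labels[0])             # first row of the training split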

5 · Directory Structure (overview)

CT-RATE/
├── anatomy_segmentation_labels/
├── metadata/                     # (v1 only)
├── multi_abnormality_labels/
├── radiology_text_reports/
├── train_fixed/                  # corrected volumes, v2 (use these)
├── valid_fixed/                  # corrected volumes, v2 (use these)
└── vqa/                 # dialogs for CT-CHAT

Prefer v2 (*_fixed/); v1 volumes require intensity correction.


6 · Common Pitfalls & Fixes

Issue                        | Solution
Low-contrast / black volumes | You opened an uncorrected v1 scan. Use *_fixed/ or apply vox = raw * RescaleSlope + RescaleIntercept (values in the metadata CSV).
Non-chest volumes            | Skip the IDs listed in no_chest_train.txt / no_chest_valid.txt.
NaN z-spacing warning        | Three affected scans are listed in data_correction_note.md; assign a valid spacing before resampling.
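
The first two fixes can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the organizers' preprocessing code: the volume is assumed to be already loaded as a NumPy array, the slope and intercept are assumed to have been read from the matching row of the metadata CSV, and the no_chest_*.txt files are assumed to contain one volume ID per line. All function names are hypothetical.

```python
import numpy as np


def apply_rescale(raw, slope, intercept):
    """Map stored v1 values to corrected intensities: raw * slope + intercept."""
    raw = np.asarray(raw, dtype=np.float32)
    return raw * np.float32(slope) + np.float32(intercept)


def load_skip_ids(path):
    """Read IDs to skip, ASSUMING one ID per line; blank lines are ignored."""
    with open(path, encoding="utf-8") as handle:
        return {line.strip() for line in handle if line.strip()}


def keep_chest_volumes(volume_ids, skip_ids):
    """Drop every ID that appears in the non-chest list, preserving order."""
    return [vid for vid in volume_ids if vid not in skip_ids]
```

A typical use would be to call `keep_chest_volumes(all_ids, load_skip_ids("no_chest_train.txt"))` once when building the training list, and to call `apply_rescale` only when working with v1 volumes, since the *_fixed/ volumes are already corrected.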

7 · Citation

@misc{hamamci2024foundation,
      title={Developing Generalist Foundation Models from a Multimodal Dataset for 3D Computed Tomography}, 
      author={Ibrahim Ethem Hamamci and Sezgin Er and Furkan Almas and Ayse Gulnihan Simsek and Sevval Nil Esirgun and Irem Dogan and Muhammed Furkan Dasdelen and Omer Faruk Durugol and Bastian Wittmann and Tamaz Amiranashvili and Enis Simsar and Mehmet Simsar and Emine Bensu Erdemir and Abdullah Alanbay and Anjany Sekuboyina and Berkan Lafci and Christian Bluethgen and Mehmet Kemal Ozdemir and Bjoern Menze},
      year={2024},
      eprint={2403.17834},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2403.17834}, 
}


@inproceedings{hamamci2024generatect,
      title={GenerateCT: Text-Conditional Generation of 3D Chest CT Volumes},
      author={Hamamci, Ibrahim Ethem and Er, Sezgin and Sekuboyina, Anjany and Simsar, Enis and Tezcan, Alperen and Simsek, Ayse Gulnihan and Esirgun, Sevval Nil and Almas, Furkan and Do{\u{g}}an, Irem and Dasdelen, Muhammed Furkan and others},
      booktitle={European Conference on Computer Vision},
      pages={126--143},
      year={2024},
      organization={Springer}
}


@inproceedings{hamamci2024ct2rep,
      title={CT2Rep: Automated Radiology Report Generation for 3D Medical Imaging},
      author={Hamamci, Ibrahim Ethem and Er, Sezgin and Menze, Bjoern},
      booktitle={International Conference on Medical Image Computing and Computer-Assisted Intervention},
      pages={476--486},
      year={2024},
      organization={Springer}
}

8 · Need Help?

  • For technical issues, please start a discussion in the relevant task forum or open an issue on the task page.
  • For other matters, contact the organizers of the relevant task via the Email Organizers link on its page.

Happy experimenting! We look forward to your submissions.

The VLM3D Challenge Organizing Team