Getting Started with the VLM3D Challenge

CT-RATE Dataset Download Guide

Welcome to the VLM3D Challenge!
All tasks depend on CT-RATE — the first large-scale multimodal chest-CT dataset pairing 3-D volumes with free-text reports, multi-abnormality labels, and rich metadata.

1 · Prerequisites

Requirement	Purpose	Install Command
Python ≥ 3.8	Execution environment	None (check with `python --version`)
`huggingface_hub`	Authentication + API download	`pip install --upgrade huggingface_hub`
`datasets`	Loading the CSV-based splits (optional)	`pip install --upgrade datasets`

2 · Request Access to CT-RATE

Open https://huggingface.co/datasets/ibrahimhamamci/CT-RATE
Click “Access repository”
Provide your name, institution, and email, and accept the Terms & Conditions
Approval is usually instant

License: CC BY-NC-SA 4.0 (non-commercial research only).
Cite the dataset in any publication (see Citation).

3 · Authenticate Once

huggingface-cli login
# paste your HF access token (Settings ▸ Access Tokens ▸ New Token)
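Before starting a long download it can help to confirm that a token is actually configured. The sketch below is a hypothetical helper (`find_hf_token` is not part of any library); it assumes the usual lookup order of an `HF_TOKEN` environment variable first, then the token file that `huggingface-cli login` writes to `$HF_HOME/token` (by default `~/.cache/huggingface/token`). Treat it as an approximation of how `huggingface_hub` resolves tokens, not its actual logic.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def find_hf_token(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return a Hugging Face token if one appears to be configured, else None.

    Assumed lookup order (an approximation, not the library's own code):
    1) the HF_TOKEN environment variable,
    2) the token file at $HF_HOME/token (default ~/.cache/huggingface/token).
    """
    env = os.environ if env is None else env

    # 1) A non-empty HF_TOKEN environment variable takes precedence.
    token = env.get("HF_TOKEN", "").strip()
    if token:
        return token

    # 2) Fall back to the token file written by `huggingface-cli login`.
    hf_home = env.get("HF_HOME") or str(Path.home() / ".cache" / "huggingface")
    token_file = Path(hf_home) / "token"
    if token_file.is_file():
        token = token_file.read_text(encoding="utf-8").strip()
        return token or None
    return None
```

Calling `find_hf_token()` with no arguments inspects the real environment; if it returns `None`, run `huggingface-cli login` again before downloading.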

4 · Download the Data

4.1 Use the Official Helper Script (recommended)

We maintain a restart-safe helper at https://github.com/sezginerr/example_download_script that automates authentication, parallel downloads, resumption, and integrity checks. Follow the instructions in that repository's README to fetch exactly the splits you need (train_fixed/, valid_fixed/, etc.). Cloning the dataset manually with Git LFS is not recommended because of the dataset's size.
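For spot-checking a handful of volumes without running the full script, `hf_hub_download` from `huggingface_hub` can fetch a single file. The sketch below assumes volume names of the form `train_1_a_1` (split, patient number, scan letter, reconstruction index) and an in-repo layout `dataset/<split_dir>/<patient>/<scan>/<volume>.nii.gz`; both are assumptions to verify against the repository's file browser. `volume_repo_path` and `fetch_volume` are hypothetical helper names.

```python
from typing import Optional

REPO_ID = "ibrahimhamamci/CT-RATE"


def volume_repo_path(volume_name: str, split_dir: str = "train_fixed") -> str:
    """Build the assumed in-repo path of one volume, e.g. train_1_a_1 ->
    dataset/train_fixed/train_1/train_1_a/train_1_a_1.nii.gz (hypothetical layout)."""
    suffix = ".nii.gz"
    stem = volume_name[: -len(suffix)] if volume_name.endswith(suffix) else volume_name
    parts = stem.split("_")
    if len(parts) != 4:
        raise ValueError(f"unexpected volume name: {volume_name!r}")
    patient = "_".join(parts[:2])  # e.g. train_1
    scan = "_".join(parts[:3])     # e.g. train_1_a
    return f"dataset/{split_dir}/{patient}/{scan}/{stem}{suffix}"


def fetch_volume(volume_name: str, split_dir: str = "train_fixed",
                 local_dir: Optional[str] = None) -> str:
    """Download one volume and return its local path (needs network access
    and an approved, logged-in Hugging Face account)."""
    # Imported lazily so the path helper works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(
        repo_id=REPO_ID,
        repo_type="dataset",
        filename=volume_repo_path(volume_name, split_dir),
        local_dir=local_dir,
    )
```

For whole splits, the helper script above remains the better choice, since it adds parallelism, resumption, and integrity checks that this sketch does not.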

4.2 Programmatic Access for CSV-Based Splits

If you only need textual labels, reports, or metadata (no 3-D volumes):

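from datasets import load_dataset

# Omitting `split` returns a DatasetDict keyed by split name ("train", "validation");
# passing split=["train", "validation"] would return a plain list instead.
labels = load_dataset(
    "ibrahimhamamci/CT-RATE",
    name="labels",                 # labels | reports | metadata
)
print(labels["train"][0])          # first training row
print(labels["validation"][0])     # first validation row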
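Once the labels are loaded, a quick sanity check is the fraction of positive cases per abnormality. The sketch below is a hypothetical helper that assumes each row is a mapping with a `VolumeName` identifier plus one 0/1 column per abnormality; iterating a split of a `DatasetDict` (for example `labels["train"]`) yields such dict rows.

```python
from typing import Dict, Iterable, Mapping


def label_prevalence(rows: Iterable[Mapping[str, object]],
                     id_column: str = "VolumeName") -> Dict[str, float]:
    """Fraction of rows marked positive (== 1) for each label column.

    Assumes every column except `id_column` holds a binary 0/1 label.
    """
    positives: Dict[str, int] = {}
    n_rows = 0
    for row in rows:
        n_rows += 1
        for column, value in row.items():
            if column == id_column:
                continue
            positives[column] = positives.get(column, 0) + int(value == 1)
    if n_rows == 0:
        return {}
    return {column: count / n_rows for column, count in positives.items()}
```

Usage, assuming the `labels` object from the snippet above: `label_prevalence(labels["train"])`. Strongly imbalanced columns are worth knowing about before choosing a loss or a decision threshold.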

5 · Directory Structure (overview)

CT-RATE/
├── anatomy_segmentation_labels/
├── metadata/                     # (v1 only)
├── multi_abnormality_labels/
├── radiology_text_reports/
├── train_fixed/                  # corrected volumes, v2 (use these)
├── valid_fixed/                  # corrected volumes, v2 (use these)
└── vqa/                          # dialogs for CT-CHAT

Prefer v2 (*_fixed/); v1 volumes require intensity correction.
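To make sure a training pipeline only ever sees v2 data, one option is to build the file list from the `*_fixed/` directories alone. The sketch below is a hypothetical helper; it assumes volumes are stored as `*.nii.gz` files somewhere beneath each `*_fixed/` directory of a local copy, and it ignores every other directory.

```python
from pathlib import Path
from typing import List, Sequence


def list_v2_volumes(root: str,
                    split_dirs: Sequence[str] = ("train_fixed", "valid_fixed")) -> List[str]:
    """Return sorted paths (relative to `root`) of *.nii.gz files under v2 directories.

    Directories without the `_fixed` suffix (for example v1 volumes) are never scanned.
    """
    base = Path(root)
    found = []
    for split_dir in split_dirs:
        if not split_dir.endswith("_fixed"):
            raise ValueError(f"{split_dir!r} is not a v2 (*_fixed) directory")
        directory = base / split_dir
        if not directory.is_dir():
            continue  # split not downloaded (yet); skip it quietly
        for path in directory.rglob("*.nii.gz"):
            found.append(path.relative_to(base).as_posix())
    return sorted(found)
```

Refusing non-`_fixed` names up front turns an accidental v1 path into an immediate error rather than a silent intensity problem later.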

6 · Common Pitfalls & Fixes

Issue	Solution
Low-contrast / black volumes	You opened an uncorrected v1 scan. Use the `_fixed/` volumes, or apply `vox = raw * RescaleSlope + RescaleIntercept` (slope and intercept are in the metadata CSV).
Non-chest volumes	Skip IDs in `no_chest_train.txt` / `no_chest_valid.txt`.
NaN z-spacing warning	Three affected scans are listed in data_correction_note.md; assign them a valid z-spacing before resampling.
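The three fixes in the table can be sketched as small helpers. All names here are hypothetical. The sketch assumes the exclusion files list one volume ID per line, that spacing is ordered (x, y, z), and that the replacement z-spacing is taken by the caller from data_correction_note.md rather than hard-coded. The rescale step is the standard linear mapping `raw * RescaleSlope + RescaleIntercept`.

```python
import math
from pathlib import Path
from typing import Sequence, Set, Tuple

import numpy as np


def rescale_to_hu(raw: np.ndarray, slope: float, intercept: float) -> np.ndarray:
    """Apply vox = raw * RescaleSlope + RescaleIntercept, returning float32 values."""
    return raw.astype(np.float32) * np.float32(slope) + np.float32(intercept)


def load_excluded_ids(*txt_files: str) -> Set[str]:
    """Collect volume IDs to skip, assuming one ID per line (blank lines ignored)."""
    ids: Set[str] = set()
    for txt_file in txt_files:
        for line in Path(txt_file).read_text(encoding="utf-8").splitlines():
            if line.strip():
                ids.add(line.strip())
    return ids


def fill_nan_spacing(spacing: Sequence[float], fallback_z: float) -> Tuple[float, ...]:
    """Replace a NaN z-spacing (last component) with a caller-supplied valid value."""
    *xy, z = (float(s) for s in spacing)
    if math.isnan(z):
        z = fallback_z
    return (*xy, z)
```

Whether the IDs in `no_chest_train.txt` / `no_chest_valid.txt` include the `.nii.gz` suffix is not stated here, so compare a few entries with your volume names before filtering on them.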

7 · Citation

@misc{hamamci2024foundation,
      title={Developing Generalist Foundation Models from a Multimodal Dataset for 3D Computed Tomography}, 
      author={Ibrahim Ethem Hamamci and Sezgin Er and Furkan Almas and Ayse Gulnihan Simsek and Sevval Nil Esirgun and Irem Dogan and Muhammed Furkan Dasdelen and Omer Faruk Durugol and Bastian Wittmann and Tamaz Amiranashvili and Enis Simsar and Mehmet Simsar and Emine Bensu Erdemir and Abdullah Alanbay and Anjany Sekuboyina and Berkan Lafci and Christian Bluethgen and Mehmet Kemal Ozdemir and Bjoern Menze},
      year={2024},
      eprint={2403.17834},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2403.17834}, 
}


@inproceedings{hamamci2024generatect,
      title={{GenerateCT}: Text-Conditional Generation of {3D} Chest {CT} Volumes},
      author={Hamamci, Ibrahim Ethem and Er, Sezgin and Sekuboyina, Anjany and Simsar, Enis and Tezcan, Alperen and Simsek, Ayse Gulnihan and Esirgun, Sevval Nil and Almas, Furkan and Do{\u{g}}an, Irem and Dasdelen, Muhammed Furkan and others},
      booktitle={European Conference on Computer Vision},
      pages={126--143},
      year={2024},
      organization={Springer}
}


@inproceedings{hamamci2024ct2rep,
      title={{CT2Rep}: Automated Radiology Report Generation for {3D} Medical Imaging},
      author={Hamamci, Ibrahim Ethem and Er, Sezgin and Menze, Bjoern},
      booktitle={International Conference on Medical Image Computing and Computer-Assisted Intervention},
      pages={476--486},
      year={2024},
      organization={Springer}
}

8 · Need Help?

For technical issues, please start a discussion in the relevant task forum or open an issue on the task page.
For other matters, contact the organizers of the relevant task via the Email Organizers link on its page.

Happy experimenting, and we look forward to your submissions!
The VLM3D Challenge Organizing Team