Questions on bbox units (voxel indices vs physical mm)

By: kychoi on Aug. 21, 2025, 6:13 a.m.

Dear Organizers,

In the 'abnclass_example_docker' code, the model output is converted from index space to physical space using _idx_to_phys, and the evaluation script compares that saved JSON directly with the ground-truth boxes.

However, in the provided example gt data (f13978c0-b141-4893-b68f-be83bc612901.mha): the .mha file has spacing (sx=0.853515625, sy=0.853515625, sz=1.5), but the ground-truth boxes are integer lists such as [95, 333, 67, 127, 67, 90] and [201, 282, 41, 5, 7, 4], which look like voxel indices rather than physical coordinates in millimeters.
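To make the distinction concrete, here is a minimal sketch of what an index-to-physical conversion does to the first sample box. The helper `idx_box_to_mm` is hypothetical (it is not the challenge's `_idx_to_phys`) and assumes an axis-aligned volume with zero origin and identity direction, so the conversion reduces to scaling by the spacing.

```python
def idx_box_to_mm(box, spacing):
    """Scale an [x, y, z, dx, dy, dz] box from voxel indices to millimeters.

    Hypothetical helper: assumes zero origin and identity direction,
    so only the per-axis spacing matters.
    """
    sx, sy, sz = spacing
    x, y, z, dx, dy, dz = box
    return [x * sx, y * sy, z * sz, dx * sx, dy * sy, dz * sz]


spacing = (0.853515625, 0.853515625, 1.5)  # (sx, sy, sz) from the sample .mha
gt_box = [95, 333, 67, 127, 67, 90]        # first sample ground-truth box

mm_box = idx_box_to_mm(gt_box, spacing)
# True only if every converted value happens to land on a whole millimeter.
all_integer = all(float(v).is_integer() for v in mm_box)

print(mm_box)
print(all_integer)
```

If the sample boxes were voxel indices, their millimeter equivalents would generally not be whole numbers at this spacing. Integer values alone do not settle the question, though, since boxes annotated directly on a millimeter grid would also be integers.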

Could you please clarify:

1. What units are the official ground-truth boxes in: physical (mm) or voxel index space? If they are in mm, are the integer values in the sample rounded from non-integers?
2. If the sample ground truth is indeed in index space, is that specific bbox_ground_truth.csv file an anomaly, or are all evaluation labels provided in index space? In that case, should we skip _idx_to_phys and submit boxes in index coordinates?

Thanks in advance for clarifying.

Last edited by: kychoi on Aug. 21, 2025, 9 a.m., edited 1 time in total.

Re: Questions on bbox units (voxel indices vs physical mm)

By: vlm3dchallenge on Aug. 21, 2025, 2:43 p.m.

Hi @kychoi,

The values are in raw voxel space. You don't need to convert them to mm.

Best, Sezgin

Re: Questions on bbox units (voxel indices vs physical mm)

By: kychoi on Aug. 22, 2025, 2:27 a.m.

Dear Sezgin,

Thank you for the quick clarification that the values are in raw voxel space and we don’t need to convert to mm. To make sure our submission matches the guidelines, could you please confirm two points:

1. Submission JSON key: The submission format specifies a field named "bbox_mm". Can you confirm that, despite the name, this field should contain voxel-space indices [x, y, z, dx, dy, dz] (integers), not millimeters?
2. Evaluation metric description: The guidelines describe the metric as “Distance – centroid distance (mm) between matched boxes.” If I understood your reply correctly, this description is misleading, because the centroid distance would be computed in voxel space rather than in millimeters. Is that right?

Model outputs differ depending on whether the bounding boxes are expressed in millimeters or in voxel space, and this can significantly change the computed metrics, so the point is critical. Both the Submission Guidelines and the abnloc_example_docker code seem to imply that model output bounding boxes should be in millimeters. For participants who may not have seen this forum post, it would help to update the Submission Guidelines or issue a separate notice stating clearly that model output bounding boxes should be submitted in voxel-space coordinates, not converted to millimeters.
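As a rough illustration of how much the unit choice matters, the sketch below computes a centroid distance for the same pair of boxes in voxel units and in millimeters. It is not the official evaluation script: `centroid_distance` and the shifted prediction `pred` are hypothetical, and boxes are assumed to be [x, y, z, dx, dy, dz] with (x, y, z) as the minimum corner.

```python
import math


def centroid(box):
    """Center of an [x, y, z, dx, dy, dz] box with (x, y, z) as the min corner."""
    x, y, z, dx, dy, dz = box
    return (x + dx / 2, y + dy / 2, z + dz / 2)


def centroid_distance(box_a, box_b, scale=(1.0, 1.0, 1.0)):
    """Euclidean distance between box centers, scaled per axis.

    With scale=(1, 1, 1) the result is in the boxes' own units;
    passing the voxel spacing turns voxel-space boxes into mm distances.
    """
    ca, cb = centroid(box_a), centroid(box_b)
    return math.sqrt(sum(((a - b) * s) ** 2 for a, b, s in zip(ca, cb, scale)))


spacing = (0.853515625, 0.853515625, 1.5)  # (sx, sy, sz) from the sample .mha
gt = [95, 333, 67, 127, 67, 90]            # sample ground-truth box
pred = [95, 333, 77, 127, 67, 90]          # hypothetical prediction, shifted 10 units in z

dist_vox = centroid_distance(gt, pred)                 # treats the numbers as voxels
dist_mm = centroid_distance(gt, pred, scale=spacing)   # same boxes measured in mm

print(dist_vox, dist_mm)
```

The same pair of boxes gives different distances under the two conventions, so a distance threshold stated in mm would accept or reject different matches depending on which unit the submission actually uses.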

We sincerely appreciate your efforts.

Last edited by: kychoi on Aug. 22, 2025, 3:12 a.m., edited 1 time in total.

Re: Questions on bbox units (voxel indices vs physical mm)

By: vlm3dchallenge on Aug. 24, 2025, 1 a.m.

Hi again,

Sorry for the misunderstanding. I've also been separately running some of the errored submissions, which is why my response was delayed.

I've reviewed the annotation tool, and here's the clarification: we resize the volumes to millimeters before annotating the bounding boxes, so all bounding boxes are in millimeters. The values all appear as integers because the annotations are made on the mm-converted volumes. Apologies again for my earlier message; please ignore it. I hope this helps.

Best, Sezgin

Re: Questions on bbox units (voxel indices vs physical mm)

By: kychoi on Aug. 25, 2025, 7:42 a.m.

Dear Sezgin,

Thank you for correcting the earlier confusion. We understand that the bounding boxes are provided in millimeter format, and that the integer values are due to annotation in the mm space.

However, after interpreting the annotations as being in millimeters and visualizing the example ground-truth bounding boxes, the results looked questionable, so I would like to ask once again for clarification.

To visualize the ground-truth bounding boxes corresponding to pleural effusion in the file f13978c0-b141-4893-b68f-be83bc612901.mha, I followed the steps below for the two boxes [95, 333, 67, 127, 67, 90] and [273, 338, 55, 124, 54, 111].

Given the spacing sx = 0.853515625 mm, sy = 0.853515625 mm, sz = 1.5 mm, the two bounding boxes can be converted into voxel indices (truncated to integers) as follows:
- First box: [95/sx, 333/sy, 67/sz, 127/sx, 67/sy, 90/sz] ≈ [111, 390, 44, 148, 78, 60]
- Second box: [273/sx, 338/sy, 55/sz, 124/sx, 54/sy, 111/sz] ≈ [319, 396, 36, 145, 63, 74]

Based on the Submission Guideline indicating that the Z-axis was flipped during labeling, I re-flipped the bounding box z-coordinates. Given that the array read with SimpleITK (ReadImage → GetArrayFromImage) has shape (242, 512, 512), I converted each z-coordinate to 242 - (z + dz), yielding the following:
- First box: [111, 390, 138, 148, 78, 60]
- Second box: [319, 396, 132, 145, 63, 74]

To visualize the overlap along the common z-axis, I selected z = 140, which lies inside both boxes, and drew the bounding boxes on that slice. For the array img with shape (242, 512, 512), I plotted on img[140]:
- a box starting at (x, y) = (111, 390) with size (dx, dy) = (148, 78)
- and another box starting at (x, y) = (319, 396) with size (dx, dy) = (145, 63).

The corresponding code and visualization result are shown below.

from PIL import Image, ImageDraw
import numpy as np
import SimpleITK as sitk

mha_file = "example_gt_data/abnormality_localization_example/f13978c0-b141-4893-b68f-be83bc612901.mha"
itk = sitk.ReadImage(mha_file)
sx, sy, sz = itk.GetSpacing()  # spacing in mm, ordered (x, y, z)
img = sitk.GetArrayFromImage(itk).astype("float32")  # array axes are (z, y, x)

# Clip intensities to [-1000, 1000] and rescale to [0, 255] for display.
img = np.clip(img, -1000, 1000)
img = 255 * ((img / 1000.0) + 1) / 2

# Ground-truth boxes as [x, y, z, dx, dy, dz], interpreted here as mm.
boxes = [[95, 333, 67, 127, 67, 90], [273, 338, 55, 124, 54, 111]]
flip_idx_boxes = []
for box in boxes:
    x, y, z, dx, dy, dz = box
    # Convert mm to voxel indices (int() truncates toward zero).
    new_box = [int(x / sx), int(y / sy), int(z / sz), int(dx / sx), int(dy / sy), int(dz / sz)]
    # Undo the labeling-time z-flip in voxel space: z -> n_slices - (z + dz).
    flipped_z = img.shape[0] - (new_box[2] + new_box[5])
    new_box[2] = flipped_z
    flip_idx_boxes.append(new_box)

# Draw the x-y extent of each box on a single axial slice.
z = 140
slice_img = Image.fromarray(img[z].astype(np.uint8)).convert("RGB")
draw = ImageDraw.Draw(slice_img)
for (x, y, z0, dx, dy, dz) in flip_idx_boxes:
    draw.rectangle([x, y, x + dx, y + dy], outline="red", width=2)
slice_img.save("image.png")

Drawn this way, the boxes land in positions that look implausible for pleural effusion.

In contrast, if we interpret the given bounding boxes directly as voxel indices without converting to millimeters and redraw them, the result looks more plausible. Removing the mm conversion part (replacing new_box = [int(x / sx), int(y / sy), int(z / sz), int(dx / sx), int(dy / sy), int(dz / sz)] with new_box = box) yields the following image.

(Since my attempt to upload images via drag-and-drop did not succeed, I have attached the images via external links instead. Please click the links above to view them.)

The latter appears to better correspond to the bounding boxes of pleural effusion, which was the reason I initially raised the question. Nevertheless, I would appreciate it if you could confirm whether the first visualization in millimeter coordinates is indeed correct, or if there was any mistake in the procedure I followed.

Thank you in advance for your time in reviewing this issue and the linked images.

Last edited by: kychoi on Aug. 25, 2025, 9:34 a.m., edited 2 times in total.

Re: Questions on bbox units (voxel indices vs physical mm)

By: vlm3dchallenge on Aug. 25, 2025, 10:07 a.m.

Hi kychoi,

This part: "Based on the Submission Guideline indicating that the Z-axis was flipped during labeling, I re-flipped the bounding box z-coordinates. Given that the array read with SimpleITK (ReadImage → GetArrayFromImage) has shape (242, 512, 512), I converted each z-coordinate to 242 - (z + dz), yielding the following:"

seems to be in px space. You should calculate the corrected z-coordinate in mm as well (I mean z_max should be in mm as well).

Does that fix your issue?

Best, Sezgin

Last edited by: vlm3dchallenge on Aug. 25, 2025, 10:07 a.m., edited 1 time in total.

Re: Questions on bbox units (voxel indices vs physical mm)

By: kychoi on Aug. 25, 2025, 2:31 p.m.

Dear Sezgin,

Thank you for the prompt reply. I understand your point; however, I performed both the z-axis flip and the subsequent visualization consistently in px space rather than in mm. Working in px space was the logical choice because the visualization (using PIL.Image.fromarray) inherently depends on pixel coordinates.

Specifically, I converted the bounding box from mm space (e.g., [95, 333, 67, 127, 67, 90]) to px space (e.g., [111, 390, 44, 148, 78, 60]), and then applied the z-axis flip in px space. I therefore believe z_max should be 242 px (which corresponds to 363 mm at the 1.5 mm z-spacing), rather than 363 mm applied directly. Please let me know if there is a flaw in my reasoning.
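The two orders of operation can be compared numerically. The sketch below uses only the z-values of the first box and the figures quoted in this thread (242 slices, 1.5 mm z-spacing); the variable names are illustrative, and a zero origin along z is assumed.

```python
sz = 1.5          # z-spacing in mm
n_slices = 242    # number of slices along z
z_mm, dz_mm = 67, 90  # z start and z size of the first box, read as mm

# Route A: convert to px first (truncating), then flip in px space.
z_px = int(z_mm / sz)
dz_px = int(dz_mm / sz)
flip_a = n_slices - (z_px + dz_px)

# Route B: flip in mm using the full z extent, then convert to px (truncating).
extent_mm = n_slices * sz
flip_mm = extent_mm - (z_mm + dz_mm)
flip_b = int(flip_mm / sz)

# Without truncation the two routes describe the same position.
exact_a = n_slices - (z_mm / sz + dz_mm / sz)
exact_b = flip_mm / sz

print(flip_a, flip_b, exact_a, exact_b)
```

Under these assumptions the two routes agree exactly before rounding and differ by at most one slice after truncation, so the choice of flipping in px or in mm would not by itself explain boxes that are far from the expected anatomy.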

Best, kychoi

Last edited by: kychoi on Aug. 25, 2025, 3 p.m., edited 7 times in total.