Dear Sezgin,
Thank you for correcting the earlier confusion.
We understand that the bounding boxes are provided in millimeter format, and that the integer values are due to annotation in the mm space.
However, after interpreting the annotations as being in millimeters and visualizing the example ground-truth bounding boxes, the results looked questionable, so I would like to ask once again for clarification.
To visualize the ground-truth bounding boxes corresponding to pleural effusion in the file f13978c0-b141-4893-b68f-be83bc612901.mha, I followed the steps below for the two boxes [95, 333, 67, 127, 67, 90] and [273, 338, 55, 124, 54, 111].
- Given the spacing sx = 0.853515625 mm, sy = 0.853515625 mm, sz = 1.5 mm, the two bounding boxes can be converted into voxel indices (truncated to integers) as follows:
  - First box: [95/sx, 333/sy, 67/sz, 127/sx, 67/sy, 90/sz] ~= [111, 390, 44, 148, 78, 60]
  - Second box: [273/sx, 338/sy, 55/sz, 124/sx, 54/sy, 111/sz] ~= [319, 396, 36, 145, 63, 74]
- Based on the Submission Guideline indicating that the Z-axis was flipped during labeling, I re-flipped the bounding box z-coordinates. Given that the array read with SimpleITK (ReadImage → GetArrayFromImage) has shape (242, 512, 512), I converted each z-coordinate to 242 - (z + dz), yielding the following:
  - First box: [111, 390, 138, 148, 78, 60]
  - Second box: [319, 396, 132, 145, 63, 74]
- To view both boxes on a single slice, I selected z=140, which lies inside the z-range of both boxes, and drew the bounding boxes on that slice. For the array img with shape (242, 512, 512), I plotted on img[140]:
  - a box starting at (x, y) = (111, 390) with size (dx, dy) = (148, 78)
  - and another box starting at (x, y) = (319, 396) with size (dx, dy) = (145, 63).
The corresponding code and visualization result are shown below.
from PIL import Image, ImageDraw
import numpy as np
import SimpleITK as sitk

mha_file = "example_gt_data/abnormality_localization_example/f13978c0-b141-4893-b68f-be83bc612901.mha"
itk = sitk.ReadImage(mha_file)
sx, sy, sz = itk.GetSpacing()  # spacing in mm, ordered (x, y, z)

# Array axes are ordered (z, y, x); clip intensities to [-1000, 1000] and rescale to [0, 255].
img = sitk.GetArrayFromImage(itk).astype("float32")
img = np.clip(img, -1000, 1000)
img = 255 * ((img / 1000.0) + 1) / 2

# Ground-truth boxes as [x, y, z, dx, dy, dz], interpreted here as millimeters.
boxes = [[95, 333, 67, 127, 67, 90], [273, 338, 55, 124, 54, 111]]
flip_idx_boxes = []
for box in boxes:
    x, y, z, dx, dy, dz = box
    # mm -> voxel indices (truncated)
    new_box = [int(x / sx), int(y / sy), int(z / sz), int(dx / sx), int(dy / sy), int(dz / sz)]
    # Undo the z-axis flip applied during labeling.
    flipped_z = img.shape[0] - (new_box[2] + new_box[5])
    new_box[2] = flipped_z
    flip_idx_boxes.append(new_box)

# Draw both boxes on axial slice z = 140.
z = 140
slice_img = Image.fromarray(img[z].astype(np.uint8)).convert("RGB")
draw = ImageDraw.Draw(slice_img)
for (x, y, z0, dx, dy, dz) in flip_idx_boxes:
    draw.rectangle([x, y, x + dx, y + dy], outline="red", width=2)
slice_img.save("image.png")
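To rule out a simple arithmetic slip, the conversion and z-flip from the steps above can also be reproduced without loading the image. This is only a minimal sketch: the spacing and depth are hard-coded from the values quoted in this post, and the helper name mm_box_to_flipped_voxel is my own, not something from the challenge code.

```python
# Values quoted in this post (assumed to match the .mha header).
SPACING = (0.853515625, 0.853515625, 1.5)  # (sx, sy, sz) in mm
DEPTH = 242                                # number of slices along z

BOXES_MM = [
    [95, 333, 67, 127, 67, 90],
    [273, 338, 55, 124, 54, 111],
]


def mm_box_to_flipped_voxel(box_mm, spacing=SPACING, depth=DEPTH):
    """Hypothetical helper: mm box -> voxel box with the z-axis flipped back."""
    x, y, z, dx, dy, dz = box_mm
    sx, sy, sz = spacing
    # Divide by the spacing and truncate, exactly as in the script above.
    vox = [int(x / sx), int(y / sy), int(z / sz),
           int(dx / sx), int(dy / sy), int(dz / sz)]
    # Undo the labeling-time flip: new z start = depth - (z + dz).
    vox[2] = depth - (vox[2] + vox[5])
    return vox


converted = [mm_box_to_flipped_voxel(b) for b in BOXES_MM]
for box in converted:
    print(box)
```

The printed boxes match the ones listed in the steps, so if the millimeter interpretation is wrong, the issue is more likely in the assumptions (units, origin, or flip direction) than in the arithmetic.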
The boxes drawn under the millimeter interpretation appear to be in awkward positions for pleural effusion.
In contrast, if we interpret the given bounding boxes directly as voxel indices, skipping the mm-to-voxel conversion, and redraw them, the result looks more plausible. Removing the conversion step (replacing new_box = [int(x / sx), int(y / sy), int(z / sz), int(dx / sx), int(dy / sy), int(dz / sz)] with new_box = box) yields the following image.
(Since my attempt to upload images via drag-and-drop did not succeed, I have attached the images via external links instead. Please click the links above to view them.)
The latter appears to better correspond to the bounding boxes of pleural effusion, which was the reason I initially raised the question. Nevertheless, I would appreciate it if you could confirm whether the first visualization in millimeter coordinates is indeed correct, or if there was any mistake in the procedure I followed.
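For reference, here is a small sketch that lists the rectangle each interpretation places on slice 140, so the two readings can be compared numerically as well as visually. The millimeter-interpretation boxes are copied from the steps earlier in this post (already z-flipped); the raw boxes get the same z-flip applied. The helper names flip_z and rect_on_slice are mine, for illustration only.

```python
DEPTH = 242  # number of slices, from the array shape (242, 512, 512)

# mm interpretation: values from the steps in this post, z already flipped.
MM_BOXES = [[111, 390, 138, 148, 78, 60], [319, 396, 132, 145, 63, 74]]
# Voxel interpretation: the annotation values used as-is, z not yet flipped.
RAW_BOXES = [[95, 333, 67, 127, 67, 90], [273, 338, 55, 124, 54, 111]]


def flip_z(box, depth=DEPTH):
    """Return a copy of the box with z replaced by depth - (z + dz)."""
    x, y, z, dx, dy, dz = box
    return [x, y, depth - (z + dz), dx, dy, dz]


def rect_on_slice(box, z_slice):
    """Return (x0, y0, x1, y1) if the box covers z_slice, else None."""
    x, y, z, dx, dy, dz = box
    if not (z <= z_slice < z + dz):
        return None
    return (x, y, x + dx, y + dy)


Z_SLICE = 140
mm_rects = [rect_on_slice(b, Z_SLICE) for b in MM_BOXES]
voxel_rects = [rect_on_slice(flip_z(b), Z_SLICE) for b in RAW_BOXES]

print("mm interpretation:   ", mm_rects)
print("voxel interpretation:", voxel_rects)
```

Both interpretations intersect slice 140 and stay inside the 512 x 512 plane, so a bounds check alone cannot tell them apart; the difference is only in where the rectangles fall relative to the anatomy.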
Thank you in advance for your time in reviewing this issue and the linked images.