Request for Support: Inference Failure Due to Memory Issues

  By: Sunggu on Aug. 4, 2025, 7:42 a.m.

I am currently unable to run inference due to memory-related issues. According to your manual, I should reach out for assistance in such cases. I have already sent several emails, but unfortunately, I have not received any response so far.

Inference runs without any problem in my local environment, but the error occurs during the challenge try-out.

Could you please advise me on how to adjust the GPU and memory settings to resolve this issue?

Thank you very much for your support.

Re: Request for Support: Inference Failure Due to Memory Issues  

  By: jiawei on Aug. 5, 2025, 3:06 a.m.

Could you please tell me the status of your import? Was the image imported successfully, and can it be loaded and run inference normally? Ours is still in the "Inactive" state. Thank you very much.

Re: Request for Support: Inference Failure Due to Memory Issues  

  By: Sunggu on Aug. 5, 2025, 5:43 a.m.

The only message I receive is "The algorithm failed on one or more cases", and unfortunately I cannot access any additional logs, which leaves me rather confused.

As for the container image you mentioned, the algorithm image ab56245b has been successfully imported and is currently in the "Active" state.

However, the issue is that the shared memory (shm) size and pids_limit of the container are extremely small, making it impossible to run inference successfully.
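For anyone who wants to confirm the same limits from inside their own container, here is a minimal diagnostic sketch using only the Python standard library. It assumes a Linux container where shared memory is mounted at /dev/shm and the pids limit is exposed through the cgroup file pids.max; those paths are assumptions and may differ on the challenge platform. The function names are my own, not part of any platform API.

```python
import os
import shutil
from pathlib import Path


def shm_total_bytes(path="/dev/shm"):
    """Return the total size of the shared-memory mount, or None if it is absent."""
    if not os.path.isdir(path):
        return None
    return shutil.disk_usage(path).total


def read_pids_limit(cgroup_root="/sys/fs/cgroup"):
    """Return the cgroup pids limit as text ("max" means unlimited), or None."""
    candidates = [
        Path(cgroup_root) / "pids.max",           # assumed cgroup v2 location
        Path(cgroup_root) / "pids" / "pids.max",  # assumed cgroup v1 location
    ]
    for candidate in candidates:
        try:
            return candidate.read_text().strip()
        except OSError:
            continue  # file missing or unreadable, try the next layout
    return None


def format_report(shm_bytes, pids_limit):
    """Build a one-line summary that can be printed at container start-up."""
    shm = "unknown" if shm_bytes is None else f"{shm_bytes / 2**20:.0f} MiB"
    pids = "unknown" if pids_limit is None else pids_limit
    return f"/dev/shm size: {shm}; pids limit: {pids}"


if __name__ == "__main__":
    print(format_report(shm_total_bytes(), read_pids_limit()))
```

Printing this line at the start of the inference script at least shows the limits wherever stdout is visible. If the values turn out to be small, one possible workaround (assuming a PyTorch pipeline) is to load data in the main process with num_workers=0, since DataLoader worker processes rely on shared memory and add extra pids.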

I have sent multiple emails regarding this matter but have not received any response so far. At this point, I am not even sure whether there is an administrator actively managing this platform.

Re: Request for Support: Inference Failure Due to Memory Issues  

  By: vlm3dchallenge on Aug. 11, 2025, 8:56 a.m.

Hi All,

We are running the failed images on an external cluster with H100 GPUs. If your algorithm requires more than that (model parallelism, etc.), please e-mail me directly: [email protected]. I'll help you with multi-GPU inference. @Sunggu, I think we already discussed your issue.

Best, Sezgin