Segmentation Issue with identifying multiples, or contiguous objects

#10
by EnragedAntelope - opened

Hi, great work on this!
The captioning works well.
When it comes to segmentation, I believe there's a potential issue with identifying multiples of things, or objects that are contiguous but interrupted in the photo by feet/legs/etc.

I am using it in ComfyUI, but the base model comes directly from this Hugging Face repo.
As a test, you can see I selected wings and then dogs, but I got only one of those objects returned:

image.png

image.webp

And here it is compared with GroundingDINO segmentation. You can see the floor is only half-identified in the top mask (Florence) versus the bottom mask (GroundingDINO):
image-1.webp

Just wanted to raise this as a potential improvement point. Thank you for your help and time.

Microsoft org

Hi @TardyTurtle, thanks for raising the point. One workaround to get segmentation results for multiple objects is a two-step approach. First, get the bounding boxes of the objects. Then, for each box, use region-to-segmentation to get the mask.
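To make the two-step idea concrete, here is a minimal sketch of the second step only. It assumes the region prompt is written as four `<loc_N>` tokens on a 0..999 grid and that the task prompt is `<REGION_TO_SEGMENTATION>`, as in the Florence-2 model card examples; please check both against the version you are running. `box_to_loc_tokens`, `masks_for_boxes`, and the `run_task` callable are hypothetical names, not part of any library: `run_task` stands in for whatever wrapper you use to call the model and post-process its output, and the boxes would come from a prior object detection pass.

```python
from typing import Any, Callable, List, Sequence

Box = Sequence[float]  # (x1, y1, x2, y2) in pixel coordinates


def box_to_loc_tokens(box: Box, width: int, height: int, bins: int = 1000) -> str:
    """Quantize a pixel box into '<loc_N>' tokens (assumed grid: 0..bins-1)."""
    x1, y1, x2, y2 = box

    def quantize(value: float, size: int) -> int:
        # Scale to a bin index, then clamp so edge pixels stay in range.
        return min(bins - 1, max(0, int(value * bins / size)))

    indices = (
        quantize(x1, width),
        quantize(y1, height),
        quantize(x2, width),
        quantize(y2, height),
    )
    return "".join(f"<loc_{i}>" for i in indices)


def masks_for_boxes(
    boxes: Sequence[Box],
    width: int,
    height: int,
    run_task: Callable[[str, str], Any],  # hypothetical model wrapper
) -> List[Any]:
    """Run one region-to-segmentation call per detected box."""
    results = []
    for box in boxes:
        region = box_to_loc_tokens(box, width, height)
        results.append(run_task("<REGION_TO_SEGMENTATION>", region))
    return results
```

With this shape, each detected instance costs one extra model call, so an image with N detected objects needs N + 1 calls in total.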

@haipingwu Hi, I am wondering whether it is possible to fine-tune the model to get segmentation results for multiple objects. Thank you!

@eternalaudrey, were you able to generate masks for multiple objects?
If yes, could you tell me how you did it?

Hi, unfortunately not. For the moment I can only run the object detection task to generate a series of bounding boxes, then loop over those regions (bboxes) to generate the masks one at a time, which is quite inefficient.
