microsoft/Florence-2-large · Segmentation Issue with identifying multiples, or contiguous objects

Jun 20

Hi, great work on this!
The captioning works well.
When it comes to segmentation, I believe there's a potential issue with identifying multiple instances of an object, or objects that are contiguous but interrupted in the photo by feet/legs/etc.

I am using it in ComfyUI, but the base model comes directly from this Hugging Face repo.
As a test, you can see I selected wings and then dogs, but in each case only one of those objects was returned:

And here it is compared to groundingdino segmentation. You can see the floor is only half-identified in the top masking (Florence) vs. the bottom masking (groundingdino):

Just wanted to raise this as a potential improvement point. Thank you for your help and time.

haipingwu

Microsoft org Jun 21

Hi @TardyTurtle, thanks for raising this. One workaround for getting segmentation results for multiple objects is a two-step approach. First, get the bounding boxes of the objects. Then, for each box, use the region-to-segmentation task to get the mask.
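The two-step approach above could be sketched roughly as follows. This is only an illustration under a few assumptions: `run_task` is a hypothetical wrapper (not part of the model's API) that runs the processor, `model.generate`, and `processor.post_process_generation` for one image and returns the parsed dict keyed by task prompt; the task prompts `<CAPTION_TO_PHRASE_GROUNDING>` and `<REGION_TO_SEGMENTATION>` and the `<loc_N>` region format with 1000 coordinate bins follow the model card's sample usage; and the exact rounding used when quantizing boxes is a guess.

```python
from typing import Callable, Dict, List, Sequence


def box_to_loc_tokens(box: Sequence[float], width: int, height: int,
                      bins: int = 1000) -> str:
    """Convert a pixel box [x1, y1, x2, y2] into '<loc_..>' region tokens.

    Assumption: coordinates are quantized into `bins` buckets (0..bins-1)
    relative to the image size, as in the model card examples.
    """
    x1, y1, x2, y2 = box

    def quantize(value: float, size: int) -> int:
        # Scale to 0..bins-1 and clamp so out-of-range boxes stay valid.
        return max(0, min(bins - 1, int(value / size * (bins - 1))))

    coords = (quantize(x1, width), quantize(y1, height),
              quantize(x2, width), quantize(y2, height))
    return "".join(f"<loc_{c}>" for c in coords)


def masks_for_all_objects(run_task: Callable[[str, str], Dict],
                          width: int, height: int,
                          phrase: str) -> List[Dict]:
    """Step 1: get boxes for `phrase`. Step 2: segment each box separately.

    `run_task(task_prompt, text_input)` is a hypothetical helper supplied
    by the caller; it is expected to return the post-processed result dict.
    """
    grounding_task = "<CAPTION_TO_PHRASE_GROUNDING>"
    region_task = "<REGION_TO_SEGMENTATION>"

    boxes = run_task(grounding_task, phrase)[grounding_task]["bboxes"]

    results = []
    for box in boxes:
        region = box_to_loc_tokens(box, width, height)
        segmentation = run_task(region_task, region)[region_task]
        results.append({"bbox": box, "polygons": segmentation["polygons"]})
    return results
```

Each entry in the returned list pairs one detected box with the polygons returned for that region, so an image with two dogs should yield two entries instead of one. Note that this costs one extra generation call per box.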

eternalaudrey

Jul 9

@haipingwu Hi, I am wondering whether it is possible to fine-tune the model to get segmentation results for multiple objects. Thank you!

nanduuuuuuuuuuu

Jul 20

@eternalaudrey, were you able to generate masks for multiple objects?
If yes, could you tell me how you did it?

eternalaudrey

Jul 31

Hi, unfortunately not. For the moment I can only run the object detection task to generate a series of bounding boxes, then loop over those regions (bboxes) to generate the masks one at a time, which is quite inefficient.