OCR on image

#28
by glitchyordis - opened

Obtaining key information is quite straightforward, but is there a way to obtain bbox locations for the detected text?

glitchyordis changed discussion title from OCR text to OCR on image

You can prompt the model to return bbox locations (see this Space: https://huggingface.co/spaces/maxiw/Qwen2-VL-Detection). I also tried the prompt "detect all texts", but the results were not very precise.
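
If it helps, here is a minimal sketch for turning the model's reply into pixel-space boxes. It assumes the reply embeds each box as `(x1,y1),(x2,y2)` between `<|box_start|>` and `<|box_end|>`, with coordinates on a 0 to 1000 grid; that is my understanding of the Qwen2-VL grounding format, so check it against what the model actually returns. The function name `parse_boxes` and the sample string are made up for illustration.

```python
import re

# Assumption: each box appears in the reply as "(x1,y1),(x2,y2)",
# with coordinates normalized to a 0-1000 grid rather than pixels.
BOX_PATTERN = re.compile(r"\((\d+),\s*(\d+)\),\s*\((\d+),\s*(\d+)\)")


def parse_boxes(response, image_width, image_height, grid=1000):
    """Extract boxes from a reply and rescale them to pixel coordinates."""
    boxes = []
    for match in BOX_PATTERN.finditer(response):
        x1, y1, x2, y2 = (int(v) for v in match.groups())
        boxes.append((
            round(x1 * image_width / grid),
            round(y1 * image_height / grid),
            round(x2 * image_width / grid),
            round(y2 * image_height / grid),
        ))
    return boxes


# Hypothetical reply, not real model output.
sample = (
    "<|object_ref_start|>Invoice<|object_ref_end|>"
    "<|box_start|>(100,200),(500,250)<|box_end|>"
)
print(parse_boxes(sample, image_width=2000, image_height=1000))
```

The resulting tuples are `(left, top, right, bottom)` in pixels, so they can be passed straight to something like PIL's `ImageDraw.rectangle` to check visually how far off the boxes are.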
