Tonic committed on
Commit
02ff46f
1 Parent(s): cdb9801

small-changes (#4)

- fix: add gitignore (aa072c3f7329021d73ac453214afb36c0fb6b607)
- refactor: make title and description easier to use (b153fc483e36535661acc93100eb0cc7df3702d0)
- refactor: retrieve title and desc from markdown, improve UI for more responsive usage (bbed54bdb2b60897319ab9dca2d33c011804888c)
- fix: add markdown for processing and downgrade numpy to stop erroring out (d50f832e1117c81919fc6677575d8ab2545a1e61)
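
The second refactor above moves the title and description out of `app.py` and into a markdown file with a front matter block. As a rough illustration of that idea, here is a minimal, standard-library-only sketch. It is not the commit's code: the commit uses `yaml.safe_load` and `markdown.markdown`, whereas this sketch only understands a single `title: ...` line, and the function name `split_front_matter` is hypothetical. It also falls back to a default title when the file is empty or has no front matter, which is an assumption about the desired behavior.

```python
from typing import Tuple

DEFAULT_TITLE = "Title Not Found"  # same fallback string the commit uses


def split_front_matter(text: str) -> Tuple[str, str]:
    """Return (title, body) for markdown text with an optional '---' front matter block."""
    lines = text.splitlines(keepends=True)
    title = DEFAULT_TITLE

    # No opening delimiter (or an empty file): everything is body.
    if not lines or lines[0].strip() != "---":
        return title, text

    for idx, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            # Closing delimiter found: the body starts on the next line.
            return title, "".join(lines[idx + 1:])
        key, sep, value = line.partition(":")
        if sep and key.strip() == "title":
            # Simplification: strip surrounding quotes instead of parsing YAML.
            title = value.strip().strip("\"'")

    # Opening delimiter without a closing one: treat the file as plain body.
    return DEFAULT_TITLE, text


sample = '---\ntitle: "Hello"\n---\n\n# Body\n'
title, body = split_front_matter(sample)
print(title)
print(body)
```

The body is returned as raw markdown here; converting it to HTML is left to whatever renderer the app already uses.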

Files changed (3)
  1. .gitignore +2 -0
  2. app.py +75 -91
  3. content/index.md +53 -0
.gitignore ADDED
@@ -0,0 +1,2 @@
+ .DS_Store
+ .venv/*
app.py CHANGED
@@ -5,87 +5,76 @@ import os
  import base64
  import spaces
  import io
- import tempfile
  from PIL import Image
- import io
-
-
- title = """# 🙋🏻‍♂️Welcome to Tonic's🫴🏻📸GOT-OCR"""
- description = """"
- The GOT-OCR model is a revolutionary step in the evolution of OCR systems, boasting 580M parameters and the ability to process various forms of "characters." It features a high-compression encoder and a long-context decoder, making it well-suited for both scene- and document-style images. The model also supports multi-page and dynamic resolution OCR for added practicality.
-
- The model can output results in a variety of formats, including plain text, markdown, and even complex outputs like TikZ diagrams or molecular SMILES strings. Interactive OCR allows users to specify regions of interest for OCR using coordinates or colors.
-
- ## Features
- - **Plain Text OCR**: Recognizes and extracts plain text from images.
- - **Formatted Text OCR**: Extracts text while preserving its formatting (tables, formulas, etc.).
- - **Fine-grained OCR**: Box-based and color-based OCR for precise text extraction from specific regions.
- - **Multi-crop OCR**: Processes multiple cropped regions within an image.
- - **Rendered Formatted OCR Results**: Outputs OCR results in markdown, TikZ, SMILES, or other formats with rendered formatting.
+ import numpy as np
+ import yaml
+ import markdown
+ from pathlib import Path
+
+ # Function to extract title and description from the markdown file
+ def extract_title_description(md_file_path):
+     with open(md_file_path, 'r') as f:
+         lines = f.readlines()
+
+     # Extract frontmatter (YAML) for title
+     frontmatter = []
+     content_start = 0
+     if lines[0].strip() == '---':
+         for idx, line in enumerate(lines[1:], 1):
+             if line.strip() == '---':
+                 content_start = idx + 1
+                 break
+             frontmatter.append(line)
+
+     frontmatter_yaml = yaml.safe_load(''.join(frontmatter))
+     title = frontmatter_yaml.get('title', 'Title Not Found')
+
+     # Extract content (description)
+     description_md = ''.join(lines[content_start:])
+     description = markdown.markdown(description_md)
+
+     return title, description

- GOT-OCR-2.0 can handle:
- - Plain text
- - Math/molecular formulas
- - Tables
- - Charts
- - Sheet music
- - Geometric shapes
+ # Path to the markdown file
+ md_file_path = 'content/index.md'

- ## How to Use
- 1. Select a task from the dropdown menu.
- 2. Upload an image.
- 3. (Optional) Fill in additional parameters based on the task.
- 4. Click **Process** to see the results.
- ---
- ### Join us :
- 🌟TeamTonic🌟 is always making cool demos! Join our active builder's 🛠️community 👻 [![Join us on Discord](https://img.shields.io/discord/1109943800132010065?label=Discord&logo=discord&style=flat-square)](https://discord.gg/qdfnvSPcqP) On 🤗Huggingface:[MultiTransformer](https://huggingface.co/MultiTransformer) On 🌐Github: [Tonic-AI](https://github.com/tonic-ai) & contribute to🌟 [Build Tonic](https://git.tonic-ai.com/contribute)🤗Big thanks to Yuvi Sharma and all the folks at huggingface for the community grant 🤗
- """
+ # Extract title and description from the markdown file
+ title, description = extract_title_description(md_file_path)

+ # Rest of the script continues as before
  model_name = 'ucaslcl/GOT-OCR2_0'

-
  tokenizer = AutoTokenizer.from_pretrained('ucaslcl/GOT-OCR2_0', trust_remote_code=True)
+ config = AutoConfig.from_pretrained(model_name, trust_remote_code=True)
  model = AutoModel.from_pretrained('ucaslcl/GOT-OCR2_0', trust_remote_code=True, low_cpu_mem_usage=True, device_map='cuda', use_safetensors=True, pad_token_id=tokenizer.eos_token_id)
  model = model.eval().cuda()
  model.config.pad_token_id = tokenizer.eos_token_id

- def save_image_to_temp_file(image):
-     with tempfile.NamedTemporaryFile(delete=False, suffix=".png") as temp_file:
-         image.save(temp_file, format="PNG")
-         return temp_file.name
+ def image_to_base64(image):
+     buffered = io.BytesIO()
+     image.save(buffered, format="PNG")
+     return base64.b64encode(buffered.getvalue()).decode()

  @spaces.GPU
- def process_image(image, task, ocr_type=None, ocr_box=None, ocr_color=None):
-     try:
-         if image is None:
-             return "No image provided", None
-
-         temp_image_path = save_image_to_temp_file(image)
-
-         if task == "Plain Text OCR":
-             res = model.chat(tokenizer, temp_image_path, ocr_type='ocr')
-         elif task == "Format Text OCR":
-             res = model.chat(tokenizer, temp_image_path, ocr_type='format')
-         elif task == "Fine-grained OCR (Box)":
-             res = model.chat(tokenizer, temp_image_path, ocr_type=ocr_type, ocr_box=ocr_box)
-         elif task == "Fine-grained OCR (Color)":
-             res = model.chat(tokenizer, temp_image_path, ocr_type=ocr_type, ocr_color=ocr_color)
-         elif task == "Multi-crop OCR":
-             res = model.chat_crop(tokenizer, image_file=temp_image_path)
-         elif task == "Render Formatted OCR":
-             res = model.chat(tokenizer, temp_image_path, ocr_type='format', render=True, save_render_file='./results/demo.html')
-             with open('./results/demo.html', 'r') as f:
-                 html_content = f.read()
-             os.remove(temp_image_path)
-             return res, html_content
-
-         # Clean up
-         os.remove(temp_image_path)
-
-         return res, None
-     except Exception as e:
-         return str(e), None
+ def process_image(image, task, ocr_type=None, ocr_box=None, ocr_color=None, render=False):
+     if task == "Plain Text OCR":
+         res = model.chat(tokenizer, image, ocr_type='ocr')
+     elif task == "Format Text OCR":
+         res = model.chat(tokenizer, image, ocr_type='format')
+     elif task == "Fine-grained OCR (Box)":
+         res = model.chat(tokenizer, image, ocr_type=ocr_type, ocr_box=ocr_box)
+     elif task == "Fine-grained OCR (Color)":
+         res = model.chat(tokenizer, image, ocr_type=ocr_type, ocr_color=ocr_color)
+     elif task == "Multi-crop OCR":
+         res = model.chat_crop(tokenizer, image_file=image)
+     elif task == "Render Formatted OCR":
+         res = model.chat(tokenizer, image, ocr_type='format', render=True, save_render_file='./demo.html')
+         with open('./demo.html', 'r') as f:
+             html_content = f.read()
+         return res, html_content

+     return res, None
+
  def update_inputs(task):
      if task == "Plain Text OCR" or task == "Format Text OCR" or task == "Multi-crop OCR":
          return [gr.update(visible=False)] * 4
@@ -105,22 +94,25 @@ def update_inputs(task):
          ]
      elif task == "Render Formatted OCR":
          return [gr.update(visible=False)] * 3 + [gr.update(visible=True)]
-

  def ocr_demo(image, task, ocr_type, ocr_box, ocr_color):
-     result = process_image(image, task, ocr_type, ocr_box, ocr_color)
-     if isinstance(result, tuple) and len(result) == 2:
-         res, html_content = result
-         if html_content:
-             return res, html_content
-     return result, None
+     res, html_content = process_image(image, task, ocr_type, ocr_box, ocr_color)
+     if html_content:
+         return res, html_content
+     return res, None
+
+ import gradio as gr

  with gr.Blocks() as demo:
-     gr.Markdown(title)
-     gr.Markdown(description)
      with gr.Row():
-         with gr.Column():
-             image_input = gr.Image(type="pil", label="Input Image")
+         # Left Column: Description
+         with gr.Column(scale=1):
+             gr.Markdown(f"# {title}")
+             gr.Markdown(description)
+
+         # Right Column: App Inputs and Outputs
+         with gr.Column(scale=3):
+             image_input = gr.Image(type="filepath", label="Input Image")
              task_dropdown = gr.Dropdown(
                  choices=[
                      "Plain Text OCR",
@@ -153,27 +145,19 @@ with gr.Blocks() as demo:
                  visible=False
              )
              submit_button = gr.Button("Process")
-
-         with gr.Column():
+
+             # OCR Result below the Submit button
              output_text = gr.Textbox(label="OCR Result")
              output_html = gr.HTML(label="Rendered HTML Output")

-     gr.Markdown("""## GOT-OCR 2.0
-
- This small **330M parameter** model powerful OCR model can handle various text recognition tasks with high accuracy.
-
- ### Model Information
- - **Model Name**: GOT-OCR 2.0
- - **Hugging Face Repository**: [ucaslcl/GOT-OCR2_0](https://huggingface.co/ucaslcl/GOT-OCR2_0)
- - **Environment**: CUDA 11.8 + PyTorch 2.0.1
- """)
-
+     # Update inputs dynamically based on task selection
      task_dropdown.change(
          update_inputs,
          inputs=[task_dropdown],
          outputs=[ocr_type_dropdown, ocr_box_input, ocr_color_dropdown, render_checkbox]
      )

+     # Process OCR on button click
      submit_button.click(
          ocr_demo,
          inputs=[image_input, task_dropdown, ocr_type_dropdown, ocr_box_input, ocr_color_dropdown],
@@ -181,4 +165,4 @@ with gr.Blocks() as demo:
      )

  if __name__ == "__main__":
-     demo.launch()
+     demo.launch()
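
As far as this diff shows, the new `process_image` drops the old `image is None` check and the `try`/`except` wrapper, and its `if`/`elif` chain has no `else` branch, so a task name outside the chain would reach `return res, None` with `res` unassigned. The sketch below shows one possible way to keep the dispatch compact while still returning a message for a missing image or unknown task. It is illustrative only: `StubModel` and `run_task` are hypothetical names, the stub merely echoes its arguments so the example runs without a GPU, and the "Render Formatted OCR" branch is left out because it depends on the HTML file written by the real model.

```python
from typing import Any, Callable, Dict, Optional, Tuple


class StubModel:
    """Hypothetical stand-in for the real model; it only echoes its arguments."""

    def chat(self, tokenizer: Any, image: str, **kwargs: Any) -> str:
        return f"chat({image}, {sorted(kwargs.items())})"

    def chat_crop(self, tokenizer: Any, image_file: str) -> str:
        return f"chat_crop({image_file})"


def run_task(
    model: Any,
    tokenizer: Any,
    image: Optional[str],
    task: str,
    ocr_type: Optional[str] = None,
    ocr_box: Optional[str] = None,
    ocr_color: Optional[str] = None,
) -> Tuple[str, Optional[str]]:
    """Dispatch a text-only OCR task and always return a (text, html) pair."""
    if image is None:
        return "No image provided", None

    # Map each task label from the dropdown to a zero-argument call.
    handlers: Dict[str, Callable[[], str]] = {
        "Plain Text OCR": lambda: model.chat(tokenizer, image, ocr_type="ocr"),
        "Format Text OCR": lambda: model.chat(tokenizer, image, ocr_type="format"),
        "Fine-grained OCR (Box)": lambda: model.chat(
            tokenizer, image, ocr_type=ocr_type, ocr_box=ocr_box
        ),
        "Fine-grained OCR (Color)": lambda: model.chat(
            tokenizer, image, ocr_type=ocr_type, ocr_color=ocr_color
        ),
        "Multi-crop OCR": lambda: model.chat_crop(tokenizer, image_file=image),
    }

    handler = handlers.get(task)
    if handler is None:
        # Unknown label: report it instead of failing on an unassigned variable.
        return f"Unknown task: {task}", None
    return handler(), None


print(run_task(StubModel(), None, "page.png", "Multi-crop OCR"))
print(run_task(StubModel(), None, "page.png", "Not a task"))
```

Whether returning an error string or raising is preferable depends on how the Gradio outputs should surface failures; the old code returned `str(e)` in the text box, which this sketch loosely mirrors.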
content/index.md ADDED
@@ -0,0 +1,53 @@
+ ---
+ title: "🙋🏻‍♂️Welcome to Tonic's🫴🏻📸GOT-OCR"
+ ---
+
+ # GOT-OCR Model Overview
+
+ The **GOT-OCR model** is a cutting-edge OCR system with **580M parameters**, designed to process a wide range of "characters." Equipped with a **high-compression encoder** and a **long-context decoder**, it excels in both scene and document-style images. The model supports **multi-page** and **dynamic resolution OCR**, enhancing its versatility.
+
+ ### Output Formats
+
+ The model can generate results in several formats:
+
+ - **Plain Text**
+ - **Markdown**
+ - **TikZ diagrams**
+ - **Molecular SMILES strings**
+
+ Additionally, **interactive OCR** enables users to define regions of interest via **coordinates** or **colors**.
+
+ ## Key Features
+
+ - **Plain Text OCR**: Extracts text from images.
+ - **Formatted Text OCR**: Retains the original formatting, including tables and formulas.
+ - **Fine-grained OCR**: Offers box-based and color-based OCR for precision in specific regions.
+ - **Multi-crop OCR**: Handles multiple cropped sections within an image.
+ - **Rendered Formatted OCR**: Outputs in markdown, TikZ, SMILES, and more, with rendered formatting.
+
+ ## Supported Content Types
+
+ - Plain text
+ - Math/molecular formulas
+ - Tables and charts
+ - Sheet music
+ - Geometric shapes
+
+ ## How to Use
+
+ 1. Select a task from the dropdown menu.
+ 2. Upload an image.
+ 3. (Optional) Adjust parameters based on the selected task.
+ 4. Click **Process** to view the results.
+
+ ### Model Information
+
+ - **Model Name**: GOT-OCR 2.0
+ - **Hugging Face Repository**: [ucaslcl/GOT-OCR2_0](https://huggingface.co/ucaslcl/GOT-OCR2_0)
+ - **Environment**: CUDA 11.8 + PyTorch 2.0.1
+
+ ---
+
+ ### Join us :
+
+ 🌟TeamTonic🌟 is always making cool demos! Join our active builder's 🛠️community 👻 [![Join us on Discord](https://img.shields.io/discord/1109943800132010065?label=Discord&logo=discord&style=flat-square)](https://discord.gg/qdfnvSPcqP) On 🤗Huggingface:[MultiTransformer](https://huggingface.co/MultiTransformer) On 🌐Github: [Tonic-AI](https://github.com/tonic-ai) & contribute to🌟 [Build Tonic](https://git.tonic-ai.com/contribute)🤗Big thanks to Yuvi Sharma and all the folks at huggingface for the community grant 🤗