vkuropiatnyk committed on
Commit
e3a3538
·
verified ·
1 Parent(s): 720dd0a

Upload folder using huggingface_hub

Files changed (4)
  1. README.md +115 -42
  2. config.json +52 -78
  3. model.onnx +3 -0
  4. model.safetensors +2 -2
README.md CHANGED
@@ -7,52 +7,39 @@ base_model:
7
 
8
  # EfficientNet-B0 Document Image Classifier
9
 
10
- This is an image classification model based on **Google EfficientNet-B0**, fine-tuned to classify input images into one of the following 39 categories (to be reduced):
11
-
12
- 1. **bar_chart**
13
- 2. **bar_code**
14
- 3. **chemistry_structure**
15
- 4. **flow_chart**
16
- 5. **icon**
17
- 6. **line_chart**
18
- 7. **logo**
19
- 8. **geographical_map**
20
- 9. **topographical_map**
21
- 10. **other**
22
- 11. **pie_chart**
23
- 12. **qr_code**
24
- 13. **scatter_plot**
25
- 14. **screenshot_from_manual**
26
- 15. **screenshot_from_computer**
27
- 16. **calendar**
28
- 17. **crossword_puzzle**
29
- 18. **signature**
30
- 19. **stamp**
31
- 20. **photograph**
32
- 21. **engineering_drawing**
33
- 22. **table**
34
- 23. **full_page_image**
35
- 24. **page_thumbnail**
36
- 25. **music**
37
- 26. **illustration**
38
- 27. **treemap**
39
- 28. **radar_chart**
40
- 29. **screenshot_from_mobile**
41
- 30. **sudoku_puzzle**
42
- 31. **box_plot**
43
- 32. **cryptoquote**
44
- 33. **heatmap**
45
- 34. **poster**
46
- 35. **passport**
47
- 36. **legend**
48
- 37. **area_chart**
49
- 38. **astrology_chart**
50
- 39. **book cover**
51
 
52
 
53
 
54
  ### How to use
55
- Example of how to classify an image into one of the 39 classes:
56
 
57
  ```python
58
  import torch
@@ -109,6 +96,92 @@ for idx, probs_image in enumerate(probs_batch):
109
  ```
110
 
111
 
112
 
113
  ## Citation
114
  If you use this model in your work, please cite the following papers:
 
7
 
8
  # EfficientNet-B0 Document Image Classifier
9
 
10
+ This is an image classification model based on **Google EfficientNet-B0**, fine-tuned to classify input images into one of the following 26 categories:
11
+
12
+ 1. **logo**
13
+ 2. **photograph**
14
+ 3. **icon**
15
+ 4. **engineering_drawing**
16
+ 5. **line_chart**
17
+ 6. **bar_chart**
18
+ 7. **other**
19
+ 8. **table**
20
+ 9. **flow_chart**
21
+ 10. **screenshot_from_computer**
22
+ 11. **signature**
23
+ 12. **screenshot_from_manual**
24
+ 13. **geographical_map**
25
+ 14. **pie_chart**
26
+ 15. **page_thumbnail**
27
+ 16. **stamp**
28
+ 17. **music**
29
+ 18. **calendar**
30
+ 19. **qr_code**
31
+ 20. **bar_code**
32
+ 21. **full_page_image**
33
+ 22. **scatter_plot**
34
+ 23. **chemistry_structure**
35
+ 24. **topographical_map**
36
+ 25. **crossword_puzzle**
37
+ 26. **box_plot**
38
 
39
 
40
 
41
  ### How to use
42
+ Example of how to classify an image into one of the 26 classes using transformers:
43
 
44
  ```python
45
  import torch
 
96
  ```
97
 
98
 
99
+ Example of how to classify an image into one of the 26 classes using ONNX Runtime:
100
+
101
+ ```python
102
+ import onnxruntime
103
+
104
+ import numpy as np
105
+ import torchvision.transforms as transforms
106
+
107
+ from PIL import Image
108
+ import requests
109
+
110
+ LABELS = [
111
+     "logo",
112
+     "photograph",
113
+     "icon",
114
+     "engineering_drawing",
115
+     "line_chart",
116
+     "bar_chart",
117
+     "other",
118
+     "table",
119
+     "flow_chart",
120
+     "screenshot_from_computer",
121
+     "signature",
122
+     "screenshot_from_manual",
123
+     "geographical_map",
124
+     "pie_chart",
125
+     "page_thumbnail",
126
+     "stamp",
127
+     "music",
128
+     "calendar",
129
+     "qr_code",
130
+     "bar_code",
131
+     "full_page_image",
132
+     "scatter_plot",
133
+     "chemistry_structure",
134
+     "topographical_map",
135
+     "crossword_puzzle",
136
+     "box_plot"
137
+ ]
138
+
139
+
140
+ urls = [
141
+     'http://images.cocodataset.org/val2017/000000039769.jpg',
142
+     'http://images.cocodataset.org/test-stuff2017/000000001750.jpg',
143
+     'http://images.cocodataset.org/test-stuff2017/000000000001.jpg'
144
+ ]
145
+
146
+ images = []
147
+ for url in urls:
148
+     image = Image.open(requests.get(url, stream=True).raw).convert("RGB")
149
+     images.append(image)
150
+
151
+
152
+ image_processor = transforms.Compose(
153
+     [
154
+         transforms.Resize((224, 224)),
155
+         transforms.ToTensor(),
156
+         transforms.Normalize(
157
+             mean=[0.485, 0.456, 0.406],
158
+             std=[0.47853944, 0.4732864, 0.47434163],
159
+         ),
160
+     ]
161
+ )
162
+
163
+
164
+ processed_images_onnx = [image_processor(image).unsqueeze(0) for image in images]
165
+
166
+ # onnx needs numpy as input
167
+ onnx_inputs = [item.numpy(force=True) for item in processed_images_onnx]
168
+
169
+ # pack into a batch
170
+ onnx_inputs = np.concatenate(onnx_inputs, axis=0)
171
+
172
+ ort_session = onnxruntime.InferenceSession(
173
+     "./DocumentFigureClassifier-v2_0-onnx/model.onnx",
174
+     providers=["CUDAExecutionProvider", "CPUExecutionProvider"]
175
+ )
176
+
177
+
178
+ for item in ort_session.run(None, {'input': onnx_inputs}):
179
+     for x in iter(item):
180
+         pred = x.argmax()
181
+         print(LABELS[pred])
182
+ ```
183
+
184
+
185
 
186
  ## Citation
187
  If you use this model in your work, please cite the following papers:
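
Reviewer note: the ONNX example added to the README prints only the arg-max label for each image. If per-class confidence is also wanted, a minimal sketch along the following lines may help. It assumes each output row holds raw logits, one per class, which this commit does not state; `softmax` and `top_k` are hypothetical helper names, and the three-class demo data is made up for illustration.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities; subtracting the max avoids overflow."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(logits, labels, k=3):
    """Return the k most probable (label, probability) pairs, best first."""
    if len(logits) != len(labels):
        raise ValueError("expected one logit per label")
    probs = softmax(logits)
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Hypothetical three-class excerpt; a real row would have 26 entries.
demo_labels = ["logo", "photograph", "icon"]
demo_logits = [0.5, 2.0, -1.0]
for label, prob in top_k(demo_logits, demo_labels, k=2):
    print(f"{label}: {prob:.3f}")
```

With the README loop, a NumPy row `x` could be passed as `top_k(x.tolist(), LABELS)`.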
config.json CHANGED
@@ -22,45 +22,32 @@
22
  "hidden_act": "swish",
23
  "hidden_dim": 1280,
24
  "id2label": {
25
- "0": "bar_chart",
26
- "1": "bar_code",
27
- "10": "pie_chart",
28
- "11": "qr_code",
29
- "12": "scatter_plot",
30
- "13": "screenshot_from_manual",
31
- "14": "screenshot_from_computer",
32
- "15": "calendar",
33
- "16": "crossword_puzzle",
34
- "17": "signature",
35
- "18": "stamp",
36
- "19": "photograph",
37
- "2": "chemistry_structure",
38
- "20": "engineering_drawing",
39
- "21": "table",
40
- "22": "full_page_image",
41
- "23": "page_thumbnail",
42
- "24": "music",
43
- "25": "illustration",
44
- "26": "treemap",
45
- "27": "radar_chart",
46
- "28": "screenshot_from_mobile",
47
- "29": "sudoku_puzzle",
48
- "3": "flow_chart",
49
- "30": "box_plot",
50
- "31": "cryptoquote",
51
- "32": "heatmap",
52
- "33": "poster",
53
- "34": "passport",
54
- "35": "legend",
55
- "36": "area_chart",
56
- "37": "astrology_chart",
57
- "38": "book cover",
58
- "4": "icon",
59
- "5": "line_chart",
60
- "6": "logo",
61
- "7": "geographical_map",
62
- "8": "topographical_map",
63
- "9": "other"
64
  },
65
  "image_size": 224,
66
  "in_channels": [
@@ -83,45 +70,32 @@
83
  3
84
  ],
85
  "label2id": {
86
- "area_chart": "36",
87
- "astrology_chart": "37",
88
- "bar_chart": "0",
89
- "bar_code": "1",
90
- "book cover": "38",
91
- "box_plot": "30",
92
- "calendar": "15",
93
- "chemistry_structure": "2",
94
- "crossword_puzzle": "16",
95
- "cryptoquote": "31",
96
- "engineering_drawing": "20",
97
- "flow_chart": "3",
98
- "full_page_image": "22",
99
- "geographical_map": "7",
100
- "heatmap": "32",
101
- "icon": "4",
102
- "illustration": "25",
103
- "legend": "35",
104
- "line_chart": "5",
105
- "logo": "6",
106
- "music": "24",
107
- "other": "9",
108
- "page_thumbnail": "23",
109
- "passport": "34",
110
- "photograph": "19",
111
- "pie_chart": "10",
112
- "poster": "33",
113
- "qr_code": "11",
114
- "radar_chart": "27",
115
- "scatter_plot": "12",
116
- "screenshot_from_computer": "14",
117
- "screenshot_from_manual": "13",
118
- "screenshot_from_mobile": "28",
119
- "signature": "17",
120
- "stamp": "18",
121
- "sudoku_puzzle": "29",
122
- "table": "21",
123
- "topographical_map": "8",
124
- "treemap": "26"
125
  },
126
  "model_type": "efficientnet",
127
  "num_block_repeats": [
 
22
  "hidden_act": "swish",
23
  "hidden_dim": 1280,
24
  "id2label": {
25
+ "0": "logo",
26
+ "1": "photograph",
27
+ "10": "signature",
28
+ "11": "screenshot_from_manual",
29
+ "12": "geographical_map",
30
+ "13": "pie_chart",
31
+ "14": "page_thumbnail",
32
+ "15": "stamp",
33
+ "16": "music",
34
+ "17": "calendar",
35
+ "18": "qr_code",
36
+ "19": "bar_code",
37
+ "2": "icon",
38
+ "20": "full_page_image",
39
+ "21": "scatter_plot",
40
+ "22": "chemistry_structure",
41
+ "23": "topographical_map",
42
+ "24": "crossword_puzzle",
43
+ "25": "box_plot",
44
+ "3": "engineering_drawing",
45
+ "4": "line_chart",
46
+ "5": "bar_chart",
47
+ "6": "other",
48
+ "7": "table",
49
+ "8": "flow_chart",
50
+ "9": "screenshot_from_computer"
51
  },
52
  "image_size": 224,
53
  "in_channels": [
 
70
  3
71
  ],
72
  "label2id": {
73
+ "bar_chart": "5",
74
+ "bar_code": "19",
75
+ "box_plot": "25",
76
+ "calendar": "17",
77
+ "chemistry_structure": "22",
78
+ "crossword_puzzle": "24",
79
+ "engineering_drawing": "3",
80
+ "flow_chart": "8",
81
+ "full_page_image": "20",
82
+ "geographical_map": "12",
83
+ "icon": "2",
84
+ "line_chart": "4",
85
+ "logo": "0",
86
+ "music": "16",
87
+ "other": "6",
88
+ "page_thumbnail": "14",
89
+ "photograph": "1",
90
+ "pie_chart": "13",
91
+ "qr_code": "18",
92
+ "scatter_plot": "21",
93
+ "screenshot_from_computer": "9",
94
+ "screenshot_from_manual": "11",
95
+ "signature": "10",
96
+ "stamp": "15",
97
+ "table": "7",
98
+ "topographical_map": "23"
99
  },
100
  "model_type": "efficientnet",
101
  "num_block_repeats": [
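
Reviewer note: the `id2label` keys in this config.json are strings and are serialized in lexicographic order ("0", "1", "10", ...), so reading the values in file order does not give the class order. The sketch below shows one way to rebuild an index-ordered label list; the mapping is copied from the added lines of this diff, and `ordered_labels` is a hypothetical helper name.

```python
import json

# id2label as added to config.json in this commit (keys are strings).
ID2LABEL_JSON = """
{
  "0": "logo", "1": "photograph", "10": "signature",
  "11": "screenshot_from_manual", "12": "geographical_map",
  "13": "pie_chart", "14": "page_thumbnail", "15": "stamp",
  "16": "music", "17": "calendar", "18": "qr_code", "19": "bar_code",
  "2": "icon", "20": "full_page_image", "21": "scatter_plot",
  "22": "chemistry_structure", "23": "topographical_map",
  "24": "crossword_puzzle", "25": "box_plot",
  "3": "engineering_drawing", "4": "line_chart", "5": "bar_chart",
  "6": "other", "7": "table", "8": "flow_chart",
  "9": "screenshot_from_computer"
}
"""


def ordered_labels(id2label):
    """Return class names ordered by integer class id."""
    by_index = {int(key): name for key, name in id2label.items()}
    # Guard against gaps so that list position always equals class id.
    if sorted(by_index) != list(range(len(by_index))):
        raise ValueError("class ids are not contiguous from 0")
    return [by_index[i] for i in range(len(by_index))]


labels = ordered_labels(json.loads(ID2LABEL_JSON))
print(len(labels), labels[0], labels[2], labels[10])
```

The resulting list should line up with the `LABELS` list in the README's ONNX example, which gives a quick consistency check between the two files.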
model.onnx ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:acba68df0a2f149212f5b5082d98a81700c93280e39a73dca095040ef19a583f
3
+ size 16763657
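
Reviewer note: model.onnx is stored as a Git LFS pointer, so the `oid` and `size` fields above can be used to check a downloaded copy. The following is a minimal sketch using only the standard library; `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names, and the local file path is left to the caller.

```python
import hashlib
import os


def parse_lfs_pointer(text):
    """Extract (sha256 hex digest, size in bytes) from Git LFS pointer text."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, _, digest = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    return digest, int(fields["size"])


def matches_pointer(path, digest, size, chunk_size=1 << 20):
    """Check a local file against the pointer's size and SHA-256 digest."""
    if os.path.getsize(path) != size:
        return False
    hasher = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large model files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == digest


# Pointer contents as added for model.onnx in this commit.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:acba68df0a2f149212f5b5082d98a81700c93280e39a73dca095040ef19a583f
size 16763657
"""

digest, size = parse_lfs_pointer(POINTER)
print(digest[:12], size)
```

A call such as `matches_pointer("model.onnx", digest, size)` would then report whether a local copy matches the committed file.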
model.safetensors CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:441ff87d71573c0aea1f8d00537ae8b2c88baf4885674677f410de08db2bd547
3
- size 16444820
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:e8232c0c1e4a25551e496ccaf548e469e321f78997d18c9be7f3af9ccb5d222b
3
+ size 16378200