EfficientNet-B0 Document Image Classifier
This is an image classification model based on Google's EfficientNet-B0. It has been fine-tuned to classify an input image into one of the following 26 categories:
- logo
- photograph
- icon
- engineering_drawing
- line_chart
- bar_chart
- other
- table
- flow_chart
- screenshot_from_computer
- signature
- screenshot_from_manual
- geographical_map
- pie_chart
- page_thumbnail
- stamp
- music (sheet music)
- calendar
- qr_code
- bar_code
- full_page_image
- scatter_plot
- chemistry_structure
- topographical_map
- crossword_puzzle
- box_plot
How to use - Transformers
Example of using `transformers` to classify images into one of the 26 categories:
```python
import torch
import torchvision.transforms as transforms
from transformers import EfficientNetForImageClassification
from PIL import Image
import requests

urls = [
    'http://images.cocodataset.org/val2017/000000039769.jpg',
    'http://images.cocodataset.org/test-stuff2017/000000001750.jpg',
    'http://images.cocodataset.org/test-stuff2017/000000000001.jpg'
]

# Resize to 224x224, convert to a float tensor, and normalize each RGB channel
image_processor = transforms.Compose(
    [
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize(
            mean=[0.485, 0.456, 0.406],
            std=[0.47853944, 0.4732864, 0.47434163],
        ),
    ]
)

# Download each image and turn it into a (3, 224, 224) tensor
images = []
for url in urls:
    image = Image.open(requests.get(url, stream=True).raw).convert("RGB")
    image = image_processor(image)
    images.append(image)

model_id = 'docling-project/DocumentFigureClassifier-v2.0'
model = EfficientNetForImageClassification.from_pretrained(model_id)
labels = model.config.id2label  # maps class index -> label name

device = torch.device("cpu")
model = model.to(device)  # keep the model and the inputs on the same device
torch_images = torch.stack(images).to(device)  # (batch_size, 3, 224, 224)

with torch.no_grad():
    logits = model(torch_images).logits  # (batch_size, num_classes)
    probs_batch = logits.softmax(dim=1)  # (batch_size, num_classes)
    probs_batch = probs_batch.cpu().numpy().tolist()

# Print every label with its probability, most likely first
for idx, probs_image in enumerate(probs_batch):
    preds = [(labels[i], prob) for i, prob in enumerate(probs_image)]
    preds.sort(key=lambda t: t[1], reverse=True)
    print(f"{idx}: {preds}")
```
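Both examples use the same preprocessing. `transforms.ToTensor()` scales 8-bit pixel values from [0, 255] to [0.0, 1.0], and `transforms.Normalize` then computes `(x - mean) / std` separately for each channel. The plain-Python sketch below applies that arithmetic to a single RGB pixel using the mean and std values from the examples. The helper `normalize_pixel` is hypothetical and exists only to make the arithmetic explicit; it does not perform the resize step.

```python
# Per-channel statistics copied from the examples above (R, G, B order).
MEAN = (0.485, 0.456, 0.406)
STD = (0.47853944, 0.4732864, 0.47434163)


def normalize_pixel(rgb):
    """Scale one 8-bit RGB pixel like ToTensor(), then normalize like Normalize().

    Hypothetical helper for illustration only; resizing is not covered.
    """
    if len(rgb) != 3:
        raise ValueError("expected an (R, G, B) triple")
    normalized = []
    for value, mean, std in zip(rgb, MEAN, STD):
        scaled = value / 255.0                    # ToTensor: [0, 255] -> [0.0, 1.0]
        normalized.append((scaled - mean) / std)  # Normalize: (x - mean) / std
    return normalized


# A pure white pixel and a pure black pixel
print(normalize_pixel((255, 255, 255)))
print(normalize_pixel((0, 0, 0)))
```

A pipeline that skips this scaling or uses different statistics will feed the model different values from those produced in the examples above.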
How to use - ONNX
Example of using ONNX Runtime to classify images into one of the 26 categories:
```python
import onnxruntime
import numpy as np
import torchvision.transforms as transforms
from PIL import Image
import requests

# Class names in the order of the model's output indices
LABELS = [
    "logo",
    "photograph",
    "icon",
    "engineering_drawing",
    "line_chart",
    "bar_chart",
    "other",
    "table",
    "flow_chart",
    "screenshot_from_computer",
    "signature",
    "screenshot_from_manual",
    "geographical_map",
    "pie_chart",
    "page_thumbnail",
    "stamp",
    "music",
    "calendar",
    "qr_code",
    "bar_code",
    "full_page_image",
    "scatter_plot",
    "chemistry_structure",
    "topographical_map",
    "crossword_puzzle",
    "box_plot"
]

urls = [
    'http://images.cocodataset.org/val2017/000000039769.jpg',
    'http://images.cocodataset.org/test-stuff2017/000000001750.jpg',
    'http://images.cocodataset.org/test-stuff2017/000000000001.jpg'
]

images = []
for url in urls:
    image = Image.open(requests.get(url, stream=True).raw).convert("RGB")
    images.append(image)

image_processor = transforms.Compose(
    [
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize(
            mean=[0.485, 0.456, 0.406],
            std=[0.47853944, 0.4732864, 0.47434163],
        ),
    ]
)

# Preprocess each image and add a batch dimension: (1, 3, 224, 224)
processed_images_onnx = [image_processor(image).unsqueeze(0) for image in images]

# ONNX Runtime expects NumPy arrays as input
onnx_inputs = [item.numpy(force=True) for item in processed_images_onnx]

# Pack into a single (batch_size, 3, 224, 224) batch
onnx_inputs = np.concatenate(onnx_inputs, axis=0)

# Use CUDA if available, otherwise fall back to the CPU
ort_session = onnxruntime.InferenceSession(
    "./DocumentFigureClassifier-v2_0-onnx/model.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"]
)

# run() returns one array per model output; each row holds one image's logits
for item in ort_session.run(None, {'input': onnx_inputs}):
    for x in item:
        pred = x.argmax()
        print(LABELS[pred])
```
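Both examples turn raw logits into labels: the Transformers example applies a softmax and sorts all class probabilities, while the ONNX example only takes the `argmax`. The sketch below reproduces that post-processing in plain Python for one row of logits and returns the top-k labels with their probabilities. The function names `softmax` and `top_k_labels` and the three-label toy input are hypothetical. With the real model, you would pass one row of logits together with `LABELS` or `model.config.id2label`. Both can be indexed by class index.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)  # subtract the max before exp() to avoid overflow
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_labels(logits, labels, k=3):
    """Return the k most probable (label, probability) pairs, highest first.

    `labels` may be a list or a dict keyed by class index (like id2label).
    """
    probs = softmax(logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(labels[i], probs[i]) for i in ranked[:k]]


toy_labels = ["logo", "photograph", "icon"]  # hypothetical 3-class example
toy_logits = [0.5, 2.0, -1.0]
print(top_k_labels(toy_logits, toy_labels, k=2))
```

Taking `k=1` gives the same label as the ONNX example's `argmax`. Softmax does not change the ranking, so it only matters when you need calibrated-looking scores.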
Citation
If you use this model in your work, please cite the following papers:
```bibtex
@article{Tan2019EfficientNetRM,
  title   = {EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks},
  author  = {Mingxing Tan and Quoc V. Le},
  journal = {ArXiv},
  year    = {2019},
  volume  = {abs/1905.11946}
}

@techreport{Docling,
  author  = {Deep Search Team},
  month   = {8},
  title   = {{Docling Technical Report}},
  url     = {https://arxiv.org/abs/2408.09869},
  eprint  = {2408.09869},
  doi     = {10.48550/arXiv.2408.09869},
  version = {1.0.0},
  year    = {2024}
}
```
docling-project/DocumentFigureClassifier-v2.0
Author: docling-project
Created: 2025-12-08 15:22:22+00:00
Updated: 2026-01-16 16:34:32+00:00
Files in the Hugging Face repository:
- .gitattributes
- README.md
- config.json
- model.onnx
- model.safetensors
- preprocessor_config.json