EfficientNet-B0 Document Image Classifier

This is an image classification model based on Google EfficientNet-B0, fine-tuned to classify an input image into one of the following 26 categories:

  1. logo
  2. photograph
  3. icon
  4. engineering_drawing
  5. line_chart
  6. bar_chart
  7. other
  8. table
  9. flow_chart
  10. screenshot_from_computer
  11. signature
  12. screenshot_from_manual
  13. geographical_map
  14. pie_chart
  15. page_thumbnail
  16. stamp
  17. music (sheet music)
  18. calendar
  19. qr_code
  20. bar_code
  21. full_page_image
  22. scatter_plot
  23. chemistry_structure
  24. topographical_map
  25. crossword_puzzle
  26. box_plot

How to use - Transformers

Example of classifying an image into one of the 26 classes with transformers:

import torch
import torchvision.transforms as transforms

from transformers import EfficientNetForImageClassification
from PIL import Image
import requests


urls = [
    'http://images.cocodataset.org/val2017/000000039769.jpg',
    'http://images.cocodataset.org/test-stuff2017/000000001750.jpg',
    'http://images.cocodataset.org/test-stuff2017/000000000001.jpg'
]

image_processor = transforms.Compose(
    [
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize(
            mean=[0.485, 0.456, 0.406],
            std=[0.47853944, 0.4732864, 0.47434163],
        ),
    ]
)

images = []
for url in urls:
    image = Image.open(requests.get(url, stream=True).raw).convert("RGB")
    image = image_processor(image)
    images.append(image)


model_id = 'docling-project/DocumentFigureClassifier-v2.0'

model = EfficientNetForImageClassification.from_pretrained(model_id)

labels = model.config.id2label

device = torch.device("cpu")

torch_images = torch.stack(images).to(device)

with torch.no_grad():
    logits = model(torch_images).logits  # (batch_size, num_classes)
    probs_batch = logits.softmax(dim=1)  # (batch_size, num_classes)
    probs_batch = probs_batch.cpu().numpy().tolist()

for idx, probs_image in enumerate(probs_batch):
    preds = [(labels[i], prob) for i, prob in enumerate(probs_image)]
    preds.sort(key=lambda t: t[1], reverse=True)
    print(f"{idx}: {preds}")
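
The loop above prints all 26 scores for every image. When only the most likely classes matter, the ranking can be trimmed. The following is a minimal sketch, not part of the original model card: `top_k_predictions`, `k`, and `min_prob` are hypothetical names, and the function assumes one row of `probs_batch` plus an index-to-label mapping shaped like `model.config.id2label`.

```python
from typing import Dict, List, Sequence, Tuple


def top_k_predictions(
    probs_image: Sequence[float],
    labels: Dict[int, str],
    k: int = 3,
    min_prob: float = 0.0,
) -> List[Tuple[str, float]]:
    """Return up to k (label, probability) pairs, most probable first.

    Hypothetical helper: pairs whose probability is below min_prob are dropped.
    """
    ranked = sorted(enumerate(probs_image), key=lambda t: t[1], reverse=True)
    return [(labels[i], p) for i, p in ranked[:k] if p >= min_prob]


if __name__ == "__main__":
    # Toy stand-ins for `labels` and one row of `probs_batch` (illustrative values only).
    demo_labels = {0: "logo", 1: "photograph", 2: "icon", 3: "table"}
    demo_probs = [0.05, 0.70, 0.05, 0.20]
    print(top_k_predictions(demo_probs, demo_labels, k=2))
    # [('photograph', 0.7), ('table', 0.2)]
```

In the example above, this would be called as `top_k_predictions(probs_image, labels)` inside the final loop.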

How to use - ONNX

Example of classifying an image into one of the 26 classes with ONNX Runtime:

import onnxruntime

import numpy as np
import torchvision.transforms as transforms

from PIL import Image
import requests

LABELS = [
    "logo",
    "photograph",
    "icon",
    "engineering_drawing",
    "line_chart",
    "bar_chart",
    "other",
    "table",
    "flow_chart",
    "screenshot_from_computer",
    "signature",
    "screenshot_from_manual",
    "geographical_map",
    "pie_chart",
    "page_thumbnail",
    "stamp",
    "music",
    "calendar",
    "qr_code",
    "bar_code",
    "full_page_image",
    "scatter_plot",
    "chemistry_structure",
    "topographical_map",
    "crossword_puzzle",
    "box_plot"
]


urls = [
    'http://images.cocodataset.org/val2017/000000039769.jpg',
    'http://images.cocodataset.org/test-stuff2017/000000001750.jpg',
    'http://images.cocodataset.org/test-stuff2017/000000000001.jpg'
]

images = []
for url in urls:
    image = Image.open(requests.get(url, stream=True).raw).convert("RGB")
    images.append(image)


image_processor = transforms.Compose(
    [
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize(
            mean=[0.485, 0.456, 0.406],
            std=[0.47853944, 0.4732864, 0.47434163],
        ),
    ]
)


processed_images_onnx = [image_processor(image).unsqueeze(0) for image in images]

# onnx needs numpy as input
onnx_inputs = [item.numpy(force=True) for item in processed_images_onnx]

# pack into a batch
onnx_inputs = np.concatenate(onnx_inputs, axis=0)

ort_session = onnxruntime.InferenceSession(
    "./DocumentFigureClassifier-v2_0-onnx/model.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"]
)


# iterate over the model outputs, then over each image in the batch
for item in ort_session.run(None, {'input': onnx_inputs}):
    for x in item:
        pred = x.argmax()  # index of the highest-scoring class
        print(LABELS[pred])
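
The ONNX example above prints only the winning label and still relies on torchvision for preprocessing. The sketch below shows how the normalization and a probability readout could be done with NumPy alone. It is an illustration under stated assumptions, not the authors' pipeline: the helper names `to_model_input`, `softmax`, and `decode` are hypothetical, the images are assumed to be already resized to 224x224 RGB (for instance with Pillow's `Image.resize` and bilinear resampling), and the model output is assumed to be raw logits, as in the Transformers example.

```python
import numpy as np

# Normalization constants copied from the example above.
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.47853944, 0.4732864, 0.47434163], dtype=np.float32)


def to_model_input(images_hwc_uint8: np.ndarray) -> np.ndarray:
    """Convert (N, H, W, 3) uint8 RGB images to normalized (N, 3, H, W) float32."""
    x = images_hwc_uint8.astype(np.float32) / 255.0  # same scaling as ToTensor
    x = (x - MEAN) / STD  # same as Normalize, broadcast over the channel axis
    return np.ascontiguousarray(x.transpose(0, 3, 1, 2))


def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def decode(logits: np.ndarray, labels: list) -> list:
    """Return one (label, probability) pair per image (assumes logits input)."""
    probs = softmax(logits)
    best = probs.argmax(axis=-1)
    return [(labels[i], float(probs[n, i])) for n, i in enumerate(best)]


if __name__ == "__main__":
    # Two black placeholder images stand in for real, already resized inputs.
    batch = np.zeros((2, 224, 224, 3), dtype=np.uint8)
    print(to_model_input(batch).shape)  # (2, 3, 224, 224)

    # Made-up scores for three classes, only to exercise decode().
    fake_logits = np.array([[2.0, 0.5, 0.1], [0.1, 0.2, 3.0]], dtype=np.float32)
    print(decode(fake_logits, ["logo", "photograph", "icon"]))
```

With a real session, `decode(ort_session.run(None, {'input': onnx_inputs})[0], LABELS)` would apply this to the first model output. If the input name differs in a given export, it can be read from `ort_session.get_inputs()[0].name`.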

Citation

If you use this model in your work, please cite the following papers:

@article{Tan2019EfficientNetRM,
  title={EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks},
  author={Mingxing Tan and Quoc V. Le},
  journal={ArXiv},
  year={2019},
  volume={abs/1905.11946}
}

@techreport{Docling,
  author = {Deep Search Team},
  month = {8},
  title = {{Docling Technical Report}},
  url={https://arxiv.org/abs/2408.09869},
  eprint={2408.09869},
  doi = "10.48550/arXiv.2408.09869",
  version = {1.0.0},
  year = {2024}
}

docling-project/DocumentFigureClassifier-v2.0

Author: docling-project

Created: 2025-12-08 15:22:22+00:00

Updated: 2026-01-16 16:34:32+00:00

Files (6)

.gitattributes
README.md
config.json
model.onnx ONNX
model.safetensors
preprocessor_config.json