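README

Docling Models ONNX - JPQD Quantized Version

This repository contains ONNX versions of the Docling TableFormer models, optimized with JPQD (Joint Pruning, Quantization, and Distillation) quantization for efficient inference.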

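📋 Model Overview

These models power Docling, a package for converting PDF documents. The TableFormer model recognizes table structure from images with state-of-the-art accuracy.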

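Available Models

Model	Original Size	Optimized Size	Compression Ratio	Description
`ds4sd_docling_models_tableformer_accurate_jpqd.onnx`	~1MB	~1MB	-	High-accuracy table structure recognition
`ds4sd_docling_models_tableformer_fast_jpqd.onnx`	~1MB	~1MB	-	Fast table structure recognition

Total repository size: about 2MB (optimized for deployment)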

🚀 Quick Start

Installation

pip install onnxruntime opencv-python numpy pillow torch torchvision

Basic Usage

import onnxruntime as ort
import numpy as np
from PIL import Image
import cv2

# Load the TableFormer model
model_path = "ds4sd_docling_models_tableformer_accurate_jpqd.onnx"  # or use the fast variant
session = ort.InferenceSession(model_path)

def preprocess_table_image(image_path):
    """Preprocess a table image for the TableFormer model"""
    # Load the image as RGB
    image = Image.open(image_path).convert('RGB')
    image_array = np.array(image)
    
    # TableFormer typically requires model-specific preprocessing;
    # this is a simplified example and the actual preprocessing may differ
    
    # Resize and normalize (adjust to the model's requirements)
    processed = cv2.resize(image_array, (224, 224))  # example size
    processed = processed.astype(np.float32) / 255.0
    
    # Add a batch dimension, then transpose if the model expects NCHW
    processed = np.expand_dims(processed, axis=0)
    processed = np.transpose(processed, (0, 3, 1, 2))  # NHWC -> NCHW, if required
    
    return processed

def recognize_table_structure(image_path, model_session):
    """Recognize table structure with TableFormer"""
    
    # Preprocess the image
    input_tensor = preprocess_table_image(image_path)
    
    # Get the model's input name
    input_name = model_session.get_inputs()[0].name
    
    # Run inference
    outputs = model_session.run(None, {input_name: input_tensor})
    
    return outputs

# Example usage
table_image_path = "table_image.jpg"
results = recognize_table_structure(table_image_path, session)
print("Table structure recognition complete!")
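The basic example hardcodes a 224×224 resize, which is only a placeholder. A safer sketch reads the spatial size from the model's declared input shape (`session.get_inputs()[0].shape`) and falls back to a default only when a dimension is dynamic; ONNX Runtime reports dynamic dimensions as symbolic strings or `None`. The helper name `resolve_input_hw` and the 448×448 fallback are assumptions for illustration, not values documented for these models.

```python
from typing import Sequence, Tuple, Union

Dim = Union[int, str, None]  # ONNX Runtime shape entries: int, symbolic name, or None

def resolve_input_hw(shape: Sequence[Dim], default_hw: Tuple[int, int] = (448, 448)) -> Tuple[int, int]:
    """Return (height, width) from an NCHW input shape reported by ONNX Runtime.

    Dynamic dimensions (strings or None) fall back to default_hw, which is a
    hypothetical default, not a documented TableFormer input size.
    """
    if len(shape) != 4:
        raise ValueError(f"expected a 4-D NCHW shape, got {list(shape)}")
    h, w = shape[2], shape[3]
    height = h if isinstance(h, int) and h > 0 else default_hw[0]
    width = w if isinstance(w, int) and w > 0 else default_hw[1]
    return height, width
```

In the preprocessing function, use `h, w = resolve_input_hw(session.get_inputs()[0].shape)` and then `cv2.resize(image_array, (w, h))`; note that `cv2.resize` takes the target size as (width, height), not (height, width).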

Advanced Usage for Docling Integration

import onnxruntime as ort
from typing import Dict, Any
import numpy as np
import cv2  # required by cv2.resize, cv2.imread, and cv2.cvtColor below

class TableFormerONNX:
    """ONNX wrapper for the TableFormer models"""
    
    def __init__(self, model_path: str, model_type: str = "accurate"):
        """
        Initialize a TableFormer ONNX model
        
        Args:
            model_path: Path to the ONNX model file
            model_type: "accurate" or "fast"
        """
        self.session = ort.InferenceSession(model_path)
        self.model_type = model_type
        
        # Read the model's input/output metadata
        self.input_name = self.session.get_inputs()[0].name
        self.input_shape = self.session.get_inputs()[0].shape
        self.output_names = [output.name for output in self.session.get_outputs()]
        
        print(f"Loaded {model_type} TableFormer model")
        print(f"Input shape: {self.input_shape}")
        print(f"Output names: {self.output_names}")
    
    def preprocess(self, image: np.ndarray) -> np.ndarray:
        """Preprocess an image for TableFormer inference"""
        
        # Implement TableFormer-specific preprocessing here;
        # it should match the preprocessing used during training
        
        # Example preprocessing (adjust to the actual requirements):
        if len(image.shape) == 3 and image.shape[2] == 3:
            # RGB image
            processed = cv2.resize(image, (224, 224))  # resize as needed
            processed = processed.astype(np.float32) / 255.0
            processed = np.transpose(processed, (2, 0, 1))  # HWC -> CHW
            processed = np.expand_dims(processed, axis=0)  # add batch dimension
        else:
            raise ValueError("Expected an RGB image of shape (H, W, 3)")
        
        return processed
    
    def predict(self, image: np.ndarray) -> Dict[str, Any]:
        """Run table structure prediction"""
        
        # Preprocess the image
        input_tensor = self.preprocess(image)
        
        # Run inference
        outputs = self.session.run(None, {self.input_name: input_tensor})
        
        # Map each output name to its array
        result = {}
        for i, name in enumerate(self.output_names):
            result[name] = outputs[i]
        
        return result
    
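    def extract_table_structure(self, image: np.ndarray) -> Dict[str, Any]:
        """Extract the table structure from an image"""
        
        # Get the raw predictions
        raw_outputs = self.predict(image)
        
        # Post-process to extract the table structure.
        # This would include:
        # - cell detection and classification
        # - row/column structure recognition
        # - table boundary detection
        
        # Simplified example structure
        table_structure = {
            "cells": [],  # list of cell coordinates and types
            "rows": [],   # row definitions
            "columns": [], # column definitions
            "confidence": 0.0,
            "model_type": self.model_type
        }
        
        # TODO: implement the actual post-processing logic;
        # it depends on TableFormer's specific output format
        
        return table_structure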

# Usage example
def process_document_tables(image_paths, model_type="accurate"):
    """Process multiple table images"""
    
    model_path = f"ds4sd_docling_models_tableformer_{model_type}_jpqd.onnx"
    tableformer = TableFormerONNX(model_path, model_type)
    
    results = []
    for image_path in image_paths:
        # Load the image (OpenCV reads BGR and returns None if the file cannot be read)
        image = cv2.imread(image_path)
        if image is None:
            raise FileNotFoundError(f"Could not read image: {image_path}")
        image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        
        # Extract the table structure
        structure = tableformer.extract_table_structure(image_rgb)
        results.append({
            "image_path": image_path,
            "structure": structure
        })
        
        print(f"Processed: {image_path}")
    
    return results

# Example usage
table_images = ["table1.jpg", "table2.jpg"]
results = process_document_tables(table_images, model_type="fast")
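The `extract_table_structure` method leaves post-processing as a TODO because the output format of these ONNX graphs is not documented here. If the outputs can be decoded into cell bounding boxes in `(x1, y1, x2, y2)` pixel format (an assumption), one common heuristic for recovering rows is to sort cells by vertical center and start a new row whenever a center falls farther than a tolerance from the current row's mean center. `group_cells_into_rows` and its `tolerance` parameter are hypothetical; TableFormer itself predicts structure tags alongside cell boxes, so treat this as a fallback sketch, not Docling's post-processing.

```python
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels (assumed format)

def group_cells_into_rows(boxes: List[Box], tolerance: float = 5.0) -> List[List[Box]]:
    """Group cell boxes into rows by vertical center (heuristic sketch)."""
    # Sort by vertical center, then by left edge
    ordered = sorted(boxes, key=lambda b: ((b[1] + b[3]) / 2, b[0]))
    rows: List[List[Box]] = []
    row_center: Optional[float] = None
    for box in ordered:
        center = (box[1] + box[3]) / 2
        if row_center is None or abs(center - row_center) > tolerance:
            rows.append([box])  # center too far from the current row: start a new row
            row_center = center
        else:
            rows[-1].append(box)
            # Keep the row center as the mean of its cells' centers
            row_center = sum((b[1] + b[3]) / 2 for b in rows[-1]) / len(rows[-1])
    # Within each row, order cells left to right
    return [sorted(row, key=lambda b: b[0]) for row in rows]
```

The same idea applied to horizontal centers gives a rough column grouping; a fixed pixel tolerance is the simplest choice but may need scaling with image resolution.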

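🔧 Model Details

TableFormer Architecture

Base model: TableFormer (Transformer-based table structure recognition)
Paper: TableFormer: Table Structure Understanding With Transformers
Input: Image of a table region
Output: Table structure information (cells, rows, columns)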

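Model Variants

Accurate model (`tableformer_accurate`)

Use case: High-accuracy table structure recognition
Trade-off: Higher accuracy, somewhat slower inference
Recommended for: Production scenarios that require maximum accuracy

Fast model (`tableformer_fast`)

Use case: Real-time table structure recognition
Trade-off: Good accuracy, faster inference
Recommended for: Interactive applications, batch processing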
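The speed difference between the two variants depends on hardware, so it is worth measuring on your own machine before choosing. A minimal timing harness, assuming you wrap one inference call per model in a zero-argument callable (for example `lambda: session.run(None, {input_name: tensor})`); the name `median_latency_ms` and the warm-up/repeat defaults are illustrative.

```python
import statistics
import time
from typing import Callable

def median_latency_ms(run_once: Callable[[], object], warmup: int = 3, repeats: int = 20) -> float:
    """Median wall-clock latency of run_once() in milliseconds."""
    if repeats < 1:
        raise ValueError("repeats must be >= 1")
    for _ in range(warmup):  # untimed warm-up runs (allocations, caches)
        run_once()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_once()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)
```

Calling it once with a session for each variant on the same preprocessed tensor gives a like-for-like comparison; the median is less sensitive than the mean to occasional slow runs.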

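Performance Benchmarks

TableFormer achieves state-of-the-art performance in table structure recognition. The TEDS scores below are those reported for the original TableFormer model (higher is better):

Model (TEDS score)	Simple tables	Complex tables	All tables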
Tabula	78.0	57.8	67.9
Traprange	60.8	49.9	55.4
Camelot	80.0	66.0	73.0
Acrobat Pro	68.9	61.8	65.3
EDD	91.2	85.4	88.3
TableFormer	95.4	90.1	93.6
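For reference, TEDS (Tree-Edit-Distance-based Similarity), introduced alongside the PubTabNet dataset, represents each table as an HTML tree and normalizes the tree edit distance between the predicted tree $T_a$ and the ground-truth tree $T_b$ by the size of the larger tree, where $|T|$ is the number of nodes; the scores above are this value expressed as a percentage:

```latex
\mathrm{TEDS}(T_a, T_b) = 1 - \frac{\mathrm{EditDist}(T_a, T_b)}{\max\left(|T_a|, |T_b|\right)}
```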

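Optimization Details

Method: JPQD (Joint Pruning, Quantization, and Distillation)
Precision: INT8 weights, FP32 activations
Framework: ONNX Runtime dynamic quantization
Performance: Optimized for CPU inference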
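"INT8 weights, FP32 activations" means each weight tensor is stored as 8-bit integers plus a floating-point scale and is mapped back to FP32 to compute against FP32 inputs. The NumPy sketch below illustrates that arithmetic with symmetric per-tensor quantization and reports the storage saving and output error; it is not the exact scheme ONNX Runtime uses (ONNX Runtime exposes its own step as `onnxruntime.quantization.quantize_dynamic`), and the random matrices stand in for real model weights.

```python
import numpy as np

def quantize_int8_symmetric(w: np.ndarray):
    """Symmetric per-tensor INT8 quantization: w ~= scale * q, with q in [-127, 127]."""
    max_abs = float(np.max(np.abs(w)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, np.float32(scale)

def dequantize(q: np.ndarray, scale: np.float32) -> np.ndarray:
    """Map INT8 values back to FP32 for computation."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
weights = rng.standard_normal((256, 256)).astype(np.float32)    # stand-in FP32 weights
activations = rng.standard_normal((1, 256)).astype(np.float32)  # FP32 input

q, scale = quantize_int8_symmetric(weights)
y_fp32 = activations @ weights
y_int8w = activations @ dequantize(q, scale)  # FP32 matmul with dequantized INT8 weights

print("weight bytes FP32:", weights.nbytes, "INT8:", q.nbytes)  # INT8 storage is 4x smaller
print("max abs output error:", float(np.max(np.abs(y_fp32 - y_int8w))))
```

Each dequantized weight differs from the original by at most half a quantization step (`scale / 2`), which is why per-tensor INT8 usually costs little accuracy while cutting weight storage by a factor of four.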

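📚 Integration with Docling

These models are meant to be used alongside the Docling document conversion pipeline. Docling's own table-structure stage is configured as follows:

# Configure Docling's table-structure stage (Docling v2 API)
from docling.datamodel.base_models import InputFormat
from docling.datamodel.pipeline_options import PdfPipelineOptions, TableFormerMode
from docling.document_converter import DocumentConverter, PdfFormatOption

# Enable table structure recognition and choose the TableFormer variant
pipeline_options = PdfPipelineOptions(do_table_structure=True)
pipeline_options.table_structure_options.mode = TableFormerMode.ACCURATE  # or TableFormerMode.FAST

converter = DocumentConverter(
    format_options={InputFormat.PDF: PdfFormatOption(pipeline_options=pipeline_options)}
)

# Convert a document and export it (tables included) as Markdown
result = converter.convert("document.pdf")
print(result.document.export_to_markdown())

# Note: these options select Docling's bundled TableFormer weights; they do not
# load the ONNX files in this repository. Running the ONNX models inside Docling
# requires custom integration, e.g. via the TableFormerONNX wrapper above.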

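🎯 Use Cases

Document Processing Pipelines

PDF table extraction and conversion
Academic paper processing
Financial document analysis
Legal document digitization

Business Applications

Invoice processing and data extraction
Report analysis and summarization
Form processing and digitization
Contract analysis

Research Applications

Document layout analysis research
Table understanding benchmarks
Multimodal document AI systems
Information extraction pipelines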

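⚡ Performance and Deployment

Runtime Requirements

CPU: Optimized for CPU inference
Memory: About 50MB per model during inference
Dependencies: ONNX Runtime, OpenCV, NumPy

Deployment Options

Edge deployment: Lightweight models suitable for edge devices
Cloud services: Easy to integrate into cloud ML pipelines
Mobile applications: Optimized for mobile deployment
Batch processing: Suitable for large-scale document processing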
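For batch processing, images preprocessed to the same size can be stacked into one NCHW tensor and sent in a single `session.run` call, provided the model's batch dimension is dynamic (check `session.get_inputs()[0].shape[0]`). The sketch below chunks a list of preprocessed `(3, H, W)` arrays into batches; `iter_batches` and its `batch_size` default are illustrative names, not part of this repository.

```python
from typing import Iterator, List

import numpy as np

def iter_batches(images: List[np.ndarray], batch_size: int = 8) -> Iterator[np.ndarray]:
    """Yield float32 NCHW batches from a list of same-shaped (3, H, W) arrays."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for start in range(0, len(images), batch_size):
        chunk = images[start:start + batch_size]
        # np.stack raises ValueError if the arrays in a chunk differ in shape
        yield np.stack(chunk, axis=0).astype(np.float32, copy=False)
```

With the basic example's `preprocess_table_image`, drop its batch dimension (`tensor[0]`) before collecting images, then run `session.run(None, {input_name: batch})` for each yielded batch; the last batch may be smaller than `batch_size`.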

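📄 Model Information

Original Repository

Source: DS4SD/docling
Original models: Available on the Hugging Face Hub
License: CDLA Permissive 2.0

Optimization Process

Model extraction: Converted from the original Docling models
ONNX conversion: PyTorch → ONNX, with graph optimization
JPQD quantization: Dynamic quantization applied
Verification: Output compatibility and performance validated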
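The verification criteria for this release are not described here. One common check is to run the FP32 model and the quantized model on the same input and compare the outputs element-wise; the helper `compare_outputs` below is a hypothetical sketch, and any pass/fail thresholds you apply to its results are your own choice, not the ones used for these models.

```python
import numpy as np

def compare_outputs(reference: np.ndarray, candidate: np.ndarray) -> dict:
    """Compare two model outputs of the same shape (e.g. FP32 vs INT8 model)."""
    if reference.shape != candidate.shape:
        raise ValueError(f"shape mismatch: {reference.shape} vs {candidate.shape}")
    ref = reference.astype(np.float64).ravel()
    cand = candidate.astype(np.float64).ravel()
    denom = np.linalg.norm(ref) * np.linalg.norm(cand)
    # Cosine similarity; if either vector is all zeros, fall back to exact equality
    cosine = float(ref @ cand / denom) if denom > 0 else float(np.array_equal(ref, cand))
    return {
        "max_abs_diff": float(np.max(np.abs(ref - cand))) if ref.size else 0.0,
        "cosine_similarity": cosine,
    }
```

For example, `compare_outputs(fp32_session.run(None, feeds)[0], int8_session.run(None, feeds)[0])` (with both session names hypothetical) reports the worst-case deviation and the overall agreement of the first output.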

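Technical Specifications

Framework: ONNX Runtime
Input format: RGB image (table region)
Output format: Structured table information
Batch support: Dynamic batching supported
Hardware: CPU-optimized (GPU-compatible)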
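ONNX Runtime chooses execution providers per session through the `providers` argument of `InferenceSession`. Since the models are CPU-optimized but GPU-compatible, a small helper can prefer CUDA when the installed build reports it and always keep the CPU provider as a fallback. `choose_providers` is a hypothetical name; it takes the list returned by `ort.get_available_providers()`.

```python
from typing import Iterable, List

def choose_providers(available: Iterable[str], prefer_gpu: bool = True) -> List[str]:
    """Order execution providers: CUDA first if requested and available, CPU always last."""
    available = list(available)
    providers: List[str] = []
    if prefer_gpu and "CUDAExecutionProvider" in available:
        providers.append("CUDAExecutionProvider")
    providers.append("CPUExecutionProvider")  # fallback for any ops the GPU provider does not cover
    return providers
```

Usage: `ort.InferenceSession(model_path, providers=choose_providers(ort.get_available_providers()))`. Listing the CPU provider last lets ONNX Runtime assign it any nodes the CUDA provider cannot run.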

🔄 Model Versions

Version	Date	Models	Changes
v1.0	2025-01	TableFormer Accurate/Fast	Initial JPQD-quantized release

📄 License and Citation

License

Models: CDLA Permissive 2.0 (inherited from Docling)
Code examples: Apache 2.0
Documentation: CC BY 4.0

Citation

If you use these models in your research, please cite:

@techreport{Docling,
  author = {Deep Search Team},
  month = {8},
  title = {{Docling Technical Report}},
  url = {https://arxiv.org/abs/2408.09869},
  eprint = {2408.09869},
  doi = {10.48550/arXiv.2408.09869},
  version = {1.0.0},
  year = {2024}
}

@InProceedings{TableFormer2022,
    author    = {Nassar, Ahmed and Livathinos, Nikolaos and Lysak, Maksym and Staar, Peter},
    title     = {TableFormer: Table Structure Understanding With Transformers},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2022},
    pages     = {4614--4623},
    doi       = {10.1109/CVPR52688.2022.00457}
}

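🤝 Contributing

Contributions are welcome! Areas for improvement include:

Enhanced preprocessing pipelines
Additional post-processing methods
Performance optimizations
Documentation improvements
Integration examples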

📞 Support

For questions and support:

Issues: Open an issue in this repository
Docling documentation: DS4SD/docling
Community: Join discussions in the document AI community

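🔗 Related Resources

These models are optimized versions of the Docling TableFormer models, intended for efficient production deployment while preserving accuracy.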

asmud/ds4sd-docling-models-onnx

Author: asmud

Tags: image-to-text, onnx

Created: 2025-09-02 13:58:32+00:00

Updated: 2025-09-02 21:03:49+00:00

Files (9)

.gitattributes

LICENSE

README.md

example.py

requirements.txt

tableformer_accurate.onnx

tableformer_accurate.yaml

tableformer_fast.onnx

tableformer_fast.yaml