LayoutLMv3

Model description

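LayoutLMv3 is a pre-trained multimodal Transformer for Document AI that uses unified text and image masking. Its simple, unified architecture and training objectives make LayoutLMv3 a general-purpose pre-trained model. For example, LayoutLMv3 can be fine-tuned for text-centric tasks, including form understanding, receipt understanding, and document visual question answering, as well as for image-centric tasks such as document image classification and document layout analysis.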
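As a rough illustration of how the fine-tuning tasks listed above might map onto task heads, the sketch below pairs each task with a head class from the Hugging Face transformers library and loads the microsoft/layoutlmv3-base checkpoint with that head. The task keys, the `TASK_TO_HEAD` table, and the helper names are hypothetical and chosen only for this example; the pairing of tasks to heads is an assumption, not something this model card prescribes. Document layout analysis is left out because this sketch assumes it is handled by a separate detection pipeline rather than a transformers head.

```python
# Hypothetical mapping from the tasks named in the description to head classes
# that exist in the transformers library. The task keys are illustrative only.
TASK_TO_HEAD = {
    "form_understanding": "LayoutLMv3ForTokenClassification",
    "receipt_understanding": "LayoutLMv3ForTokenClassification",
    "document_vqa": "LayoutLMv3ForQuestionAnswering",
    "document_image_classification": "LayoutLMv3ForSequenceClassification",
}

MODEL_ID = "microsoft/layoutlmv3-base"


def head_class_name(task: str) -> str:
    """Return the name of the head class assumed for a given task key."""
    try:
        return TASK_TO_HEAD[task]
    except KeyError:
        raise ValueError(f"no head registered for task {task!r}") from None


def load_for_task(task: str, **kwargs):
    """Load the base checkpoint with a newly initialised head for `task`.

    Extra keyword arguments (for example num_labels) are forwarded to
    from_pretrained. Calling this requires transformers, a deep learning
    backend, and network access to fetch the checkpoint.
    """
    import transformers  # imported lazily so the mapping works without it

    head_cls = getattr(transformers, head_class_name(task))
    return head_cls.from_pretrained(MODEL_ID, **kwargs)
```

A call such as `load_for_task("form_understanding", num_labels=7)` would return a token classification model whose head still needs fine-tuning; the label count of 7 is a placeholder, not a value taken from this model card.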

LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking. Yupan Huang, Tengchao Lv, Lei Cui, Yutong Lu, Furu Wei. ACM Multimedia 2022.

Citation

If you find LayoutLMv3 helpful in your research, please cite the following paper:

@inproceedings{huang2022layoutlmv3,
  author={Yupan Huang and Tengchao Lv and Lei Cui and Yutong Lu and Furu Wei},
  title={LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking},
  booktitle={Proceedings of the 30th ACM International Conference on Multimedia},
  year={2022}
}

License

The content of this project is licensed under the Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license. Portions of the source code are based on the transformers project.

Microsoft Open Source Code of Conduct

microsoft/layoutlmv3-base

Author: microsoft

transformers

Created: 2022-04-18 06:53:05+00:00

Updated: 2024-04-10 14:20:22+00:00

Files (11)

.gitattributes

README.md

config.json

merges.txt

model.onnx

model.safetensors

preprocessor_config.json

pytorch_model.bin

tf_model.h5

tokenizer_config.json

vocab.json