README

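Type-R Official Repository

This repository contains the model weights and data resources used in the Type-R project. The dataset supports the text-to-image generation, OCR, text erasing, text editing, and evaluation pipelines used in the Type-R system.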

📘 Directory Structure

⚠️ The code in the repository is designed to run with this structure as-is.

<pre>

resources/
├── weight/
│   ├── ocr/                        # OCR model weights
│   │   ├── solo.pth                # ⚠️ manual download required
│   │   ├── masktextspotterv3.pth   # ⚠️ manual download required
│   │   ├── modelscope
│   │   ├── craft
│   │   ├── clova
│   │   └── hisam_weight
│   ├── text_eraser/                # text eraser model weights
│   │   ├── big-lama.pt
│   │   └── garnet.pth
│   ├── text_editor/                # text editor model weights
│   │   ├── anytext.ckpt
│   │   └── udifftext
│   └── t2i/                        # text-to-image model weights
│       ├── (weights are cached here)
│       ~
├── data/
│   ├── marioevalbench/             # Mario-Eval benchmark dataset
│   │   └── hfds
│   ├── arial_unicode_ms.ttf        # ⚠️ manual download required
│   └── LiberationSans-Regular.ttf
└── prompt
    └── example.txt

</pre>
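
Because the code expects this exact layout, it can help to check for missing files before running anything. The following is a minimal sketch and not part of the Type-R codebase: the function name `find_missing` and the selection of paths in `EXPECTED_PATHS` are illustrative assumptions drawn from the tree above.

```python
from pathlib import Path

# Paths taken from the directory tree above. This selection is illustrative,
# not an exhaustive list maintained by the Type-R authors.
EXPECTED_PATHS = [
    "weight/ocr/solo.pth",               # manual download
    "weight/ocr/masktextspotterv3.pth",  # manual download
    "weight/text_eraser/big-lama.pt",
    "weight/text_eraser/garnet.pth",
    "weight/text_editor/anytext.ckpt",
    "data/marioevalbench/hfds",
    "data/LiberationSans-Regular.ttf",
    "prompt/example.txt",
]


def find_missing(root, expected=EXPECTED_PATHS):
    """Return the expected relative paths that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).exists()]


if __name__ == "__main__":
    missing = find_missing("resources")
    if missing:
        print("Missing resources:")
        for rel in missing:
            print(f"  resources/{rel}")
    else:
        print("All expected resources are present.")
```

Running the script from the directory that contains `resources/` lists anything that still needs to be downloaded by hand.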

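📘 ⚠️ Data That Must Be Downloaded Manually ⚠️

resources/weight/ocr/solo.pth
- Download this weight from the official DeepSolo implementation. [link]
- This weight uses ViTAEv2-S as the backbone and was trained on Synth150K+Total-Text+MLT17+IC13+IC15+TextOCR.
resources/weight/ocr/masktextspotterv3.pth
- Download this weight from the official MaskTextSpotterV3 implementation. [link]
resources/data/arial_unicode_ms.ttf
- Because the Arial font cannot be redistributed, obtain it through your operating system or another legitimate source. Alternatively, you can use an open-source font such as Liberation Sans (resources/data/LiberationSans-Regular.ttf). Note, however, that we observed a drop of 1-2 percentage points in OCR accuracy on the Mario-Eval benchmark when using AnyText with Liberation Sans in the best configuration.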
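
The font fallback described above can be sketched as a small helper. This is an illustration under assumptions, not how Type-R itself selects fonts: the function name `pick_font` is hypothetical, and only the two file names come from this README.

```python
from pathlib import Path


def pick_font(data_dir):
    """Return the Arial Unicode MS path if present, else the Liberation Sans path."""
    data_dir = Path(data_dir)
    arial = data_dir / "arial_unicode_ms.ttf"
    fallback = data_dir / "LiberationSans-Regular.ttf"
    if arial.is_file():
        return arial
    if fallback.is_file():
        # Per the note above, a small OCR accuracy drop was observed
        # with AnyText when Liberation Sans is used instead of Arial.
        return fallback
    raise FileNotFoundError(f"No usable font found in {data_dir}")
```

For example, `pick_font("resources/data")` returns the Arial path once the file has been placed there, and the Liberation Sans path otherwise.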

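📘 Dataset Details

weight/
- This directory contains the pretrained weights used by each module of the Type-R pipeline.
- ocr/: models for OCR detection and recognition.
- text_eraser/: inpainting or erasing modules for removing text.
- text_editor/: models for rendering text into images.
- t2i/: large text-to-image models.
  - If a T2I model requires authentication, make sure you are logged in to Hugging Face before running the pipeline (for example, with huggingface-cli login).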
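
As an alternative to the interactive CLI login, authentication can also be done from Python. The sketch below assumes a token has been exported in the `HF_TOKEN` environment variable and that the `huggingface_hub` package is installed; the helper names `token_from_env` and `login_if_token_available` are hypothetical and not part of Type-R.

```python
import os


def token_from_env(env=None):
    """Return a Hugging Face token from HF_TOKEN, or None if unset or empty."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    return token or None


def login_if_token_available():
    """Log in programmatically when a token is available; return True on login."""
    token = token_from_env()
    if token is None:
        return False
    # Imported lazily so token_from_env works without huggingface_hub installed.
    from huggingface_hub import login

    login(token=token)
    return True
```

Calling `login_if_token_available()` before the pipeline starts is one way to make gated T2I weights downloadable in non-interactive environments.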
data/marioevalbench/
- Dataset of prompts and reference images used to evaluate Type-R.
- hfds/: contains the prompts, augmented prompts, and images of the Mario-Eval benchmark.
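
The file listing of this repository shows `dataset_dict.json`, per-split `state.json` files, and `.arrow` shards under hfds/, which suggests the directory was written with the `save_to_disk` method of the Hugging Face `datasets` library. Under that assumption, a minimal sketch for inspecting and loading it could look as follows; the function names are hypothetical, and no column names are assumed.

```python
import json
from pathlib import Path


def list_splits(hfds_dir):
    """Read split names from dataset_dict.json in a save_to_disk directory."""
    with open(Path(hfds_dir) / "dataset_dict.json", encoding="utf-8") as f:
        return json.load(f)["splits"]


def load_benchmark(hfds_dir):
    """Load all splits as a DatasetDict (requires the `datasets` package)."""
    # Lazy import: `datasets` is an optional dependency of this sketch.
    from datasets import load_from_disk

    return load_from_disk(str(hfds_dir))
```

With the layout above, `list_splits("resources/data/marioevalbench/hfds")` reports the available splits without loading any image data, and `load_benchmark(...)` returns the full `DatasetDict`.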

📘 License

Weights

DeepSolo: resources/weight/ocr/solo.pth, licensed under AdelaiDet
MaskTextSpotterV3: resources/weight/ocr/masktextspotterv3.pth, licensed under Creative Commons
Paddle: resources/weight/ocr/modelscope, licensed under Apache 2.0
CRAFT: resources/weight/ocr/craft, licensed under the MIT License
Clova Recognition: resources/weight/ocr/clova, licensed under Apache 2.0
Hi-SAM: resources/weight/ocr/hisam_weight, licensed under Apache 2.0
Lama: resources/weight/text_eraser/big-lama.pt, licensed under Apache 2.0
Garnet: resources/weight/text_eraser/garnet.pth, licensed under Apache 2.0
AnyText: resources/weight/text_editor/anytext.ckpt, licensed under Apache 2.0
UDiffText: resources/weight/text_editor/udifftext, licensed under the MIT License

Data

Mario-Eval Benchmark: resources/data/marioevalbench, licensed under the MIT License
Arial font: resources/data/arial_unicode_ms.ttf, licensed under the Microsoft fonts license
Liberation Sans: resources/data/LiberationSans-Regular.ttf, licensed under OFL 1.1

cyberagent/type-r

Author: cyberagent

text-to-image

Created: 2025-04-23 05:40:51+00:00

Updated: 2025-05-20 03:48:45+00:00

Files (69)

.gitattributes

README.md

data/LiberationSans-Regular.ttf

data/marioevalbench/hfds/ablation/data-00000-of-00003.arrow

data/marioevalbench/hfds/ablation/data-00001-of-00003.arrow

data/marioevalbench/hfds/ablation/data-00002-of-00003.arrow

data/marioevalbench/hfds/ablation/dataset_info.json

data/marioevalbench/hfds/ablation/state.json

data/marioevalbench/hfds/dataset_dict.json

data/marioevalbench/hfds/test/data-00000-of-00010.arrow

data/marioevalbench/hfds/test/data-00001-of-00010.arrow

data/marioevalbench/hfds/test/data-00002-of-00010.arrow

data/marioevalbench/hfds/test/data-00003-of-00010.arrow

data/marioevalbench/hfds/test/data-00004-of-00010.arrow

data/marioevalbench/hfds/test/data-00005-of-00010.arrow

data/marioevalbench/hfds/test/data-00006-of-00010.arrow

data/marioevalbench/hfds/test/data-00007-of-00010.arrow

data/marioevalbench/hfds/test/data-00008-of-00010.arrow

data/marioevalbench/hfds/test/data-00009-of-00010.arrow

data/marioevalbench/hfds/test/dataset_info.json

data/marioevalbench/hfds/test/state.json

data/marioevalbench/hfds/userstudy/data-00000-of-00003.arrow

data/marioevalbench/hfds/userstudy/data-00001-of-00003.arrow

data/marioevalbench/hfds/userstudy/data-00002-of-00003.arrow

data/marioevalbench/hfds/userstudy/dataset_info.json

data/marioevalbench/hfds/userstudy/state.json

data/marioevalbench/hfds/val/data-00000-of-00003.arrow

data/marioevalbench/hfds/val/data-00001-of-00003.arrow

data/marioevalbench/hfds/val/data-00002-of-00003.arrow

data/marioevalbench/hfds/val/dataset_info.json

data/marioevalbench/hfds/val/state.json

prompt/example.txt

weight/ocr/clova/TPS-ResNet-BiLSTM-Attn.pth

weight/ocr/craft/craft_mlt_25k.pth

weight/ocr/craft/craft_refiner_CTW1500.pth

weight/ocr/hisam_weight/hi_sam_h.pth

weight/ocr/hisam_weight/put checkpoints here.txt

weight/ocr/hisam_weight/sam_vit_h_4b8939.pth

weight/ocr/modelscope/.mdl

weight/ocr/modelscope/.msc

weight/ocr/modelscope/.mv

weight/ocr/modelscope/README.md

weight/ocr/modelscope/configuration.json

weight/ocr/modelscope/model.onnx

weight/ocr/modelscope/pytorch_model.pt

weight/ocr/modelscope/quickstart.md

weight/ocr/modelscope/resources/ConvTransformer-Pipeline.jpg

weight/ocr/modelscope/resources/rec_result_measure.png

weight/ocr/modelscope/resources/rec_result_visu.jpg

weight/ocr/modelscope/vocab.txt

weight/ocr/trocr-large-str/.gitattributes

weight/ocr/trocr-large-str/README.md

weight/ocr/trocr-large-str/config.json

weight/ocr/trocr-large-str/generation_config.json

weight/ocr/trocr-large-str/merges.txt

weight/ocr/trocr-large-str/preprocessor_config.json

weight/ocr/trocr-large-str/pytorch_model.bin

weight/ocr/trocr-large-str/special_tokens_map.json

weight/ocr/trocr-large-str/tokenizer.json

weight/ocr/trocr-large-str/tokenizer_config.json

weight/ocr/trocr-large-str/vocab.json

weight/text_editor/anytext.ckpt

weight/text_editor/udifftext/AEs/AE_inpainting_2.safetensors

weight/text_editor/udifftext/encoders/LabelEncoder/epoch=19-step=7820.ckpt

weight/text_editor/udifftext/encoders/ViTSTR/vitstr_base_patch16_224.pth

weight/text_editor/udifftext/predictors/parseq-bb5792a6.pt

weight/text_editor/udifftext/udifftext.ckpt

weight/text_eraser/big-lama.pt

weight/text_eraser/garnet.pth