README

ML-Danbooru ONNX Models

Overview

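This repository provides ONNX-optimized implementations of the ML-Danbooru image tagging models, originally developed by 7eu7d7. ML-Danbooru is a deep learning system built for automatic tagging of anime-style images. It performs multi-label classification over thousands of Danbooru-style tags, and its main models build on Transformer-style architectures. The models in this repository have been converted to ONNX format to improve inference performance and cross-platform compatibility.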

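The core architecture is Caformer, a hybrid model that combines convolutional blocks with Transformer-style attention blocks, pairing the global receptive field of attention with the local feature extraction of convolutions. This hybrid design lets the model capture both fine-grained details and global context in anime artwork. The repository contains several model variants trained with different configurations and numbers of training epochs, so users can choose between faster inference and higher accuracy according to their needs.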

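In terms of performance, the models are accurate at recognizing common anime character attributes, clothing items, accessories, backgrounds, and compositional elements. They reliably identify tags such as hair color, eye color, clothing type, character pose, and scene setting, and confidence scores for such features typically fall in the 0.7 to 0.9 range or higher. The models support batch processing and handle images of varying aspect ratios through a resizing strategy that preserves important visual information while keeping computation efficient.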

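Usage

The models in this repository are intended to be used with the dghs-imgutils library, which provides a ready-made interface for image tagging tasks.

Installation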

pip install dghs-imgutils

Basic Usage

from imgutils.tagging import get_mldanbooru_tags

# Tag an image with the default settings
tags = get_mldanbooru_tags('your_image.jpg')
print(tags)

# Tag an image with a custom threshold and settings
tags_custom = get_mldanbooru_tags(
    'your_image.jpg',
    threshold=0.5,
    size=448,
    keep_ratio=True,
    drop_overlap=True,
    use_real_name=False
)
print(tags_custom)
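
The returned tags can be post-processed like any Python mapping. The sketch below assumes that get_mldanbooru_tags returns a dict mapping tag names to confidence scores. The helper name top_tags and the sample scores are hypothetical and only serve as an illustration.

```python
from typing import Dict, List, Tuple


def top_tags(tags: Dict[str, float], k: int = 5,
             min_score: float = 0.0) -> List[Tuple[str, float]]:
    """Return up to k (tag, score) pairs with score >= min_score, best first."""
    kept = [(name, score) for name, score in tags.items() if score >= min_score]
    # Sort by descending score; break ties alphabetically for stable output.
    kept.sort(key=lambda item: (-item[1], item[0]))
    return kept[:k]


# Hypothetical scores standing in for a real get_mldanbooru_tags() result.
sample = {"1girl": 0.98, "long_hair": 0.91, "smile": 0.74, "outdoors": 0.52}
print(top_tags(sample, k=3, min_score=0.7))
```

Filtering after the call like this makes it possible to run the tagger once with a low threshold and then try several cut-offs without re-running inference.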

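Model Variants

This repository contains several ML-Danbooru model variants:

ml_caformer_m36_dec-5-97527.onnx: the primary model, using the Caformer-M36 architecture
ml_caformer_m36_dec-3-80000.onnx: an alternative version from a different training checkpoint
TResnet-D-FLq_ema_2-40000.onnx: a TResnet-based variant
TResnet-D-FLq_ema_4-10000.onnx: a lightweight TResnet variant
TResnet-D-FLq_ema_6-10000.onnx: an additional TResnet checkpoint
TResnet-D-FLq_ema_6-30000.onnx: a TResnet variant trained for longer
caformer_m36-3-80000.onnx: the base Caformer model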
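
To fetch a single variant directly, without going through dghs-imgutils, something like the following may work. It is a minimal sketch that assumes the huggingface_hub package is installed and that the files are hosted under the repository ID given in the citation URL. The helper name download_model is hypothetical.

```python
REPO_ID = "deepghs/ml-danbooru-onnx"

# File names as listed in this repository.
MODEL_FILES = [
    "ml_caformer_m36_dec-5-97527.onnx",
    "ml_caformer_m36_dec-3-80000.onnx",
    "TResnet-D-FLq_ema_2-40000.onnx",
    "TResnet-D-FLq_ema_4-10000.onnx",
    "TResnet-D-FLq_ema_6-10000.onnx",
    "TResnet-D-FLq_ema_6-30000.onnx",
    "caformer_m36-3-80000.onnx",
]


def download_model(filename: str) -> str:
    """Download one ONNX file from the repository and return its local path."""
    if filename not in MODEL_FILES:
        raise ValueError(f"Unknown model file: {filename!r}")
    # Imported lazily so the file name check works without the dependency.
    from huggingface_hub import hf_hub_download
    return hf_hub_download(repo_id=REPO_ID, filename=filename)
```

Calling download_model("ml_caformer_m36_dec-5-97527.onnx") requires network access and returns the path of the cached file.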

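Tag Information

This repository includes complete tag information:

classes.json: simplified tag names for 1,527 common anime attributes
tags.csv: the full tag database with 12,547 records, including:
- original tag names
- root (lemma) forms of inflected words
- part-of-speech categories
- usage frequency statistics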
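
Since the exact column names of tags.csv are not documented here, a quick way to get oriented is to list the header and count the rows. The sketch below uses only the standard library. The helper name summarize_csv and the two-row sample are hypothetical, and the real file may use different column names.

```python
import csv
import io
from typing import List, Tuple


def summarize_csv(text: str) -> Tuple[List[str], int]:
    """Return the header columns and the number of data rows of a CSV document."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)  # Consuming the reader also populates reader.fieldnames.
    return list(reader.fieldnames or []), len(rows)


# Hypothetical excerpt; the real tags.csv may use different column names.
sample = "name,count\n1girl,100\nlong_hair,80\n"
columns, n_rows = summarize_csv(sample)
print(columns, n_rows)
```

For the real file, read it with open("tags.csv", encoding="utf-8", newline="") and pass the contents to summarize_csv.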

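Performance Characteristics

Input size: 448x448 pixels by default (configurable)
Number of tags: 12,547 possible tags
Threshold: 0.7 by default (configurable)
Supported tags: character attributes, clothing, accessories, backgrounds, composition
Architectures: Caformer-M36 and TResnet variants
Format: ONNX, for optimized inference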
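
To make the effect of a configurable input size on images with different aspect ratios concrete, here is one plausible resizing rule. It is an illustration only and not necessarily how dghs-imgutils resizes images: scaling the shorter edge to the target size and rounding each side down to a multiple of 4 are both assumptions, and the function name target_size is hypothetical.

```python
from typing import Tuple


def target_size(width: int, height: int, size: int = 448,
                keep_ratio: bool = False, multiple: int = 4) -> Tuple[int, int]:
    """Compute the resized (width, height) for an input image."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    if keep_ratio:
        # Assumed rule: scale so that the shorter edge equals `size`.
        short = min(width, height)
        new_w = width * size // short
        new_h = height * size // short
    else:
        # Ignore the aspect ratio and use a square input.
        new_w = new_h = size
    # Assumed rule: round each side down to a multiple of `multiple`.
    return new_w - new_w % multiple, new_h - new_h % multiple


print(target_size(1000, 500, keep_ratio=True))  # (896, 448)
print(target_size(1000, 500))                   # (448, 448)
```

With keep_ratio enabled, wide or tall images keep their proportions at the cost of a larger input and more computation. A square resize is cheaper but distorts the image.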

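Model Architecture Details

The ML-Danbooru models use the following architectures:

Caformer-M36: combines convolutional layers with Transformer blocks for efficient feature extraction
TResnet-D: a TResNet-based variant (TResNet is a ResNet-derived convolutional architecture), optimized with focal loss
ONNX optimization: the models are exported with optimized operators for fast inference across different hardware platforms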

Citation

@misc{deepghs_ml_danbooru_onnx,
  title        = {{ML-Danbooru ONNX Models: Optimized Anime Image Tagging}},
  author       = {7eu7d7 and DeepGHS Contributors},
  howpublished = {\url{https://huggingface.co/deepghs/ml-danbooru-onnx}},
  year         = {2023},
  note         = {ONNX-optimized implementations of ML-Danbooru models for efficient anime image tagging with transformer-based architectures},
  abstract     = {This repository provides ONNX-optimized implementations of the ML-Danbooru image tagging models, originally developed by 7eu7d7. ML-Danbooru is a sophisticated deep learning system specifically designed for automated tagging of anime-style images, leveraging modern transformer architectures to achieve high-precision classification across thousands of Danbooru-style tags. The models employ Caformer (Convolution-Augmented Transformer) architectures that combine the global receptive field of transformers with local feature extraction capabilities of convolutional networks, enabling effective capture of both fine-grained details and global contextual information in anime artwork.},
  keywords     = {image-classification, anime, tagging, danbooru, transformer, onnx}
}

deepghs/ml-danbooru-onnx

Author: deepghs

image-classification

Created: 2023-02-11 08:02:27+00:00

Updated: 2025-11-17 13:15:43+00:00

Files (11)

.gitattributes

README.md

TResnet-D-FLq_ema_2-40000.onnx ONNX

TResnet-D-FLq_ema_4-10000.onnx ONNX

TResnet-D-FLq_ema_6-10000.onnx ONNX

TResnet-D-FLq_ema_6-30000.onnx ONNX

caformer_m36-3-80000.onnx ONNX

classes.json

ml_caformer_m36_dec-3-80000.onnx ONNX

ml_caformer_m36_dec-5-97527.onnx ONNX

tags.csv