---
language:
- fr
- en
tags:
- florence2
- ocr
- manga
- vision
- onnx
- transformers.js
license: mit
base_model: microsoft/Florence-2-base
pipeline_tag: image-to-text
---

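# Florence-2-base Fine-Tuned: One Piece OCR 🏴‍☠️

This model is a fine-tuned version of [microsoft/Florence-2-base](https://huggingface.co/microsoft/Florence-2-base), specialized for OCR on manga speech bubbles. It was trained on French-language scans of **One Piece**. The model is tuned for high accuracy on stylized lettering and ships with ONNX weights so that it can run directly in the browser.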

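## 🚀 Key Features
- **Specialized OCR**: Trained on manga fonts, speech bubbles, and complex backgrounds.
- **Transformers.js ready**: Includes dedicated ONNX weights optimized for WebGPU and WASM.
- **High recall**: Fine-tuned for 7 epochs, with a focus on capturing every word in dense action scenes.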

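## 📊 Evaluation Results
The model was evaluated against the base `Florence-2-base` model on a test set of 150 One Piece panels. This release uses **full fine-tuning (FFT)**.

| Metric | Base model | **Fine-tuned (FFT)** | Improvement |
| :--- | :--- | :--- | :--- |
| **CER** (character error rate, lower is better) | 78.77% | **3.13%** | **75.64 percentage points lower** |
| **WER** (word error rate, lower is better) | 99.57% | **22.34%** | **77.23 percentage points lower** |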
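
For readers unfamiliar with the two metrics in the table, the sketch below shows one common way to compute them: the Levenshtein edit distance between prediction and reference, divided by the reference length in characters (CER) or in words (WER). The evaluation script used for this model is not included here, so the function names and sample strings are illustrative assumptions, and details such as text normalization may differ.

```python
def levenshtein(ref, hyp):
    """Edit distance between two sequences (insertions, deletions, substitutions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete r from the reference
                curr[j - 1] + 1,     # insert h into the reference
                prev[j - 1] + cost,  # substitute r with h (or keep a match)
            ))
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate; assumes a non-empty reference string."""
    return levenshtein(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word error rate on whitespace-split tokens; assumes a non-empty reference."""
    ref_words = reference.split()
    return levenshtein(ref_words, hypothesis.split()) / len(ref_words)


# Hypothetical bubble text and OCR output, for illustration only.
reference = "LE ROI DES PIRATES"
hypothesis = "LE ROI DES PIRATE"
print(f"CER: {cer(reference, hypothesis):.2%}")
print(f"WER: {wer(reference, hypothesis):.2%}")
```

A single dropped letter costs one character out of 18 but one whole word out of 4, which is why WER is usually much higher than CER on the same predictions.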

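### Why move to full fine-tuning?
LoRA was a good starting point, but **full fine-tuning** lets the model's vision encoder adapt specifically to One Piece lettering. This markedly improved robustness: character-level accuracy on speech bubbles is close to perfect (3.13% CER on the test set).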
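
As a rough illustration of the trade-off, the arithmetic below compares how many weights are trainable in a single linear layer under full fine-tuning versus a rank-8 LoRA adapter (the rank mentioned in the training details). The 768×768 layer size is a hypothetical example, not a figure read from this model's configuration.

```python
def full_finetune_params(d_out, d_in):
    """Full fine-tuning updates every entry of the d_out x d_in weight matrix."""
    return d_out * d_in


def lora_params(d_out, d_in, rank):
    """LoRA trains two low-rank factors: B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_out + d_in)


d_out = d_in = 768  # hypothetical layer size, chosen only for illustration
rank = 8

full = full_finetune_params(d_out, d_in)
lora = lora_params(d_out, d_in, rank)
print(f"full fine-tuning: {full} trainable weights")
print(f"LoRA (rank {rank}): {lora} trainable weights ({lora / full:.2%} of full)")
```

Under these assumptions the adapter trains only a small fraction of the layer's weights, which keeps training cheap but limits how far the layer can move from its pretrained values.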

## 🛠️ Usage

```python
from transformers import AutoProcessor, AutoModelForCausalLM
from PIL import Image

model_id = "Remidesbois/florence2-onepiece-ocr"
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

image = Image.open("manga_panel.png").convert("RGB")
prompt = "<OCR>"

inputs = processor(text=prompt, images=image, return_tensors="pt")
generated_ids = model.generate(
    input_ids=inputs["input_ids"],
    pixel_values=inputs["pixel_values"],
    max_new_tokens=1024,
    num_beams=3
)

generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(generated_text)
```

## 🌐 Web Integration

The model is compatible with transformers.js (v3+). The repository includes a custom ONNX export of the vision encoder and decoder.

```javascript
import { Florence2ForConditionalGeneration, AutoProcessor, RawImage } from '@huggingface/transformers';

const model = await Florence2ForConditionalGeneration.from_pretrained('Remidesbois/florence2-onepiece-ocr', {
    dtype: 'fp32',
    device: 'webgpu',
});
const processor = await AutoProcessor.from_pretrained('Remidesbois/florence2-onepiece-ocr');

// Use the '<OCR>' task for best results
```

## 📝 Training Details

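- **Dataset**: About 1,000 manually annotated speech bubbles from French-language One Piece scans.
- **Hardware**: Trained on an NVIDIA RTX GPU.
- **Optimization**: LoRA fine-tuning (rank-8 adapters), merged into the base model.
- **Learning rate**: 5e-5
- **Optimizer**: AdamW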
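
Merging a LoRA adapter means folding the low-rank update back into the original weight matrix, so inference needs no extra adapter code. A minimal sketch of that step, in plain Python on toy matrices, follows. The scaling factor `alpha`, the helper names, and the matrix values are assumptions for illustration and are not taken from this model's training script; the toy adapter uses rank 1 rather than 8 to keep the numbers readable.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_b, lora_a, alpha, rank):
    """Return W + (alpha / rank) * (B @ A), the merged weight matrix."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# Toy 2x2 layer with a rank-1 adapter (hypothetical values).
weight = [[1.0, 0.0], [0.0, 1.0]]
lora_b = [[1.0], [2.0]]   # shape: d_out x rank
lora_a = [[0.5, -0.5]]    # shape: rank x d_in

merged = merge_lora(weight, lora_b, lora_a, alpha=2.0, rank=1)
print(merged)
```

After merging, the layer has the same shape as the original, so the checkpoint can be loaded like any ordinary set of model weights.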

Made with ❤️ for the One Piece community.

- **Repository**: Remidesbois/florence2-onepiece-ocr
- **Author**: Remidesbois
- **Created**: 2026-01-26 22:47:10+00:00
- **Updated**: 2026-01-28 00:10:53+00:00

**Files (22)**

- `.gitattributes`
- `README.md`
- `__init__.py`
- `added_tokens.json`
- `config.json`
- `configuration_florence2.py`
- `generation_config.json`
- `merges.txt`
- `model.safetensors`
- `modeling_florence2.py`
- `onnx/decoder_model.onnx`
- `onnx/decoder_model_merged.onnx`
- `onnx/decoder_with_past_model.onnx`
- `onnx/embed_tokens.onnx`
- `onnx/encoder_model.onnx`
- `onnx/vision_encoder.onnx`
- `preprocessor_config.json`
- `processing_florence2.py`
- `special_tokens_map.json`
- `tokenizer.json`
- `tokenizer_config.json`
- `vocab.json`