README

LoRA-Fine-Tuned AI-Generated Content Detector

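Disclaimer

This ONNX model was converted from the original model, which is distributed in safetensors format. The conversion exists so that the model can be used with frameworks and tools that consume ONNX models.

Please note that this repository is not affiliated with the creators of the original model. All credit for the model's development belongs to the original authors. To access the original model, visit: original model link.

If you have questions about the original model, its license, or its usage, please refer to the source link given above.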

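This is an e5-small model fine-tuned with LoRA for sequence classification. It is optimized to classify text as AI-generated or human-written with high accuracy.

Label_0: human-written content.
Label_1: AI-generated content.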
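
To make the label convention concrete, here is a minimal sketch of how a pair of raw classifier logits could be turned into one of the two labels. It assumes the model emits one logit per label in the order Label_0, Label_1; the `threshold` parameter and the example logits are illustrative and do not come from the original card.

```python
import math

# Label order as described in this card: index 0 = human, index 1 = AI.
LABELS = {0: "human-written", 1: "AI-generated"}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits, threshold=0.5):
    """Return (label, P(AI-generated)) for one pair of logits.

    `threshold` is a hypothetical knob, not something the card specifies.
    """
    p_ai = softmax(logits)[1]
    index = 1 if p_ai >= threshold else 0
    return LABELS[index], p_ai


# Made-up logits, used only to show the call.
label, p_ai = decode([-1.2, 2.3])
print(label, round(p_ai, 3))
```

Raising `threshold` above 0.5 would trade missed AI-generated texts for fewer false alarms on human-written ones; the appropriate value depends on the application.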

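Model Details

Base model: intfloat/e5-small
Fine-tuning technique: LoRA (Low-Rank Adaptation)
Task: sequence classification
Use case: text classification for detecting AI-generated content.
Hyperparameters:
- Learning rate: 5e-5
- Epochs: 3
- LoRA rank: 8
- LoRA alpha: 16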
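
As a rough illustration of what the LoRA rank and alpha listed above imply, the sketch below computes the standard LoRA scaling factor (alpha divided by rank) and the number of trainable parameters added to a single adapted weight matrix. The hidden size of 384 is an assumption about intfloat/e5-small that should be checked against config.json, and the card does not say which weight matrices were adapted.

```python
# Values taken from the hyperparameter list in this card.
LORA_RANK = 8
LORA_ALPHA = 16

# Assumption: hidden size of intfloat/e5-small; verify against config.json.
HIDDEN_SIZE = 384


def lora_scaling(alpha, rank):
    """Standard LoRA scaling applied to the low-rank update B @ A."""
    return alpha / rank


def lora_param_count(d_in, d_out, rank):
    """Trainable parameters for one adapted weight.

    A has shape (rank, d_in) and B has shape (d_out, rank).
    """
    return rank * (d_in + d_out)


scaling = lora_scaling(LORA_ALPHA, LORA_RANK)
adapter_params = lora_param_count(HIDDEN_SIZE, HIDDEN_SIZE, LORA_RANK)
full_params = HIDDEN_SIZE * HIDDEN_SIZE

print(f"scaling = {scaling}")
print(
    f"adapter params per matrix = {adapter_params} of {full_params} "
    f"({adapter_params / full_params:.1%})"
)
```

Under these assumptions, each adapted square matrix trains only a small fraction of the parameters that full fine-tuning of the same matrix would update.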

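Training Details

Dataset:
- 10,000 tweets and 10,000 tweets rewritten with GPT-4o-mini.
- 80,000 human-written texts from RAID-train.
- 128,000 AI-generated texts from RAID-train.
Hardware: fine-tuned on a single NVIDIA A100 GPU.
Training time: about 2 hours.
Evaluation metrics: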

Metric	E5-small (original)	Fine-tuned
Accuracy	65.2%	89.0%
F1 score	0.653	0.887
AUC	0.697	0.976
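
The dataset list under Training Details implies an uneven split between the two classes. The short calculation below adds up the counts from that list and derives each class's share of the training mixture. The inverse-frequency class weights are purely hypothetical: the card does not state whether any re-weighting or resampling was used, and the composition of the evaluation set is not given.

```python
# Sample counts from the dataset list in this card.
human = 10_000 + 80_000   # tweets + RAID-train human-written texts
ai = 10_000 + 128_000     # rewritten tweets + RAID-train AI-generated texts
total = human + ai

counts = {"human-written": human, "AI-generated": ai}
shares = {name: n / total for name, n in counts.items()}

# Hypothetical inverse-frequency weights: total / (num_classes * count).
# Nothing in the card indicates these were used during training.
weights = {name: total / (len(counts) * n) for name, n in counts.items()}

for name in counts:
    print(f"{name}: n={counts[name]}, share={shares[name]:.3f}, weight={weights[name]:.3f}")
```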

Collaborators

Menglin Zhou
Jiaping Liu
Xiaotian Zhan

Citation

If you use this model, please cite the RAID dataset as follows:

@inproceedings{dugan-etal-2024-raid,
    title = "{RAID}: A Shared Benchmark for Robust Evaluation of Machine-Generated Text Detectors",
    author = "Dugan, Liam  and
      Hwang, Alyssa  and
      Trhl{\'\i}k, Filip  and
      Zhu, Andrew  and
      Ludan, Josh Magnus  and
      Xu, Hainiu  and
      Ippolito, Daphne  and
      Callison-Burch, Chris",
    booktitle = "Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = aug,
    year = "2024",
    address = "Bangkok, Thailand",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.acl-long.674",
    pages = "12463--12492",
}

limanup/e5-small

Author: limanup

text-classification

Created: 2025-06-19 19:59:00+00:00

Updated: 2025-07-04 16:46:00+00:00

Files (10)

- .gitattributes
- README.md
- config.json
- onnx/model.onnx
- onnx/model_fp32_logits_only.onnx
- onnx/model_quantized.onnx
- special_tokens_map.json
- tokenizer.json
- tokenizer_config.json
- vocab.txt
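
The following is a minimal sketch of how the files in this repository might be used for inference with ONNX Runtime and the `tokenizers` library, assuming the repository has been downloaded to a local directory. Several details are assumptions rather than facts from the card: the candidate input names (`input_ids`, `attention_mask`, `token_type_ids`), the 512-token truncation limit, and the expectation that the first output holds one row of logits per input text. The helper names `build_feeds` and `predict_logits` are hypothetical.

```python
from pathlib import Path


def build_feeds(input_names, encoded):
    """Keep only the tensors the ONNX graph declares as inputs.

    `encoded` maps candidate input names to lists of ints for one text.
    Each kept value is wrapped in an outer list to form a batch of size 1.
    """
    return {name: [encoded[name]] for name in input_names if name in encoded}


def predict_logits(text, repo_dir, model_file="onnx/model.onnx"):
    """Run one text through the exported model and return the first output row."""
    # Third-party imports are local so build_feeds stays dependency-free.
    import numpy as np
    import onnxruntime as ort
    from tokenizers import Tokenizer

    repo_dir = Path(repo_dir)
    tokenizer = Tokenizer.from_file(str(repo_dir / "tokenizer.json"))
    # Assumption: 512 tokens is the maximum sequence length of the model.
    tokenizer.enable_truncation(max_length=512)
    enc = tokenizer.encode(text)

    # Assumed input names, typical for BERT-style exports.
    encoded = {
        "input_ids": enc.ids,
        "attention_mask": enc.attention_mask,
        "token_type_ids": enc.type_ids,
    }

    session = ort.InferenceSession(
        str(repo_dir / model_file), providers=["CPUExecutionProvider"]
    )
    input_names = [inp.name for inp in session.get_inputs()]
    feeds = {
        name: np.asarray(values, dtype=np.int64)
        for name, values in build_feeds(input_names, encoded).items()
    }
    outputs = session.run(None, feeds)
    # Assumption: outputs[0] has shape (batch, num_labels).
    return outputs[0][0].tolist()
```

A call such as `predict_logits("some text", "path/to/local/repo")` would then return the raw scores for that text. Filtering the feeds by `session.get_inputs()` is a defensive choice, since the card does not document which inputs the exported graph expects.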