README

roberta-base-infringement-detect

Model Details

Model Description

A model for detecting the similarity between two pieces of content, built on the klue/roberta-base model.

Training

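The model was trained on a self-built set of 1,310 pairs of genuinely similar content. The pairs were shuffled to produce a dataset with a true-to-false ratio of 1:2.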
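
The model card does not describe the shuffling procedure in detail. The following is a minimal sketch of one plausible reading: each true pair is kept, and two false pairs are created by matching the same original with candidates drawn from other pairs. The function name `build_pairs`, the 1/0 label convention, and the seed are hypothetical and not taken from the author's code.

```python
import random


def build_pairs(positives, negatives_per_positive=2, seed=0):
    """Return shuffled (original, candidate, label) triples.

    positives: list of (original, candidate) pairs known to be similar.
    Label 1 marks a true pair, label 0 a mismatched pair (assumed convention).
    """
    if len(positives) <= negatives_per_positive:
        raise ValueError("need more positive pairs than negatives per positive")

    rng = random.Random(seed)
    examples = []
    for i, (original, candidate) in enumerate(positives):
        examples.append((original, candidate, 1))
        # Pair the same original with candidates taken from other pairs.
        others = [j for j in range(len(positives)) if j != i]
        for j in rng.sample(others, negatives_per_positive):
            examples.append((original, positives[j][1], 0))

    rng.shuffle(examples)
    return examples


# Toy data standing in for the real similar-content pairs.
toy_positives = [(f"original {i}", f"candidate {i}") for i in range(10)]
toy_examples = build_pairs(toy_positives)
```

With 10 true pairs this yields 30 examples, 10 true and 20 false, which matches the 1:2 ratio stated above.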

The other training parameters are as follows:

Parameter	Value
`train_batch_size`	16
`num_train_epochs`	5
`weight_decay`	0.01
`learning_rate`	2e-5
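
As a rough illustration, the values in the table could be collected into keyword arguments for the `TrainingArguments` class in `transformers`. This is a sketch under assumptions: the card does not say that the author used `Trainer`, and mapping `train_batch_size` to `per_device_train_batch_size` is a guess.

```python
# Values copied from the table above. The key names are the corresponding
# TrainingArguments parameter names, chosen here as an assumption.
training_kwargs = {
    "per_device_train_batch_size": 16,  # table: train_batch_size (assumed per device)
    "num_train_epochs": 5,
    "weight_decay": 0.01,
    "learning_rate": 2e-5,
}

for name, value in training_kwargs.items():
    print(f"{name} = {value}")
```

If `Trainer` is used, the dictionary could be passed as `TrainingArguments(output_dir="out", **training_kwargs)`, where the `output_dir` value is a placeholder.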

Usage

from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "kms7530/roberta-base-infringement-detect"
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

At inference time, format the input as follows:

[CLS]\
[unused0]<original content title>\
[unused1]<original content>[SEP] \
[unused0]<test content title>\
[unused1]<test content>[SEP]
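
The layout above can be assembled with a small helper. This is a minimal sketch, assuming that the trailing backslashes are only line breaks in the card and that the segments are joined without newlines, with a single space after the first `[SEP]` as shown. The function name `build_input` and the sample strings are hypothetical.

```python
def build_input(original_title, original_body, test_title, test_body):
    """Assemble the model input string in the documented layout."""
    original = f"[unused0]{original_title}[unused1]{original_body}[SEP]"
    test = f"[unused0]{test_title}[unused1]{test_body}[SEP]"
    # Assumed joining: no newlines, one space between the two segments.
    return f"[CLS]{original} {test}"


text = build_input("Title A", "Body A", "Title B", "Body B")
print(text)
# [CLS][unused0]Title A[unused1]Body A[SEP] [unused0]Title B[unused1]Body B[SEP]
```

Because the string already contains `[CLS]` and `[SEP]`, it may make sense to call the tokenizer with `add_special_tokens=False` so they are not added a second time. The card does not confirm this, so it is worth checking the resulting token IDs before relying on it.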

kms7530/roberta-base-infringement-detect

Author: kms7530

text-classification transformers


Created: 2024-07-03 05:30:14+00:00

Updated: 2024-07-24 07:37:33+00:00


Files (11)

.gitattributes

README.md

config.json

model.safetensors

onnx/model.onnx ONNX

onnx/model_quantized.onnx ONNX

quantize_config.json

special_tokens_map.json

tokenizer.json

tokenizer_config.json

vocab.txt