DeBERTa-v3-xsmall-mnli-fever-anli-ling-binary

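Model description

This model was trained on 782,357 hypothesis-premise pairs from 4 NLI datasets: MultiNLI, Fever-NLI, LingNLI and ANLI.

Note that the model was trained on binary NLI and predicts either "entailment" or "not-entailment". This is designed specifically for zero-shot classification, where the difference between "neutral" and "contradiction" is irrelevant.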
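As a rough illustration of the binary label scheme described above, the sketch below collapses standard three-way NLI labels into two classes. The helper name `to_binary_label` and the exact label strings are assumptions made for illustration; the card does not show the author's actual preprocessing code.

```python
# Hypothetical mapping from 3-way NLI labels to the binary scheme.
# "neutral" and "contradiction" are merged because zero-shot
# classification only needs to know whether the hypothesis is entailed.
THREE_WAY_TO_BINARY = {
    "entailment": "entailment",
    "neutral": "not_entailment",
    "contradiction": "not_entailment",
}


def to_binary_label(label: str) -> str:
    """Collapse a 3-way NLI label into entailment / not_entailment."""
    key = label.strip().lower()
    if key not in THREE_WAY_TO_BINARY:
        raise ValueError(f"Unknown NLI label: {label!r}")
    return THREE_WAY_TO_BINARY[key]


examples = ["entailment", "neutral", "contradiction"]
print([to_binary_label(x) for x in examples])
```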

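The base model is DeBERTa-v3-xsmall from Microsoft. The v3 variant of DeBERTa substantially outperforms previous versions of the model by including a different pre-training objective; see the DeBERTa-V3 paper.

For highest performance (at the cost of speed), I recommend using https://huggingface.co/MoritzLaurer/DeBERTa-v3-large-mnli-fever-anli-ling-wanli.

Intended uses & limitations

How to use the model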

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Use the GPU when available, otherwise fall back to the CPU
device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")

model_name = "MoritzLaurer/DeBERTa-v3-xsmall-mnli-fever-anli-ling-binary"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# The model must live on the same device as its inputs
model = AutoModelForSequenceClassification.from_pretrained(model_name).to(device)
model.eval()

premise = "I first thought that I liked the movie, but upon second thought it was actually disappointing."
hypothesis = "The movie was good."

# Tokenize the premise-hypothesis pair and move every tensor (including the attention mask) to the device
inputs = tokenizer(premise, hypothesis, truncation=True, return_tensors="pt").to(device)
with torch.no_grad():
    output = model(**inputs)
prediction = torch.softmax(output["logits"][0], -1).tolist()
label_names = ["entailment", "not_entailment"]
prediction = {name: round(float(pred) * 100, 1) for pred, name in zip(prediction, label_names)}
print(prediction)
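For zero-shot classification, each candidate label is usually turned into a hypothesis and scored against the text as the premise; the label whose hypothesis receives the highest entailment probability is chosen. The sketch below covers only that post-processing step in plain Python, with made-up logits standing in for real model outputs. The template string, the candidate labels, the logit values and the function names are all illustrative assumptions; the only thing taken from the example above is the label order (entailment first, not_entailment second).

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def rank_labels(logits_per_label, entailment_index=0):
    """Rank candidate labels by entailment probability, highest first."""
    scores = {
        label: softmax(logits)[entailment_index]
        for label, logits in logits_per_label.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical hypothesis template and candidate labels.
template = "This example is about {}."
candidate_labels = ["politics", "sports", "cooking"]
hypotheses = [template.format(label) for label in candidate_labels]

# Placeholder [entailment, not_entailment] logits, one pair per hypothesis.
# In practice these would come from running the model on (text, hypothesis).
fake_logits = {
    "politics": [-1.5, 2.0],
    "sports": [3.0, -2.0],
    "cooking": [-0.5, 0.5],
}

ranking = rank_labels(fake_logits)
best_label, best_score = ranking[0]
print(hypotheses)
print(best_label, round(best_score * 100, 1))
```

Scoring each hypothesis independently, as done here, also allows several labels to be accepted at once by thresholding the entailment probability instead of taking only the top one.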

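Training data

This model was trained on 782,357 hypothesis-premise pairs from 4 NLI datasets: MultiNLI, Fever-NLI, LingNLI and ANLI.

Training procedure

DeBERTa-v3-xsmall-mnli-fever-anli-ling-binary was trained using the Hugging Face Trainer with the following hyperparameters: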

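from transformers import TrainingArguments

training_args = TrainingArguments(
    num_train_epochs=5,              # total number of training epochs
    learning_rate=2e-05,             # peak learning rate
    per_device_train_batch_size=32,   # batch size per device during training
    per_device_eval_batch_size=32,    # batch size per device during evaluation
    warmup_ratio=0.1,                # fraction of total training steps used for learning rate warmup
    weight_decay=0.06,               # strength of weight decay
    fp16=True                        # mixed precision training
)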
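To make the effect of `warmup_ratio` concrete, the sketch below estimates the number of optimizer steps implied by the hyperparameters above and evaluates a linear warmup and decay schedule. It assumes a single device, no gradient accumulation, all 782,357 pairs seen once per epoch, and the Trainer's default linear scheduler; none of these are stated in the card, so the numbers are an estimate rather than the author's actual training log. The function names are hypothetical.

```python
import math


def schedule_summary(num_examples, batch_size, epochs, warmup_ratio):
    """Return (steps_per_epoch, total_steps, warmup_steps)."""
    steps_per_epoch = math.ceil(num_examples / batch_size)
    total_steps = steps_per_epoch * epochs
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    return steps_per_epoch, total_steps, warmup_steps


def linear_lr(step, peak_lr, warmup_steps, total_steps):
    """Linear warmup from 0 to peak_lr, then linear decay back to 0."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


# Assumed setup: one device, no gradient accumulation.
steps_per_epoch, total_steps, warmup_steps = schedule_summary(
    num_examples=782_357, batch_size=32, epochs=5, warmup_ratio=0.1
)
print(steps_per_epoch, total_steps, warmup_steps)

# Learning rate at a few points of the assumed schedule.
for step in (0, warmup_steps // 2, warmup_steps, total_steps):
    print(step, linear_lr(step, 2e-05, warmup_steps, total_steps))
```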

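Eval results

The model was evaluated using the binary test sets for MultiNLI, ANLI and LingNLI, and the binary dev set for Fever-NLI (two classes instead of three). The metric used is accuracy.

Dataset	mnli-m-2c	mnli-mm-2c	fever-nli-2c	anli-all-2c	anli-r3-2c	lingnli-2c
Accuracy	0.925	0.922	0.892	0.676	0.665	0.888
Speed (texts/sec, CPU, batch size 128)	6.0	6.3	3.0	5.8	5.0	7.6
Speed (texts/sec, GPU Tesla P100, batch size 128)	473	487	230	390	340	586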
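The throughput rows can be turned into a quick capacity estimate. The sketch below copies the speed figures from the table, computes the GPU-to-CPU speedup per dataset, and estimates how long a given number of texts would take. The helper names and the example workload of 10,000 texts are assumptions for illustration; real throughput depends on hardware and text length.

```python
# Speed figures copied from the table above (texts per second).
datasets = ["mnli-m-2c", "mnli-mm-2c", "fever-nli-2c",
            "anli-all-2c", "anli-r3-2c", "lingnli-2c"]
cpu_speed = [6.0, 6.3, 3.0, 5.8, 5.0, 7.6]
gpu_speed = [473, 487, 230, 390, 340, 586]


def speedups(cpu, gpu):
    """GPU throughput divided by CPU throughput, per dataset."""
    return [g / c for c, g in zip(cpu, gpu)]


def seconds_for(num_texts, texts_per_second):
    """Estimated wall-clock seconds to classify num_texts."""
    return num_texts / texts_per_second


ratios = speedups(cpu_speed, gpu_speed)
for name, ratio in zip(datasets, ratios):
    print(f"{name}: GPU is about {ratio:.0f}x faster than CPU")

# Hypothetical workload: 10,000 texts at the mnli-m-2c speeds.
print(round(seconds_for(10_000, gpu_speed[0]), 1), "s on GPU")
print(round(seconds_for(10_000, cpu_speed[0]) / 60, 1), "min on CPU")
```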

Limitations and bias

Please consult the original DeBERTa paper and the literature on the different NLI datasets for potential biases.

Citation

If you use this model, please cite: Laurer, Moritz, Wouter van Atteveldt, Andreu Salleras Casas, and Kasper Welbers. 2022. 'Less Annotating, More Classifying – Addressing the Data Scarcity Issue of Supervised Machine Learning with Deep Transfer Learning and BERT - NLI'. Preprint, June. Open Science Framework. https://osf.io/74b8k.

Ideas for cooperation or questions?

If you have questions or ideas for cooperation, contact me at m{dot}laurer{at}vu{dot}nl or on LinkedIn.

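Debugging and issues

Note that DeBERTa-v3 was released on 6 December 2021, and older versions of HF Transformers seem to have issues running the model (e.g. problems with the tokenizer). Using Transformers>=4.13 might solve some issues.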
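A quick way to check the installed version against the 4.13 threshold mentioned above is sketched below. It uses only the standard library; the helpers `parse_version` and `meets_minimum` are hypothetical, and the parsing is deliberately simplified (pre-release suffixes such as `rc1` or `.dev0` are ignored), so treat it as a convenience check rather than a full version comparison.

```python
from importlib import metadata


def parse_version(version):
    """Turn '4.13.0.dev0' into (4, 13, 0); stops at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(version, minimum="4.13"):
    """True if version is at least minimum, compared numerically."""
    return parse_version(version) >= parse_version(minimum)


try:
    installed = metadata.version("transformers")
    status = "OK" if meets_minimum(installed) else "consider upgrading"
    print(f"transformers {installed}: {status}")
except metadata.PackageNotFoundError:
    print("transformers is not installed")
```

Comparing integer tuples rather than raw strings matters here: as strings, "4.9.2" would sort after "4.13.0".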

Author: MoritzLaurer

Tags: zero-shot-classification, transformers

Created: 2022-03-02 23:29:04+00:00

Updated: 2024-04-11 13:48:16+00:00
