README

DeBERTa-v3-large-mnli-fever-anli-ling-wanli

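Model description

This model was fine-tuned on the MultiNLI, Fever-NLI, Adversarial-NLI (ANLI), LingNLI and WANLI datasets, which together comprise 885,242 NLI hypothesis-premise pairs. As of 2022-06-06 it was the best-performing NLI model on the Hugging Face Hub, and it can be used for zero-shot classification. It significantly outperforms all other large models on the ANLI benchmark.

The foundation model is DeBERTa-v3-large from Microsoft. DeBERTa-v3 combines several recent innovations compared to classical masked language models such as BERT and RoBERTa; see the paper for details.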

How to use the model

Simple zero-shot classification pipeline

from transformers import pipeline
classifier = pipeline("zero-shot-classification", model="MoritzLaurer/DeBERTa-v3-large-mnli-fever-anli-ling-wanli")
sequence_to_classify = "Angela Merkel is a politician in Germany and leader of the CDU"
candidate_labels = ["politics", "economy", "entertainment", "environment"]
output = classifier(sequence_to_classify, candidate_labels, multi_label=False)
print(output)
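
To make the mechanism behind the pipeline easier to follow, here is a minimal sketch of how zero-shot classification can be built on top of an NLI model: each candidate label is turned into a hypothesis, and with `multi_label=False` the entailment scores are normalized across all labels. The template string mirrors the pipeline's default `hypothesis_template`; the helper names and the logits below are hypothetical illustration values, not real model output.

```python
import math


def build_hypotheses(labels, template="This example is {}."):
    """Turn each candidate label into an NLI hypothesis string."""
    return [template.format(label) for label in labels]


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def rank_labels(labels, entailment_logits):
    """Single-label mode: normalize the entailment logits across all labels."""
    scores = softmax(entailment_logits)
    return sorted(zip(labels, scores), key=lambda pair: pair[1], reverse=True)


candidate_labels = ["politics", "economy", "entertainment", "environment"]
hypotheses = build_hypotheses(candidate_labels)

# Hypothetical entailment logits, one per (text, hypothesis) pair.
example_logits = [4.0, 0.5, -1.0, -0.5]
ranking = rank_labels(candidate_labels, example_logits)

print(hypotheses[0])
print(ranking[0][0])
```

In single-label mode the scores sum to 1, so they are only comparable within one call. With `multi_label=True` the pipeline instead scores each label independently, which is the better fit when several labels may apply at once.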

NLI use-case

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Run on the GPU when one is available, otherwise on the CPU
device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")

model_name = "MoritzLaurer/DeBERTa-v3-large-mnli-fever-anli-ling-wanli"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# Move the model to the same device as the inputs
model = AutoModelForSequenceClassification.from_pretrained(model_name).to(device)
model.eval()

premise = "I first thought that I liked the movie, but upon second thought it was actually disappointing."
hypothesis = "The movie was not good."

# Tokenize the pair and move all tensors (input IDs, attention mask) to the device
inputs = tokenizer(premise, hypothesis, truncation=True, return_tensors="pt").to(device)
with torch.no_grad():  # inference only, no gradients needed
    output = model(**inputs)
prediction = torch.softmax(output["logits"][0], -1).tolist()
label_names = ["entailment", "neutral", "contradiction"]
prediction = {name: round(float(pred) * 100, 1) for pred, name in zip(prediction, label_names)}
print(prediction)

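Training data

DeBERTa-v3-large-mnli-fever-anli-ling-wanli was trained on the MultiNLI, Fever-NLI, Adversarial-NLI (ANLI), LingNLI and WANLI datasets, which together comprise 885,242 NLI hypothesis-premise pairs. Note that SNLI was explicitly excluded because of quality issues with the dataset. More data does not necessarily make for better NLI models.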

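Training procedure

DeBERTa-v3-large-mnli-fever-anli-ling-wanli was trained using the Hugging Face Trainer with the following hyperparameters. Note that in my tests, longer training with more epochs hurt performance (overfitting).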

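from transformers import TrainingArguments

training_args = TrainingArguments(
    num_train_epochs=4,              # total number of training epochs
    learning_rate=5e-06,
    per_device_train_batch_size=16,   # batch size per device during training
    gradient_accumulation_steps=2,    # doubles the effective batch size to 32 while reducing memory requirements
    per_device_eval_batch_size=64,    # batch size for evaluation
    warmup_ratio=0.06,                # fraction of total training steps used for learning-rate warmup
    weight_decay=0.01,               # strength of weight decay
    fp16=True                        # mixed-precision training
)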
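
As a rough illustration of what these hyperparameters imply, the sketch below works out the effective batch size and the approximate number of optimizer and warmup steps. It assumes that all 885,242 pairs were used for training on a single device, which the text above does not state, and the helper name `schedule_summary` is hypothetical. The exact step arithmetic inside the Trainer may differ slightly between library versions.

```python
import math


def schedule_summary(num_examples, per_device_batch, grad_accum,
                     epochs, warmup_ratio, num_devices=1):
    """Estimate optimizer-step counts from the training hyperparameters."""
    # Examples consumed per optimizer update
    effective_batch = per_device_batch * grad_accum * num_devices
    # The last, partially filled batch still counts as a step
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    total_steps = steps_per_epoch * epochs
    # warmup_ratio is a fraction of the total number of optimizer steps
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    return {
        "effective_batch": effective_batch,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "warmup_steps": warmup_steps,
    }


# Assumption: single device, full 885,242-pair corpus used for training.
summary = schedule_summary(
    num_examples=885_242,
    per_device_batch=16,
    grad_accum=2,
    epochs=4,
    warmup_ratio=0.06,
)
print(summary)
```

Under those assumptions this gives an effective batch size of 32, about 27,664 steps per epoch, 110,656 steps in total, and roughly 6,640 warmup steps.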

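Eval results

The model was evaluated using the test sets for MultiNLI, ANLI, LingNLI and WANLI and the dev set for Fever-NLI. The metric used is accuracy. The model achieves state-of-the-art performance on each dataset. Surprisingly, it outperforms the previous best model on ANLI (ALBERT-XXL) by 8.3%. I assume that this is because ANLI was created to fool masked language models like RoBERTa (or ALBERT), while DeBERTa-v3 uses a better pre-training objective (RTD) and disentangled attention, and because I fine-tuned it on higher-quality NLI data.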

Datasets	mnli_test_m	mnli_test_mm	anli_test	anli_test_r3	ling_test	wanli_test
Accuracy	0.912	0.908	0.702	0.64	0.87	0.77
Speed (texts/sec, A100 GPU)	696.0	697.0	488.0	425.0	828.0	980.0
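
For readers who want to reproduce numbers of this kind, here is a minimal sketch of the accuracy metric: the predicted class is the argmax over the three logits, and accuracy is the share of predictions that match the gold label. The logits and gold labels below are hypothetical toy values, not data from any of the benchmarks, and this is not necessarily the evaluation script behind the table.

```python
def argmax(row):
    """Index of the largest value in a list of floats."""
    return max(range(len(row)), key=row.__getitem__)


def accuracy(predicted_ids, gold_ids):
    """Fraction of predictions that equal the gold label."""
    if len(predicted_ids) != len(gold_ids):
        raise ValueError("predictions and gold labels differ in length")
    if not gold_ids:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == g for p, g in zip(predicted_ids, gold_ids))
    return correct / len(gold_ids)


# Hypothetical logits in the order entailment, neutral, contradiction.
toy_logits = [
    [2.1, 0.3, -1.0],
    [0.2, 1.5, 0.1],
    [-0.5, 0.0, 2.2],
    [1.0, 1.2, -0.3],
]
toy_gold = [0, 1, 2, 0]

toy_predictions = [argmax(row) for row in toy_logits]
toy_score = accuracy(toy_predictions, toy_gold)
print(toy_predictions, toy_score)
```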

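Limitations and bias

Please consult the original DeBERTa-v3 paper and the literature on the different NLI datasets for more information on the training data and potential biases. The model will reproduce statistical patterns in the training data.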

Citation

If you use this model, please cite: Laurer, Moritz, Wouter van Atteveldt, Andreu Salleras Casas, and Kasper Welbers. 2022. 'Less Annotating, More Classifying – Addressing the Data Scarcity Issue of Supervised Machine Learning with Deep Transfer Learning and BERT - NLI'. Preprint, June. Open Science Framework. https://osf.io/74b8k.

Ideas for cooperation or questions?

If you have questions or ideas for cooperation, contact me at m{dot}laurer{at}vu{dot}nl or on LinkedIn.

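Debugging and issues

Note that DeBERTa-v3 was released on 6 December 2021, and older versions of HF Transformers seem to have issues running the model (for example, problems with the tokenizer). Using Transformers>=4.13 might solve some issues.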
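
One way to check for this before loading the model is to compare the installed version against 4.13. The sketch below uses only the standard library; the helper names are hypothetical, and the simple major/minor comparison is an assumption that ignores pre-release suffixes.

```python
import re
from importlib import metadata


def parse_major_minor(version):
    """Extract (major, minor) from a version string such as '4.13.0'."""
    numbers = re.findall(r"\d+", version)
    if len(numbers) < 2:
        raise ValueError(f"cannot parse version: {version!r}")
    return int(numbers[0]), int(numbers[1])


def meets_minimum(version, minimum=(4, 13)):
    """True if the version is at least the given (major, minor) pair."""
    return parse_major_minor(version) >= minimum


def installed_version(package="transformers"):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


found = installed_version()
if found is None:
    print("transformers is not installed")
elif meets_minimum(found):
    print(f"transformers {found} should be recent enough")
else:
    print(f"transformers {found} is older than 4.13; consider upgrading")
```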

Author: MoritzLaurer

Created: 2022-06-06 18:28:10+00:00

Updated: 2024-04-11 13:49:10+00:00
