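Model Card for DistilBERT base multilingual (cased)

Model Details

Model Description

This model is a distilled version of the BERT base multilingual model. The code for the distillation process can be found here. This model is cased: it treats english and English as different inputs.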
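
The cased behavior described above can be checked with a small helper. This is a minimal sketch, not part of the original card: `is_case_sensitive` and `load_tokenize` are hypothetical names, and the model identifier is assumed to be the one used in the usage example at the end of this card.

```python
from typing import Callable, List


def is_case_sensitive(tokenize: Callable[[str], List[str]], word: str) -> bool:
    """Return True if the lower-cased and capitalized forms tokenize differently."""
    return tokenize(word.lower()) != tokenize(word.capitalize())


def load_tokenize() -> Callable[[str], List[str]]:
    """Load the model's tokenizer and return its tokenize method.

    Downloads tokenizer files on first use, so it is not called at import time.
    """
    # Imported lazily so the helper above works without transformers installed.
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("distilbert-base-multilingual-cased")
    return tokenizer.tokenize
```

Calling `is_case_sensitive(load_tokenize(), "english")` should return `True` for a cased tokenizer, since the pieces it produces preserve the original casing.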

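The model is trained on the concatenation of Wikipedia in 104 different languages, listed here. The model has 6 layers, a hidden dimension of 768, and 12 attention heads, for a total of 134M parameters (compared to 177M parameters for mBERT-base). On average, this model, referred to as DistilmBERT, is twice as fast as mBERT-base.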
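
The parameter counts quoted above can be reproduced with back-of-the-envelope arithmetic. The sketch below assumes values that are not stated in this card: a vocabulary of 119,547 WordPiece tokens, 512 position embeddings, and a feed-forward width of 3072. It counts only the encoder (no language-modeling head), and the helper name `encoder_params` is hypothetical.

```python
def encoder_params(vocab, hidden, layers, ffn, max_pos, type_vocab=0, pooler=False):
    """Count weights and biases of a BERT-style Transformer encoder."""
    layer_norm = 2 * hidden  # scale and shift vectors
    embeddings = (vocab + max_pos + type_vocab) * hidden + layer_norm
    attention = 4 * (hidden * hidden + hidden)  # query, key, value, output
    feed_forward = (hidden * ffn + ffn) + (ffn * hidden + hidden)
    per_layer = attention + feed_forward + 2 * layer_norm
    pool = hidden * hidden + hidden if pooler else 0
    return embeddings + layers * per_layer + pool


# Assumed hyperparameters (not given in the card): vocab, ffn, max_pos.
VOCAB, HIDDEN, FFN, MAX_POS = 119_547, 768, 3072, 512

# DistilmBERT: 6 layers, no token-type embeddings, no pooler.
distil = encoder_params(VOCAB, HIDDEN, layers=6, ffn=FFN, max_pos=MAX_POS)
# mBERT-base: 12 layers, 2 token-type embeddings, plus a pooler layer.
mbert = encoder_params(VOCAB, HIDDEN, layers=12, ffn=FFN, max_pos=MAX_POS,
                       type_vocab=2, pooler=True)

print(f"DistilmBERT: {distil / 1e6:.1f}M")  # 134.7M
print(f"mBERT-base:  {mbert / 1e6:.1f}M")   # 177.9M
```

Under these assumptions the embedding matrix accounts for roughly 92M parameters in both models, which is why halving the number of layers removes only about a quarter of the total. The number of attention heads does not affect the count, since the heads split the same 768-dimensional projections.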

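We encourage potential users of this model to check out the BERT base multilingual model card to learn more about usage, limitations, and potential biases.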

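Developed by: Victor Sanh, Lysandre Debut, Julien Chaumond, Thomas Wolf (Hugging Face)
Model type: Transformer-based language model
Language(s) (NLP): 104 languages; see the full list here
License: Apache 2.0
Related models: BERT base multilingual model
Resources for more information:
- GitHub Repository
- Associated Paper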

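Uses

Direct Use and Downstream Use

You can use the raw model for either masked language modeling or next sentence prediction, but it is mostly intended to be fine-tuned on a downstream task. See the model hub to look for fine-tuned versions on a task that interests you.

Note that this model is primarily aimed at being fine-tuned on tasks that use the whole sentence (potentially masked) to make decisions, such as sequence classification, token classification, or question answering. For tasks such as text generation, you should look at models like GPT2.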
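
As an illustration of the fine-tuning use case, the following sketch loads the checkpoint with a fresh sequence-classification head. It is a starting point under stated assumptions rather than the authors' training setup: the helper names and the three example labels are hypothetical, and the classification head is randomly initialized until it is trained on labeled data.

```python
from typing import Dict, List, Tuple

MODEL_ID = "distilbert-base-multilingual-cased"


def label_maps(labels: List[str]) -> Tuple[Dict[int, str], Dict[str, int]]:
    """Build the id-to-label and label-to-id mappings stored in the model config."""
    id2label = dict(enumerate(labels))
    label2id = {name: idx for idx, name in id2label.items()}
    return id2label, label2id


def load_for_sequence_classification(labels: List[str]):
    """Load the tokenizer and the encoder with an untrained classification head.

    Downloads model files on first use, so it is not called at import time.
    """
    # Imported lazily so label_maps can be used without transformers installed.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    id2label, label2id = label_maps(labels)
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSequenceClassification.from_pretrained(
        MODEL_ID,
        num_labels=len(labels),
        id2label=id2label,
        label2id=label2id,
    )
    return tokenizer, model
```

For a natural language inference task one might call `load_for_sequence_classification(["entailment", "neutral", "contradiction"])` and then train the returned model with a standard fine-tuning loop.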

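Out-of-Scope Use

The model should not be used to intentionally create hostile or alienating environments for people. The model was not trained to produce factual or true representations of people or events, so using it to generate such content is out of scope for its abilities.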

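Bias, Risks, and Limitations

Significant research has explored bias and fairness issues with language models (see, e.g., Sheng et al. (2021) and Bender et al. (2021)). Predictions generated by the model may include disturbing and harmful stereotypes across protected classes; identity characteristics; and sensitive, social, and occupational groups.

Recommendations

Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model.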

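Training Details

The model was pretrained with the supervision of bert-base-multilingual-cased on the concatenation of Wikipedia in 104 different languages.
The model has 6 layers, a hidden dimension of 768, and 12 attention heads, for a total of 134M parameters.
Further information about the training procedure and data is included in the bert-base-multilingual-cased model card.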

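Evaluation

The model developers report the following accuracy results for DistilmBERT on the test sets of 6 of the languages available in XNLI (see the GitHub repository). The results are computed in the zero-shot setting: the model is trained on the English portion and evaluated on the target-language portion.

Model	English	Spanish	Chinese	German	Arabic	Urdu
mBERT base cased (computed)	82.1	74.6	69.1	72.3	66.4	58.5
mBERT base uncased (reported)	81.4	74.3	63.8	70.5	62.1	58.3
DistilmBERT	78.2	69.1	64.0	66.3	59.1	54.7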
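
To put these numbers in perspective, one can compute how much of the cased mBERT accuracy DistilmBERT retains for each language. The sketch below uses only the values from the table; the language codes and the `retention` helper are illustrative names, not part of the original evaluation.

```python
LANGS = ["en", "es", "zh", "de", "ar", "ur"]
MBERT_CASED = [82.1, 74.6, 69.1, 72.3, 66.4, 58.5]
DISTILMBERT = [78.2, 69.1, 64.0, 66.3, 59.1, 54.7]


def retention(student, teacher):
    """Percentage of the teacher's accuracy kept by the student, per language."""
    return [round(100 * s / t, 1) for s, t in zip(student, teacher)]


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


kept = retention(DISTILMBERT, MBERT_CASED)
for lang, pct in zip(LANGS, kept):
    print(f"{lang}: {pct}% of mBERT cased accuracy")

gap = mean(MBERT_CASED) - mean(DISTILMBERT)
print(f"average gap: {gap:.1f} accuracy points")
```

On these six languages DistilmBERT keeps between roughly 89% and 95% of the cased mBERT accuracy, with an average gap of about 5 points.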

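Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).

Hardware Type: More information needed
Hours used: More information needed
Cloud Provider: More information needed
Compute Region: More information needed
Carbon Emitted: More information needed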

Citation

@article{Sanh2019DistilBERTAD,
  title={DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter},
  author={Victor Sanh and Lysandre Debut and Julien Chaumond and Thomas Wolf},
  journal={ArXiv},
  year={2019},
  volume={abs/1910.01108}
}

APA

Sanh, V., Debut, L., Chaumond, J., & Wolf, T. (2019). DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. arXiv preprint arXiv:1910.01108.

How to Get Started With the Model

You can use the model directly with a pipeline for masked language modeling:

>>> from transformers import pipeline
>>> unmasker = pipeline('fill-mask', model='distilbert-base-multilingual-cased')
>>> unmasker("Hello I'm a [MASK] model.")

[{'score': 0.040800247341394424,
  'sequence': "Hello I'm a virtual model.",
  'token': 37859,
  'token_str': 'virtual'},
 {'score': 0.020015988498926163,
  'sequence': "Hello I'm a big model.",
  'token': 22185,
  'token_str': 'big'},
 {'score': 0.018680453300476074,
  'sequence': "Hello I'm a Hello model.",
  'token': 31178,
  'token_str': 'Hello'},
 {'score': 0.017396586015820503,
  'sequence': "Hello I'm a model model.",
  'token': 13192,
  'token_str': 'model'},
 {'score': 0.014229810796678066,
  'sequence': "Hello I'm a perfect model.",
  'token': 43477,
  'token_str': 'perfect'}]
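
Because the model is multilingual, the same pipeline can be applied to sentences in other languages. The helper below is a small sketch for pulling out just the predicted tokens and their scores; `top_predictions` and `load_unmasker` are hypothetical names, and the example sentence in the usage note is illustrative only.

```python
from typing import Any, Callable, Dict, List, Tuple


def top_predictions(
    unmasker: Callable[..., List[Dict[str, Any]]], text: str, k: int = 3
) -> List[Tuple[str, float]]:
    """Return the k best (token, score) pairs for the [MASK] position in text."""
    results = unmasker(text, top_k=k)
    return [(r["token_str"], r["score"]) for r in results]


def load_unmasker():
    """Create the fill-mask pipeline (downloads the model on first use)."""
    # Imported lazily so top_predictions can be used with any compatible callable.
    from transformers import pipeline

    return pipeline("fill-mask", model="distilbert-base-multilingual-cased")
```

For example, `top_predictions(load_unmasker(), "Paris est la [MASK] de la France.")` returns the three highest-scoring fillers for a French sentence. As with the English example above, the scores depend on the installed library and model version.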

distilbert/distilbert-base-multilingual-cased

Created: 2022-03-02 23:29:04+00:00

Updated: 2024-05-06 13:46:54+00:00
