# bert-large-uncased-wwm-squadv2-optimized-f16
This is an optimized model created with the nn_pruning Python library, based on madlag/bert-large-uncased-wwm-squadv2-x2.63-f82.6-d16-hybrid-v1, which is itself a pruned version of madlag/bert-large-uncased-whole-word-masking-finetuned-squadv2.
Check out our blog post on how we optimized this model (link).
Our final optimized model is 579 MB, runs inference in 18.184 ms on a Tesla T4, and reaches a best F1 of 82.68% — roughly 7.7× faster and 2.2× smaller than the original fine-tuned model. Here is the comparison against each base model:
| Model | Size | Latency on Tesla T4 | Best F1 |
|---|---|---|---|
| madlag/bert-large-uncased-whole-word-masking-finetuned-squadv2 | 1275 MB | 140.529 ms | 86.08% |
| madlag/bert-large-uncased-wwm-squadv2-x2.63-f82.6-d16-hybrid-v1 | 1085 MB | 90.801 ms | 82.67% |
| Our optimized model | 579 MB | 18.184 ms | 82.68% |
You can try out inference with these models in the tryolabs/transformers-optimization Space.
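For a quick back-of-the-envelope view of the gains, the ratios implied by the table above can be computed directly (the numbers below are copied from the table; the variable names are our own):

```python
# Figures from the comparison table above
base = {"size_mb": 1275, "latency_ms": 140.529, "f1": 86.08}
optimized = {"size_mb": 579, "latency_ms": 18.184, "f1": 82.68}

speedup = base["latency_ms"] / optimized["latency_ms"]   # latency improvement
size_reduction = base["size_mb"] / optimized["size_mb"]  # on-disk size improvement
f1_drop = base["f1"] - optimized["f1"]                   # accuracy cost, in F1 points

print(f"{speedup:.1f}x faster, {size_reduction:.1f}x smaller, -{f1_drop:.2f} F1")
# → 7.7x faster, 2.2x smaller, -3.40 F1
```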
## Usage example

```python
import torch
from huggingface_hub import hf_hub_download
from onnxruntime import InferenceSession
from transformers import AutoTokenizer

MAX_SEQUENCE_LENGTH = 512

# Download the ONNX model
model = hf_hub_download(
    repo_id="tryolabs/bert-large-uncased-wwm-squadv2-optimized-f16", filename="model.onnx"
)

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained("tryolabs/bert-large-uncased-wwm-squadv2-optimized-f16")

question = "Who worked a little bit harder?"
context = "The first little pig was very lazy. He didn't want to work at all and he built his house out of straw. The second little pig worked a little bit harder but he was somewhat lazy too and he built his house out of sticks. Then, they sang and danced and played together the rest of the day."

# Tokenize the input, truncating to the model's maximum sequence length
inputs = dict(
    tokenizer(
        question, context, return_tensors="np",
        max_length=MAX_SEQUENCE_LENGTH, truncation=True
    )
)

# Create the inference session
sess = InferenceSession(model, providers=["CPUExecutionProvider"])

# Run predictions
output = sess.run(None, input_feed=inputs)
answer_start_scores, answer_end_scores = torch.tensor(output[0]), torch.tensor(output[1])

# Post-process predictions: pick the most likely start and end token positions
input_ids = inputs["input_ids"].tolist()[0]
answer_start = torch.argmax(answer_start_scores)
answer_end = torch.argmax(answer_end_scores) + 1
answer = tokenizer.convert_tokens_to_string(
    tokenizer.convert_ids_to_tokens(input_ids[answer_start:answer_end])
)

# Output the prediction
print("Answer:", answer)
```
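To reproduce a latency figure like the one reported above, you can wrap `sess.run` in a simple timing harness. Below is a minimal, generic sketch; the `benchmark` helper and its warmup/run counts are our own illustration, not part of the model card:

```python
import statistics
import time

def benchmark(fn, warmup=10, runs=100):
    """Time a callable: do `warmup` untimed calls to stabilize caches,
    then return the median latency in milliseconds over `runs` timed calls."""
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)

# Usage with the ONNX session and inputs from the example above:
# latency_ms = benchmark(lambda: sess.run(None, input_feed=inputs))
# print(f"median latency: {latency_ms:.3f} ms")
```

The median is reported rather than the mean so that occasional scheduling hiccups do not skew the result.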
Author: tryolabs · Task: question-answering · Created: 2022-11-11 · Updated: 2022-12-01