# bert-large-uncased-wwm-squadv2-optimized-f16
This is an optimized model created with the nn_pruning Python library, based on madlag/bert-large-uncased-wwm-squadv2-x2.63-f82.6-d16-hybrid-v1, which is itself a pruned version of madlag/bert-large-uncased-whole-word-masking-finetuned-squadv2.
Check out our blog post on how we optimized this model (link).
Our final optimized model is 579 MB, runs inference in 18.184 ms on a Tesla T4, and reaches a best F1 of 82.68% — roughly 7.7× faster and 2.2× smaller than the original fine-tuned model. Here is the comparison against each base model:
| Model | Size | Latency on Tesla T4 | Best F1 |
|---|---|---|---|
| madlag/bert-large-uncased-whole-word-masking-finetuned-squadv2 | 1275 MB | 140.529 ms | 86.08% |
| madlag/bert-large-uncased-wwm-squadv2-x2.63-f82.6-d16-hybrid-v1 | 1085 MB | 90.801 ms | 82.67% |
| Our optimized model | 579 MB | 18.184 ms | 82.68% |
You can try out inference with these models in the tryolabs/transformers-optimization Space.
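For a quick back-of-the-envelope view of the gains, the ratios implied by the table above can be computed directly (the numbers below are copied from the table; the variable names are our own):

```python
# Figures from the comparison table above
base = {"size_mb": 1275, "latency_ms": 140.529, "f1": 86.08}
optimized = {"size_mb": 579, "latency_ms": 18.184, "f1": 82.68}

speedup = base["latency_ms"] / optimized["latency_ms"]   # latency improvement
size_reduction = base["size_mb"] / optimized["size_mb"]  # on-disk size improvement
f1_drop = base["f1"] - optimized["f1"]                   # accuracy cost, in F1 points

print(f"{speedup:.1f}x faster, {size_reduction:.1f}x smaller, -{f1_drop:.2f} F1")
# → 7.7x faster, 2.2x smaller, -3.40 F1
```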
## Usage example

```python
import torch
from huggingface_hub import hf_hub_download
from onnxruntime import InferenceSession
from transformers import AutoTokenizer

MAX_SEQUENCE_LENGTH = 512

# Download the ONNX model
model = hf_hub_download(
    repo_id="tryolabs/bert-large-uncased-wwm-squadv2-optimized-f16", filename="model.onnx"
)

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained("tryolabs/bert-large-uncased-wwm-squadv2-optimized-f16")

question = "Who worked a little bit harder?"
context = "The first little pig was very lazy. He didn't want to work at all and he built his house out of straw. The second little pig worked a little bit harder but he was somewhat lazy too and he built his house out of sticks. Then, they sang and danced and played together the rest of the day."

# Tokenize the input, truncating to the model's maximum sequence length
inputs = dict(
    tokenizer(
        question, context, return_tensors="np",
        max_length=MAX_SEQUENCE_LENGTH, truncation=True
    )
)

# Create the inference session
sess = InferenceSession(model, providers=["CPUExecutionProvider"])

# Run predictions
output = sess.run(None, input_feed=inputs)
answer_start_scores, answer_end_scores = torch.tensor(output[0]), torch.tensor(output[1])

# Post-process predictions: pick the most likely start and end token positions
input_ids = inputs["input_ids"].tolist()[0]
answer_start = torch.argmax(answer_start_scores)
answer_end = torch.argmax(answer_end_scores) + 1
answer = tokenizer.convert_tokens_to_string(
    tokenizer.convert_ids_to_tokens(input_ids[answer_start:answer_end])
)

# Output the prediction
print("Answer:", answer)
```
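To reproduce a latency figure like the one reported above, you can wrap `sess.run` in a simple timing harness. Below is a minimal, generic sketch; the `benchmark` helper and its warmup/run counts are our own illustration, not part of the model card:

```python
import statistics
import time

def benchmark(fn, warmup=10, runs=100):
    """Time a callable: do `warmup` untimed calls to stabilize caches,
    then return the median latency in milliseconds over `runs` timed calls."""
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)

# Usage with the ONNX session and inputs from the example above:
# latency_ms = benchmark(lambda: sess.run(None, input_feed=inputs))
# print(f"median latency: {latency_ms:.3f} ms")
```

The median is reported rather than the mean so that occasional scheduling hiccups do not skew the result.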
Author: tryolabs · Task: question-answering · Created: 2022-11-11 · Updated: 2022-12-01