README

Task: question-answering
Backend: sagemaker-training
Backend args: {'instance_type': 'ml.m5.2xlarge', 'supported_instructions': 'avx512'}
Number of evaluation samples: entire dataset

Fixed parameters:

  • Dataset: [{'path': 'squad', 'eval_split': 'validation', 'data_keys': {'question': 'question', 'context': 'context'}, 'ref_keys': ['answers'], 'name': None, 'calibration_split': None}]
  • Model name or path: distilbert-base-uncased-distilled-squad
  • From transformers: True
  • Quantization approach: dynamic
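
As a rough illustration of what the quantization options benchmarked in this card control, the sketch below quantizes a small weight matrix with symmetric integer scaling. It is a simplified pure-Python model, not ONNX Runtime's implementation: the function names, the toy weights, and the choice to treat each row as a channel are assumptions made for the example. In dynamic quantization the weights are quantized ahead of time while the activation ranges are computed at inference time; only the weight side is shown here. In this sketch, per_channel gives each row its own scale, and reduce_range shrinks the integer range from 8-bit to roughly 7-bit values.

```python
def quantize_weights(rows, per_channel=False, reduce_range=False):
    """Symmetric integer quantization of a 2-D weight matrix (list of rows).

    Illustrative only: each row is treated as a channel, zero points are ignored.
    """
    qmax = 63 if reduce_range else 127  # roughly 7-bit vs 8-bit signed range
    if per_channel:
        # One scale per row; fall back to 1.0 for an all-zero row.
        scales = [max(abs(w) for w in row) / qmax or 1.0 for row in rows]
    else:
        # One scale shared by the whole matrix.
        shared = max(abs(w) for row in rows for w in row) / qmax or 1.0
        scales = [shared] * len(rows)
    quantized = [[round(w / s) for w in row] for row, s in zip(rows, scales)]
    return quantized, scales


def row_errors(rows, **options):
    """Largest absolute round-trip error in each row after quantizing."""
    quantized, scales = quantize_weights(rows, **options)
    return [
        max(abs(w - q * s) for w, q in zip(row, q_row))
        for row, q_row, s in zip(rows, quantized, scales)
    ]


# Toy weights (hypothetical): row 0 is much smaller in magnitude than row 1.
weights = [[0.02, -0.01, 0.015], [1.5, -2.0, 0.75]]

for per_channel in (False, True):
    for reduce_range in (False, True):
        errors = row_errors(weights, per_channel=per_channel, reduce_range=reduce_range)
        print(f"per_channel={per_channel} reduce_range={reduce_range} errors={errors}")
```

With a shared scale, the small-magnitude row is dominated by the large one and loses most of its precision; a per-row scale avoids that in this toy case. This says nothing by itself about end-to-end accuracy, which is what the evaluation in this card measures.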

Benchmarked parameters:

  • Framework: onnxruntime, pytorch
  • Operators to quantize: ['Add', 'MatMul'], ['Add']
  • Node exclusion: [], ['layernorm', 'gelu', 'residual', 'gather', 'softmax']
  • Per channel: False, True
  • Framework args: {'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4}, {}
  • Reduce range: True, False
  • Apply quantization: True, False
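
The result tables in this card contain 18 rows, which is consistent with a full grid over the four quantization options for onnxruntime plus two unquantized baselines. The sketch below enumerates that grid with the values listed above. The dictionary keys and variable names are illustrative assumptions, not the benchmark tool's actual configuration schema.

```python
from itertools import product

# Values copied from the benchmarked parameters listed above.
operators_to_quantize = [["Add", "MatMul"], ["Add"]]
node_exclusion = [["layernorm", "gelu", "residual", "gather", "softmax"], []]
per_channel = [False, True]
reduce_range = [False, True]

# Every combination of the four quantization options, applied to onnxruntime.
quantized_runs = [
    {
        "framework": "onnxruntime",
        "operators_to_quantize": ops,
        "node_exclusion": excluded,
        "per_channel": pc,
        "reduce_range": rr,
        "apply_quantization": True,
    }
    for ops, excluded, pc, rr in product(
        operators_to_quantize, node_exclusion, per_channel, reduce_range
    )
]

# Two unquantized baselines: plain ONNX Runtime and plain PyTorch.
baselines = [
    {"framework": "onnxruntime", "apply_quantization": False},
    {"framework": "pytorch", "apply_quantization": None},
]

all_runs = quantized_runs + baselines
print(len(quantized_runs), len(all_runs))  # 16 18
```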

Evaluation

Non-time metrics

| Framework | Operators to quantize | Node exclusion | Per channel | Framework args | Reduce range | Apply quantization | Exact match | F1 |
|---|---|---|---|---|---|---|---|---|
| onnxruntime | None | None | None | {'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4} | None | False | 78.884 | 86.690 |
| onnxruntime | ['Add', 'MatMul'] | ['layernorm', 'gelu', 'residual', 'gather', 'softmax'] | False | {'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4} | False | True | 76.764 | 85.053 |
| onnxruntime | ['Add', 'MatMul'] | ['layernorm', 'gelu', 'residual', 'gather', 'softmax'] | False | {'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4} | True | True | 69.622 | 79.914 |
| onnxruntime | ['Add', 'MatMul'] | ['layernorm', 'gelu', 'residual', 'gather', 'softmax'] | True | {'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4} | False | True | 0.435 | 5.887 |
| onnxruntime | ['Add', 'MatMul'] | ['layernorm', 'gelu', 'residual', 'gather', 'softmax'] | True | {'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4} | True | True | 78.165 | 85.973 |
| onnxruntime | ['Add', 'MatMul'] | [] | False | {'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4} | False | True | 76.764 | 85.053 |
| onnxruntime | ['Add', 'MatMul'] | [] | False | {'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4} | True | True | 69.622 | 79.914 |
| onnxruntime | ['Add', 'MatMul'] | [] | True | {'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4} | False | True | 0.435 | 5.887 |
| onnxruntime | ['Add', 'MatMul'] | [] | True | {'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4} | True | True | 78.165 | 85.973 |
| onnxruntime | ['Add'] | ['layernorm', 'gelu', 'residual', 'gather', 'softmax'] | False | {'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4} | False | True | 78.884 | 86.690 |
| onnxruntime | ['Add'] | ['layernorm', 'gelu', 'residual', 'gather', 'softmax'] | False | {'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4} | True | True | 78.884 | 86.690 |
| onnxruntime | ['Add'] | ['layernorm', 'gelu', 'residual', 'gather', 'softmax'] | True | {'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4} | False | True | 78.884 | 86.690 |
| onnxruntime | ['Add'] | ['layernorm', 'gelu', 'residual', 'gather', 'softmax'] | True | {'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4} | True | True | 78.884 | 86.690 |
| onnxruntime | ['Add'] | [] | False | {'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4} | False | True | 78.884 | 86.690 |
| onnxruntime | ['Add'] | [] | False | {'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4} | True | True | 78.884 | 86.690 |
| onnxruntime | ['Add'] | [] | True | {'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4} | False | True | 78.884 | 86.690 |
| onnxruntime | ['Add'] | [] | True | {'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4} | True | True | 78.884 | 86.690 |
| pytorch | None | None | None | {} | None | None | 78.884 | 86.690 |
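
One way to read the accuracy table is to compute each configuration's F1 drop against the unquantized baseline and keep only those within a tolerance. The sketch below does this for the ['Add', 'MatMul'] rows, using the F1 values reported above (the node exclusion setting does not change them in this table). The 2-point tolerance and all variable names are hypothetical choices for the example, not something the card prescribes.

```python
BASELINE_F1 = 86.690  # unquantized onnxruntime row

# (per_channel, reduce_range) -> F1, for operators to quantize = ['Add', 'MatMul'].
f1_by_config = {
    (False, False): 85.053,
    (False, True): 79.914,
    (True, False): 5.887,
    (True, True): 85.973,
}

MAX_F1_DROP = 2.0  # hypothetical tolerance in F1 points

acceptable = []
for (per_channel, reduce_range), f1 in f1_by_config.items():
    drop = BASELINE_F1 - f1
    status = "ok" if drop <= MAX_F1_DROP else "too lossy"
    print(f"per_channel={per_channel} reduce_range={reduce_range}: "
          f"F1 drop {drop:.3f} ({status})")
    if drop <= MAX_F1_DROP:
        acceptable.append((per_channel, reduce_range))
```

Under this tolerance, two of the four ['Add', 'MatMul'] settings remain candidates; the ['Add']-only rows match the baseline scores exactly in the table.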

Time metrics

Time benchmarks were run for 15 seconds per configuration.

Below are the time metrics for batch size = 1, input length = 32.

| Framework | Operators to quantize | Node exclusion | Per channel | Framework args | Reduce range | Apply quantization | Mean latency (ms) | Throughput (/s) |
|---|---|---|---|---|---|---|---|---|
| onnxruntime | None | None | None | {'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4} | None | False | 14.26 | 70.13 |
| onnxruntime | ['Add', 'MatMul'] | ['layernorm', 'gelu', 'residual', 'gather', 'softmax'] | False | {'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4} | False | True | 10.08 | 99.20 |
| onnxruntime | ['Add', 'MatMul'] | ['layernorm', 'gelu', 'residual', 'gather', 'softmax'] | False | {'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4} | True | True | 10.60 | 94.33 |
| onnxruntime | ['Add', 'MatMul'] | ['layernorm', 'gelu', 'residual', 'gather', 'softmax'] | True | {'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4} | False | True | 10.88 | 91.93 |
| onnxruntime | ['Add', 'MatMul'] | ['layernorm', 'gelu', 'residual', 'gather', 'softmax'] | True | {'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4} | True | True | 10.84 | 92.27 |
| onnxruntime | ['Add', 'MatMul'] | [] | False | {'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4} | False | True | 10.34 | 96.73 |
| onnxruntime | ['Add', 'MatMul'] | [] | False | {'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4} | True | True | 10.41 | 96.07 |
| onnxruntime | ['Add', 'MatMul'] | [] | True | {'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4} | False | True | 10.96 | 91.27 |
| onnxruntime | ['Add', 'MatMul'] | [] | True | {'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4} | True | True | 10.69 | 93.53 |
| onnxruntime | ['Add'] | ['layernorm', 'gelu', 'residual', 'gather', 'softmax'] | False | {'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4} | False | True | 14.43 | 69.33 |
| onnxruntime | ['Add'] | ['layernorm', 'gelu', 'residual', 'gather', 'softmax'] | False | {'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4} | True | True | 14.52 | 68.87 |
| onnxruntime | ['Add'] | ['layernorm', 'gelu', 'residual', 'gather', 'softmax'] | True | {'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4} | False | True | 14.35 | 69.73 |
| onnxruntime | ['Add'] | ['layernorm', 'gelu', 'residual', 'gather', 'softmax'] | True | {'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4} | True | True | 14.50 | 69.00 |
| onnxruntime | ['Add'] | [] | False | {'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4} | False | True | 14.20 | 70.47 |
| onnxruntime | ['Add'] | [] | False | {'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4} | True | True | 14.24 | 70.27 |
| onnxruntime | ['Add'] | [] | True | {'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4} | False | True | 14.58 | 68.67 |
| onnxruntime | ['Add'] | [] | True | {'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4} | True | True | 14.73 | 67.87 |
| pytorch | None | None | None | {} | None | None | 31.49 | 31.80 |
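
The card does not state how throughput is computed, but at batch size 1 the reported values sit close to 1000 divided by the mean latency in milliseconds. The sketch below checks that relationship on three rows of the table above; the row labels and the 1% tolerance are assumptions made for the example.

```python
# (mean latency in ms, throughput per second) copied from the table above.
rows = {
    "onnxruntime, unquantized": (14.26, 70.13),
    "onnxruntime, first quantized ['Add', 'MatMul'] row": (10.08, 99.20),
    "pytorch": (31.49, 31.80),
}

relative_gaps = {}
for name, (latency_ms, reported_throughput) in rows.items():
    # One sample per call, so calls per second is roughly 1000 ms / latency.
    implied_throughput = 1000.0 / latency_ms
    gap = abs(implied_throughput - reported_throughput) / reported_throughput
    relative_gaps[name] = gap
    print(f"{name}: implied {implied_throughput:.2f}/s, "
          f"reported {reported_throughput:.2f}/s, gap {gap:.2%}")
```

The small remaining gaps are plausibly due to rounding and to counting completed runs over a fixed time window, though that is a guess rather than something the card confirms.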

Below are the time metrics for batch size = 1, input length = 64.

| Framework | Operators to quantize | Node exclusion | Per channel | Framework args | Reduce range | Apply quantization | Mean latency (ms) | Throughput (/s) |
|---|---|---|---|---|---|---|---|---|
| onnxruntime | None | None | None | {'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4} | None | False | 24.83 | 40.33 |
| onnxruntime | ['Add', 'MatMul'] | ['layernorm', 'gelu', 'residual', 'gather', 'softmax'] | False | {'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4} | False | True | 18.49 | 54.13 |
| onnxruntime | ['Add', 'MatMul'] | ['layernorm', 'gelu', 'residual', 'gather', 'softmax'] | False | {'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4} | True | True | 18.87 | 53.00 |
| onnxruntime | ['Add', 'MatMul'] | ['layernorm', 'gelu', 'residual', 'gather', 'softmax'] | True | {'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4} | False | True | 19.17 | 52.20 |
| onnxruntime | ['Add', 'MatMul'] | ['layernorm', 'gelu', 'residual', 'gather', 'softmax'] | True | {'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4} | True | True | 18.92 | 52.87 |
| onnxruntime | ['Add', 'MatMul'] | [] | False | {'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4} | False | True | 19.13 | 52.33 |
| onnxruntime | ['Add', 'MatMul'] | [] | False | {'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4} | True | True | 18.95 | 52.80 |
| onnxruntime | ['Add', 'MatMul'] | [] | True | {'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4} | False | True | 19.08 | 52.47 |
| onnxruntime | ['Add', 'MatMul'] | [] | True | {'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4} | True | True | 19.14 | 52.27 |
| onnxruntime | ['Add'] | ['layernorm', 'gelu', 'residual', 'gather', 'softmax'] | False | {'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4} | False | True | 24.83 | 40.33 |
| onnxruntime | ['Add'] | ['layernorm', 'gelu', 'residual', 'gather', 'softmax'] | False | {'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4} | True | True | 24.84 | 40.27 |
| onnxruntime | ['Add'] | ['layernorm', 'gelu', 'residual', 'gather', 'softmax'] | True | {'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4} | False | True | 24.66 | 40.60 |
| onnxruntime | ['Add'] | ['layernorm', 'gelu', 'residual', 'gather', 'softmax'] | True | {'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4} | True | True | 24.76 | 40.40 |
| onnxruntime | ['Add'] | [] | False | {'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4} | False | True | 25.07 | 39.93 |
| onnxruntime | ['Add'] | [] | False | {'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4} | True | True | 25.27 | 39.60 |
| onnxruntime | ['Add'] | [] | True | {'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4} | False | True | 24.76 | 40.40 |
| onnxruntime | ['Add'] | [] | True | {'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4} | True | True | 24.70 | 40.53 |
| pytorch | None | None | None | {} | None | None | 41.26 | 24.27 |

Below are the time metrics for batch size = 1, input length = 128.

| Framework | Operators to quantize | Node exclusion | Per channel | Framework args | Reduce range | Apply quantization | Mean latency (ms) | Throughput (/s) |
|---|---|---|---|---|---|---|---|---|
| onnxruntime | None | None | None | {'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4} | None | False | 46.89 | 21.33 |
| onnxruntime | ['Add', 'MatMul'] | ['layernorm', 'gelu', 'residual', 'gather', 'softmax'] | False | {'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4} | False | True | 34.84 | 28.73 |
| onnxruntime | ['Add', 'MatMul'] | ['layernorm', 'gelu', 'residual', 'gather', 'softmax'] | False | {'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4} | True | True | 35.88 | 27.93 |
| onnxruntime | ['Add', 'MatMul'] | ['layernorm', 'gelu', 'residual', 'gather', 'softmax'] | True | {'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4} | False | True | 36.92 | 27.13 |
| onnxruntime | ['Add', 'MatMul'] | ['layernorm', 'gelu', 'residual', 'gather', 'softmax'] | True | {'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4} | True | True | 36.25 | 27.60 |
| onnxruntime | ['Add', 'MatMul'] | [] | False | {'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4} | False | True | 36.17 | 27.67 |
| onnxruntime | ['Add', 'MatMul'] | [] | False | {'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4} | True | True | 35.59 | 28.13 |
| onnxruntime | ['Add', 'MatMul'] | [] | True | {'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4} | False | True | 37.36 | 26.80 |
| onnxruntime | ['Add', 'MatMul'] | [] | True | {'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4} | True | True | 35.97 | 27.87 |
| onnxruntime | ['Add'] | ['layernorm', 'gelu', 'residual', 'gather', 'softmax'] | False | {'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4} | False | True | 46.94 | 21.33 |
| onnxruntime | ['Add'] | ['layernorm', 'gelu', 'residual', 'gather', 'softmax'] | False | {'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4} | True | True | 47.19 | 21.20 |
| onnxruntime | ['Add'] | ['layernorm', 'gelu', 'residual', 'gather', 'softmax'] | True | {'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4} | False | True | 47.05 | 21.27 |
| onnxruntime | ['Add'] | ['layernorm', 'gelu', 'residual', 'gather', 'softmax'] | True | {'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4} | True | True | 46.79 | 21.40 |
| onnxruntime | ['Add'] | [] | False | {'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4} | False | True | 46.87 | 21.40 |
| onnxruntime | ['Add'] | [] | False | {'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4} | True | True | 47.04 | 21.27 |
| onnxruntime | ['Add'] | [] | True | {'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4} | False | True | 47.08 | 21.27 |
| onnxruntime | ['Add'] | [] | True | {'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4} | True | True | 47.05 | 21.27 |
| pytorch | None | None | None | {} | None | None | 54.61 | 18.33 |
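
To compare the three timing tables side by side, the sketch below computes latency speedups of one quantized configuration against both unquantized baselines at each input length. The "quantized" entry is the first quantized row of each table (['Add', 'MatMul'], with the node exclusion list, per channel False, reduce range False); picking that row and the dictionary layout are choices made for this example.

```python
# Mean latency in ms at batch size 1, copied from the three timing tables.
latency_ms = {
    32: {"pytorch": 31.49, "onnxruntime": 14.26, "quantized": 10.08},
    64: {"pytorch": 41.26, "onnxruntime": 24.83, "quantized": 18.49},
    128: {"pytorch": 54.61, "onnxruntime": 46.89, "quantized": 34.84},
}


def speedup(reference_ms, candidate_ms):
    """How many times faster the candidate is than the reference."""
    return reference_ms / candidate_ms


speedups = {
    length: {
        "vs_onnxruntime": speedup(row["onnxruntime"], row["quantized"]),
        "vs_pytorch": speedup(row["pytorch"], row["quantized"]),
    }
    for length, row in latency_ms.items()
}

for length, s in speedups.items():
    print(f"input length {length}: {s['vs_onnxruntime']:.2f}x vs onnxruntime, "
          f"{s['vs_pytorch']:.2f}x vs pytorch")
```

For this row, the gain over unquantized onnxruntime stays in a narrow band across the three input lengths, while the gain over pytorch shrinks as the input gets longer.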

fxmarty/20220911-h13m58s53_squad_qa_distilbert_dynamic

By fxmarty

question-answering

Created: 2022-09-11 22:20:48+00:00

Updated: 2022-09-11 22:21:43+00:00

Files (89)

.gitattributes
20220911-h14m23s56_0/model.onnx ONNX
20220911-h14m23s56_0/ort_config.json
20220911-h14m23s56_0/quantized_model.onnx ONNX
20220911-h14m23s56_0/results.json
20220911-h14m53s50_1/model.onnx ONNX
20220911-h14m53s50_1/ort_config.json
20220911-h14m53s50_1/quantized_model.onnx ONNX
20220911-h14m53s50_1/results.json
20220911-h15m19s07_2/model.onnx ONNX
20220911-h15m19s07_2/ort_config.json
20220911-h15m19s07_2/quantized_model.onnx ONNX
20220911-h15m19s07_2/results.json
20220911-h15m49s21_3/model.onnx ONNX
20220911-h15m49s21_3/ort_config.json
20220911-h15m49s21_3/quantized_model.onnx ONNX
20220911-h15m49s21_3/results.json
20220911-h16m14s23_4/model.onnx ONNX
20220911-h16m14s23_4/ort_config.json
20220911-h16m14s23_4/quantized_model.onnx ONNX
20220911-h16m14s23_4/results.json
20220911-h16m39s55_5/model.onnx ONNX
20220911-h16m39s55_5/ort_config.json
20220911-h16m39s55_5/quantized_model.onnx ONNX
20220911-h16m39s55_5/results.json
20220911-h17m10s11_6/model.onnx ONNX
20220911-h17m10s11_6/ort_config.json
20220911-h17m10s11_6/quantized_model.onnx ONNX
20220911-h17m10s11_6/results.json
20220911-h17m40s23_7/model.onnx ONNX
20220911-h17m40s23_7/ort_config.json
20220911-h17m40s23_7/quantized_model.onnx ONNX
20220911-h17m40s23_7/results.json
20220911-h18m05s42_8/model.onnx ONNX
20220911-h18m05s42_8/ort_config.json
20220911-h18m05s42_8/quantized_model.onnx ONNX
20220911-h18m05s42_8/results.json
20220911-h18m35s59_9/model.onnx ONNX
20220911-h18m35s59_9/ort_config.json
20220911-h18m35s59_9/quantized_model.onnx ONNX
20220911-h18m35s59_9/results.json
20220911-h19m06s15_10/model.onnx ONNX
20220911-h19m06s15_10/ort_config.json
20220911-h19m06s15_10/quantized_model.onnx ONNX
20220911-h19m06s15_10/results.json
20220911-h19m36s15_11/model.onnx ONNX
20220911-h19m36s15_11/ort_config.json
20220911-h19m36s15_11/quantized_model.onnx ONNX
20220911-h19m36s15_11/results.json
20220911-h20m06s26_12/model.onnx ONNX
20220911-h20m06s26_12/ort_config.json
20220911-h20m06s26_12/quantized_model.onnx ONNX
20220911-h20m06s26_12/results.json
20220911-h20m32s12_13/model.onnx ONNX
20220911-h20m32s12_13/ort_config.json
20220911-h20m32s12_13/quantized_model.onnx ONNX
20220911-h20m32s12_13/results.json
20220911-h20m57s50_14/model.onnx ONNX
20220911-h20m57s50_14/ort_config.json
20220911-h20m57s50_14/quantized_model.onnx ONNX
20220911-h20m57s50_14/results.json
20220911-h21m23s01_15/model.onnx ONNX
20220911-h21m23s01_15/ort_config.json
20220911-h21m23s01_15/quantized_model.onnx ONNX
20220911-h21m23s01_15/results.json
20220911-h21m53s17_16/model.onnx ONNX
20220911-h21m53s17_16/results.json
20220911-h22m20s47_17/results.json
README.md
runs.json
tensorboard/1662934853.040312/events.out.tfevents.1662934853.ip-10-0-142-213.ec2.internal.1.1
tensorboard/1662934853.0417204/events.out.tfevents.1662934853.ip-10-0-142-213.ec2.internal.1.2
tensorboard/1662934853.042942/events.out.tfevents.1662934853.ip-10-0-142-213.ec2.internal.1.3
tensorboard/1662934853.0442307/events.out.tfevents.1662934853.ip-10-0-142-213.ec2.internal.1.4
tensorboard/1662934853.0453482/events.out.tfevents.1662934853.ip-10-0-142-213.ec2.internal.1.5
tensorboard/1662934853.04647/events.out.tfevents.1662934853.ip-10-0-142-213.ec2.internal.1.6
tensorboard/1662934853.0478818/events.out.tfevents.1662934853.ip-10-0-142-213.ec2.internal.1.7
tensorboard/1662934853.0491567/events.out.tfevents.1662934853.ip-10-0-142-213.ec2.internal.1.8
tensorboard/1662934853.0507572/events.out.tfevents.1662934853.ip-10-0-142-213.ec2.internal.1.9
tensorboard/1662934853.0518975/events.out.tfevents.1662934853.ip-10-0-142-213.ec2.internal.1.10
tensorboard/1662934853.0530026/events.out.tfevents.1662934853.ip-10-0-142-213.ec2.internal.1.11
tensorboard/1662934853.0541465/events.out.tfevents.1662934853.ip-10-0-142-213.ec2.internal.1.12
tensorboard/1662934853.0553083/events.out.tfevents.1662934853.ip-10-0-142-213.ec2.internal.1.13
tensorboard/1662934853.056435/events.out.tfevents.1662934853.ip-10-0-142-213.ec2.internal.1.14
tensorboard/1662934853.0575647/events.out.tfevents.1662934853.ip-10-0-142-213.ec2.internal.1.15
tensorboard/1662934853.058674/events.out.tfevents.1662934853.ip-10-0-142-213.ec2.internal.1.16
tensorboard/1662934853.0610743/events.out.tfevents.1662934853.ip-10-0-142-213.ec2.internal.1.17
tensorboard/1662934853.062237/events.out.tfevents.1662934853.ip-10-0-142-213.ec2.internal.1.18
tensorboard/events.out.tfevents.1662934853.ip-10-0-142-213.ec2.internal.1.0