README

GPT2

This repository contains GPT2 ONNX models compatible with TensorRT.

The models were quantized with the ENOT-AutoDL framework. Code and examples for building the TensorRT engine are published on GitHub.
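
As a rough illustration of what an engine build can look like, the sketch below assembles a `trtexec` command line for one of the ONNX files in this repository. It is an assumption, not the build script from the ENOT repository: the helper name `build_trtexec_command` and the engine file name are hypothetical, and only the standard `trtexec` options `--onnx`, `--saveEngine`, `--int8`, and `--fp16` are used.

```python
import shlex


def build_trtexec_command(onnx_path, engine_path, int8=False, fp16=False):
    """Return the argument list for a trtexec engine build (hypothetical helper)."""
    cmd = ["trtexec", f"--onnx={onnx_path}", f"--saveEngine={engine_path}"]
    if int8:
        # Allow INT8 kernels; needed for a quantized ONNX model.
        cmd.append("--int8")
    if fp16:
        # Allow FP16 kernels for the non-quantized model.
        cmd.append("--fp16")
    return cmd


# Engine file name is an assumption; pick any output path.
cmd = build_trtexec_command("gpt2-xl-i8.onnx", "gpt2-xl-i8.engine", int8=True)
print(shlex.join(cmd))
```

The printed command can be run in a shell where TensorRT is installed. If the model has dynamic input shapes, `trtexec` also needs shape options, which depend on input names not stated here; the ENOT repository remains the reference for the actual build.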

	TensorRT INT8+FP32	torch FP16
Lambada accuracy	72.11%	71.43%
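
Lambada measures how often a model predicts the final word of a passage correctly. A minimal sketch of that accuracy computation follows; the function name `last_word_accuracy` and the sample words are made up for illustration, and the evaluation script in the ENOT repository may tokenize or normalize text differently.

```python
def last_word_accuracy(predictions, targets):
    """Percentage of passages whose predicted last word equals the target word."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        raise ValueError("at least one example is required")
    # Compare after stripping surrounding whitespace; matching is case-sensitive.
    correct = sum(p.strip() == t.strip() for p, t in zip(predictions, targets))
    return 100.0 * correct / len(targets)


# Illustrative data only, not taken from the Lambada dataset.
predicted = ["door", "night", "river", "coffee"]
expected = ["door", "night", "ocean", "coffee"]
print(f"{last_word_accuracy(predicted, expected):.2f}%")
```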

Input sequence length	Generated tokens	TensorRT INT8+FP32, ms	torch FP16, ms	Speedup
64	64	462	1190	2.58
64	128	920	2360	2.54
64	256	1890	4710	2.54
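
The speedup column appears to be the torch FP16 latency divided by the TensorRT latency. The sketch below reproduces that arithmetic for the first row and also derives an average time per generated token; both helper names are hypothetical and the per-token figure is a derived illustration, not a value reported above.

```python
def speedup(baseline_ms, optimized_ms):
    """Ratio of baseline latency to optimized latency."""
    if optimized_ms <= 0:
        raise ValueError("optimized_ms must be positive")
    return baseline_ms / optimized_ms


def ms_per_token(total_ms, generated_tokens):
    """Average latency per generated token in milliseconds."""
    if generated_tokens <= 0:
        raise ValueError("generated_tokens must be positive")
    return total_ms / generated_tokens


# First row of the table: 64 input tokens, 64 generated tokens.
print(f"speedup: {speedup(1190, 462):.2f}")
print(f"TensorRT: {ms_per_token(462, 64):.1f} ms per generated token")
```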

Examples of inference and accuracy testing are published on GitHub:

git clone https://github.com/ENOT-AutoDL/ENOT-transformers

Author: ENOT-AutoDL

text-generation transformers

Created: 2023-06-05 13:46:50+00:00

Updated: 2023-06-08 13:42:08+00:00

.gitattributes

README.md

gpt2-xl-i8.data

gpt2-xl-i8.onnx ONNX

gpt2-xl.data

gpt2-xl.onnx ONNX