README

gte-multilingual-reranker-base-onnx-op19-opt-gpu

This model is an ONNX version of Alibaba-NLP/gte-multilingual-reranker-base, exported with ONNX opset 19.

Model Details

Framework: ONNX Runtime
ONNX Opset: 19
Task: sentence-similarity
Target device: GPU
Optimized: Yes
Original model: Alibaba-NLP/gte-multilingual-reranker-base
Export date: 2025-03-31
Author: Modified by Jaro

Environment and Package Versions

Package	Version
transformers	4.48.3
optimum	1.24.0
onnx	1.17.0
onnxruntime	1.21.0
torch	2.5.1
numpy	1.26.4
huggingface_hub	0.28.1
python	3.12.9
system	Darwin 24.3.0
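The version table can double as a quick environment check. The following is a minimal sketch using only Python's standard library (importlib.metadata); the names EXPECTED_VERSIONS and find_version_mismatches are illustrative and not part of this repository. It compares version strings exactly, so a local build tag such as 2.5.1+cu121 counts as a mismatch, and because packages are looked up by distribution name, a GPU build installed as onnxruntime-gpu shows up as "onnxruntime not installed".

```python
from importlib import metadata

# Illustrative helper (not part of this repository): versions copied from the table above.
# The python and system rows are omitted because they are not pip packages.
EXPECTED_VERSIONS = {
    "transformers": "4.48.3",
    "optimum": "1.24.0",
    "onnx": "1.17.0",
    "onnxruntime": "1.21.0",
    "torch": "2.5.1",
    "numpy": "1.26.4",
    "huggingface_hub": "0.28.1",
}


def find_version_mismatches(expected, get_version=metadata.version):
    """Return {package: (expected, installed)} for every package whose installed
    version string differs from the expected one; installed is None if missing."""
    mismatches = {}
    for package, wanted in expected.items():
        try:
            installed = get_version(package)
        except metadata.PackageNotFoundError:
            installed = None
        # Exact string comparison: "2.5.1+cu121" != "2.5.1"
        if installed != wanted:
            mismatches[package] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    for package, (wanted, installed) in find_version_mismatches(EXPECTED_VERSIONS).items():
        print(f"{package}: expected {wanted}, found {installed or 'not installed'}")
```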

Applied Optimizations

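Optimization	Setting
Graph optimization level	Extended
Optimize for GPU	Yes
Use FP16	No
Enable Transformers-specific optimizations	Yes
Enable Gelu fusion	Yes
Enable Layer Norm fusion	Yes
Enable Attention fusion	Yes
Enable Skip Layer Norm fusion	Yes
Enable Gelu approximation	Yes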
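The "Gelu approximation" setting replaces the exact, erf-based GELU with the cheaper tanh-based approximation; in ONNX Runtime's transformer optimizer this typically corresponds to the FastGelu operator. The sketch below only compares the two formulas in plain Python to show how close they are. It does not call ONNX Runtime, and the function names are illustrative.

```python
import math


def gelu_exact(x: float) -> float:
    # Exact GELU: x * Phi(x), where Phi is the standard normal CDF written via erf
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_tanh(x: float) -> float:
    # Widely used tanh approximation of GELU
    c = math.sqrt(2.0 / math.pi)
    return 0.5 * x * (1.0 + math.tanh(c * (x + 0.044715 * x ** 3)))


def max_abs_error(lo: float = -6.0, hi: float = 6.0, steps: int = 1201) -> float:
    # Largest |exact - approximation| over an evenly spaced grid on [lo, hi]
    xs = (lo + (hi - lo) * i / (steps - 1) for i in range(steps))
    return max(abs(gelu_exact(x) - gelu_tanh(x)) for x in xs)


print(f"max |GELU - tanh approximation| on [-6, 6]: {max_abs_error():.2e}")
```

The small gap between the curves is why this approximation is usually acceptable for inference, though outputs will not be bit-identical to the original PyTorch model.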

Usage

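from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer

# Load the model and tokenizer from a local directory ("onnx") holding the exported files.
# The Hub repo id ConfidentialMind/gte-multilingual-reranker-base-onnx-op19-opt-gpu should also work.
# For GPU inference, install onnxruntime-gpu and pass provider="CUDAExecutionProvider".
model = ORTModelForSequenceClassification.from_pretrained("onnx")
tokenizer = AutoTokenizer.from_pretrained("onnx")

# Prepare input: a reranker scores (query, document) pairs
pairs = [
    ["what is the capital of China?", "Beijing"],
    ["what is the capital of China?", "Paris is the capital of France."],
]
inputs = tokenizer(pairs, padding=True, truncation=True, return_tensors="pt")

# Run inference: one logit per pair; a higher logit means a more relevant document
outputs = model(**inputs)
scores = outputs.logits.view(-1).float()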
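Rerankers are typically used to reorder candidate documents for a query. Below is a minimal sketch of that final step, assuming the logits have already been converted to a plain Python list (for example with outputs.logits.view(-1).tolist()). The helpers rank_documents and sigmoid are hypothetical, and mapping logits into (0, 1) with a sigmoid is an optional convention rather than something this model card specifies.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(x: float) -> float:
    # Numerically stable logistic function mapping a logit into (0, 1)
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def rank_documents(documents: Sequence[str], logits: Sequence[float]) -> List[Tuple[str, float]]:
    """Hypothetical helper: pair each document with sigmoid(logit), most relevant first."""
    if len(documents) != len(logits):
        raise ValueError("documents and logits must have the same length")
    scored = [(doc, sigmoid(logit)) for doc, logit in zip(documents, logits)]
    # sigmoid is monotonic, so this order matches sorting by the raw logits
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Made-up logits for illustration only
docs = ["Beijing", "Paris is the capital of France.", "The capital of China is Beijing."]
for doc, score in rank_documents(docs, [1.8, -4.2, 3.1]):
    print(f"{score:.3f}  {doc}")
```

Because the sigmoid preserves order, skipping it and sorting by raw logits gives the same ranking; the sigmoid only helps when a bounded score is convenient, for example for thresholding.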

Export Process

This model was exported to ONNX format with opset 19 using Hugging Face's Optimum library. Graph optimizations targeting GPU devices were applied during export.

Performance

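ONNX Runtime models typically run inference faster than native PyTorch models, especially in production deployments. Actual gains depend on the hardware, execution provider, batch size, and sequence length.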
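Since any speedup is hardware- and workload-dependent, it is worth measuring on the target setup. Below is a minimal, framework-agnostic timing helper using only the standard library; median_latency_ms is an illustrative name, and the warm-up and iteration defaults are arbitrary. For GPU code that executes asynchronously (for example PyTorch on CUDA), the timed callable should synchronize before returning, e.g. with torch.cuda.synchronize(), so the measurement covers the full computation.

```python
import statistics
import time
from typing import Callable


def median_latency_ms(fn: Callable[[], object], warmup: int = 3, iters: int = 20) -> float:
    """Illustrative helper: call fn() `warmup` times untimed, then `iters` timed times,
    and return the median latency in milliseconds."""
    if iters < 1:
        raise ValueError("iters must be at least 1")
    for _ in range(warmup):
        fn()  # warm-up calls absorb one-time costs such as lazy initialization
    timings = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        timings.append((time.perf_counter() - start) * 1000.0)
    # The median is less sensitive than the mean to occasional slow outliers
    return statistics.median(timings)
```

For example, median_latency_ms(lambda: model(**inputs)) can be run once with the ONNX model and once with the original PyTorch model on the same inputs to compare them directly.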

ConfidentialMind/gte-multilingual-reranker-base-onnx-op19-opt-gpu

Author: ConfidentialMind

sentence-similarity


Created: 2025-03-31 10:12:13+00:00

Updated: 2025-07-07 07:31:53+00:00


Files (9)

.gitattributes
README.md
config.json
model.onnx
optimization_report.json
special_tokens_map.json
tokenizer.json
tokenizer_config.json
upload_info.json