Usage
ONNXRuntime
```python
from transformers import AutoConfig, AutoTokenizer
import onnxruntime
import numpy as np

# 1. Load config, tokenizer, and model
path_to_model = "./gemma-3-1b-it-ONNX"
config = AutoConfig.from_pretrained(path_to_model)
tokenizer = AutoTokenizer.from_pretrained(path_to_model)
decoder_session = onnxruntime.InferenceSession(f"{path_to_model}/onnx/model.onnx")

## Set config values
num_key_value_heads = config.num_key_value_heads
head_dim = config.head_dim
num_hidden_layers = config.num_hidden_layers
eos_token_id = 106  # 106 corresponds to <end_of_turn>

# 2. Prepare inputs
## Create input messages
messages = [
  { "role": "system", "content": "You are a helpful assistant." },
  { "role": "user", "content": "Write me a poem about Machine Learning." },
]

## Apply tokenizer
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=True, return_dict=True, return_tensors="np")

## Prepare decoder inputs
batch_size = inputs['input_ids'].shape[0]
past_key_values = {
    f'past_key_values.{layer}.{kv}': np.zeros([batch_size, num_key_value_heads, 0, head_dim], dtype=np.float32)
    for layer in range(num_hidden_layers)
    for kv in ('key', 'value')
}
input_ids = inputs['input_ids']
position_ids = np.tile(np.arange(1, input_ids.shape[-1] + 1), (batch_size, 1))

# 3. Generation loop
max_new_tokens = 1024
generated_tokens = np.array([[]], dtype=np.int64)
for i in range(max_new_tokens):
    logits, *present_key_values = decoder_session.run(None, dict(
        input_ids=input_ids,
        position_ids=position_ids,
        **past_key_values,
    ))

    ## Update values for the next generation step
    input_ids = logits[:, -1].argmax(-1, keepdims=True)
    position_ids = position_ids[:, -1:] + 1
    for j, key in enumerate(past_key_values):
        past_key_values[key] = present_key_values[j]

    generated_tokens = np.concatenate([generated_tokens, input_ids], axis=-1)
    if (input_ids == eos_token_id).all():
        break

    ## (Optional) Streaming output
    print(tokenizer.decode(input_ids[0]), end='', flush=True)
print()

# 4. Output the result
print(tokenizer.batch_decode(generated_tokens))
```
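The update step at the heart of the loop is plain greedy decoding: take the logits for the last sequence position and pick the argmax as the next token, then advance `position_ids` by one. A minimal NumPy sketch of just that step, using made-up logits (real shapes are `[batch, seq_len, vocab_size]`):

```python
import numpy as np

# Stand-in logits: batch of 1, sequence length 3, vocabulary of 5 tokens.
logits = np.array([[[0.1, 0.2, 0.3, 0.2, 0.2],
                    [0.5, 0.1, 0.1, 0.2, 0.1],
                    [0.0, 0.0, 0.9, 0.05, 0.05]]])

# Greedy selection: only the last position's logits determine the next token.
next_token = logits[:, -1].argmax(-1, keepdims=True)
print(next_token)  # [[2]] -- token id 2 has the highest logit at the last position

# The next position id is one past the previous last position.
position_ids = np.array([[1, 2, 3]])
next_position = position_ids[:, -1:] + 1
print(next_position)  # [[4]]
```

Because only the last position is read, feeding a single new token (plus the KV cache) on each subsequent step produces the same result as re-running the whole prompt.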
<details> <summary>See example output</summary>
Okay, here's a poem about Machine Learning, aiming for a balance of technical and evocative language:
**The Silent Learner**
The data streams, a boundless flow,
A river vast, where patterns grow.
No human hand to guide the way,
Just algorithms, come what may.
Machine Learning, a subtle art,
To teach a system, a brand new start.
With weights and biases, finely tuned,
It seeks the truth, beneath the moon.
It learns from errors, big and small,
Adjusting swiftly, standing tall.
From pixels bright to voices clear,
It builds a model, banishing fear.
Of blind prediction, cold and stark,
It finds the meaning, leaves its mark.
A network deep, a complex grace,
Discovering insights, time and space.
It sees the trends, the subtle hue,
Predicting futures, fresh and new.
A silent learner, ever keen,
A digital mind, unseen, serene.
So let the code begin to gleam,
A blossoming of a learning dream.
Machine Learning, a wondrous sight,
Shaping the future, shining bright.
---
Would you like me to:
* Adjust the tone or style? (e.g., more technical, more metaphorical)
* Focus on a specific aspect of ML (e.g., neural networks, data analysis?)
* Create a different length or format?
</details>
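Each `past_key_values.{layer}.{key,value}` input above starts with sequence length 0 and is replaced every iteration by the `present_key_values` outputs the session returns, so the cache grows by one position per generated token. A toy NumPy sketch of that bookkeeping (shapes only, no real model; `new_entry` stands in for the model's returned cache extension):

```python
import numpy as np

batch, n_kv_heads, head_dim = 1, 1, 4

# Empty cache, as in step 2 above: the sequence axis (axis 2) starts at length 0.
cache = np.zeros([batch, n_kv_heads, 0, head_dim], dtype=np.float32)

# Simulate three decode steps: each step extends the cache by one position.
for step in range(3):
    new_entry = np.ones((batch, n_kv_heads, 1, head_dim), dtype=np.float32)
    cache = np.concatenate([cache, new_entry], axis=2)

print(cache.shape)  # (1, 1, 3, 4)
```

This is why the loop only ever feeds one new token per step: the attention context for all earlier tokens already lives in the cache.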
Transformers.js
```js
import { pipeline } from "@huggingface/transformers";

// Create a text generation pipeline
const generator = await pipeline(
  "text-generation",
  "onnx-community/gemma-3-1b-it-ONNX",
  { dtype: "q4" },
);

// Define the list of messages
const messages = [
  { role: "system", content: "You are a helpful assistant." },
  { role: "user", content: "Write me a poem about Machine Learning." },
];

// Generate a response
const output = await generator(messages, { max_new_tokens: 512, do_sample: false });
console.log(output[0].generated_text.at(-1).content);
```
smartvest-llc/onnx-models
Author: smartvest-llc
Tags: text-generation, transformers.js
Created: 2025-09-12 20:23:51+00:00
Updated: 2025-09-14 03:19:45+00:00