README

Model Card

Model Details

deepseek-ai/DeepSeek-R1-Distill-Llama-8B quantized to ONNX GenAI INT4 and optimized for Microsoft DirectML.

The output is reformatted so that each sentence starts on a new line, which improves readability:

<pre>
...
vNewDecoded = tokenizer_stream.decode(new_token)
if re.fullmatch("^[\x2E\x3A\x3B]$", vPreviousDecoded) and vNewDecoded.startswith(" ") and (not vNewDecoded.startswith(" *")):
    print("\n" + vNewDecoded.replace(" ", "", 1), end='', flush=True)
else:
    print(vNewDecoded, end='', flush=True)
vPreviousDecoded = vNewDecoded
...
</pre>

The output begins with the model's chain-of-thought (COTS) reasoning.

In tokenizer_config.json, the value of "unk_token" was changed from null to "".
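The newline rule in the snippet above can be tried without a model. The following is a minimal sketch, not the code shipped in onnxgenairun.py: the function name `reformat_stream` and the idea of passing a plain list of already-decoded token strings are assumptions made for illustration. It applies the same three conditions (previous token is exactly ".", ":" or ";", the new token starts with a space, and the new token does not start with " *").

```python
import re

# Matches a decoded token that is exactly ".", ":" or ";"
# (the same characters as \x2E, \x3A and \x3B in the card's snippet).
SENTENCE_END = re.compile(r"[.:;]")


def reformat_stream(decoded_tokens):
    """Join decoded token strings, starting a new line after '.', ':' or ';'.

    Hypothetical helper: the real script prints each token as it is generated.
    """
    pieces = []
    previous = ""
    for current in decoded_tokens:
        breaks_line = (
            SENTENCE_END.fullmatch(previous) is not None
            and current.startswith(" ")
            and not current.startswith(" *")
        )
        if breaks_line:
            # Drop the leading space and begin a new line instead.
            pieces.append("\n" + current.replace(" ", "", 1))
        else:
            pieces.append(current)
        previous = current
    return "".join(pieces)


if __name__ == "__main__":
    sample = ["Hello", " world", ".", " Next", " step", ":", " *", "bold"]
    print(reformat_stream(sample))
```

The " *" exception appears to keep Markdown-style emphasis or bullet markers on the same line as the preceding punctuation; that reading is an inference from the condition, not something the card states.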

Model Description

deepseek-ai/DeepSeek-R1-Distill-Llama-8B quantized to ONNX GenAI INT4 with Microsoft DirectML optimization: https://onnxruntime.ai/docs/genai/howto/install.html#directml

Created with builder.py from ONNX Runtime GenAI: https://raw.githubusercontent.com/microsoft/onnxruntime-genai/main/src/python/py/models/builder.py

Build options: INT4 accuracy level: FP32 (float32)
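The card does not record the exact command used for the build. As a hedged sketch, the function below only assembles an argument list for the ONNX Runtime GenAI model builder; it does not run it. The helper name `build_command`, the output and cache directory names, and the mapping of "FP32 accuracy level" to `int4_accuracy_level=1` are assumptions and should be checked against the builder's own `--help` output.

```python
import sys


def build_command(model_id, output_dir, cache_dir):
    """Return an argv list for the ONNX Runtime GenAI model builder.

    Hypothetical helper: it reconstructs a plausible invocation and is not
    the author's recorded command.
    """
    return [
        sys.executable, "-m", "onnxruntime_genai.models.builder",
        "-m", model_id,       # Hugging Face model name
        "-o", output_dir,     # where the ONNX model and configs are written
        "-p", "int4",         # target precision
        "-e", "dml",          # DirectML execution provider
        "-c", cache_dir,      # download cache for the source weights
        # Assumption: accuracy level 1 corresponds to FP32 activations.
        "--extra_options", "int4_accuracy_level=1",
    ]


if __name__ == "__main__":
    cmd = build_command(
        "deepseek-ai/DeepSeek-R1-Distill-Llama-8B",
        "out-dml-int4",   # placeholder directory name
        "hf-cache",       # placeholder directory name
    )
    print(" ".join(cmd))
```

Passing the resulting list to `subprocess.run` would start the conversion, which downloads the full source model and needs the onnxruntime-genai package installed.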

Developed by: Mochamad Aris Zamroni

Model Sources [optional]

https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B

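Direct Use

This model is optimized for Microsoft Windows DirectML. It might not work with ONNX execution providers other than DmlExecutionProvider. The required Python scripts are included in this repository.

Prerequisites: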

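1. Install Python 3.11 from the Windows Store: https://apps.microsoft.com/search/publisher?name=Python+Software+Foundation
2. Open a command prompt (cmd.exe).
3. Create a Python virtual environment, activate it, then install onnxruntime-genai-directml:
<pre>
mkdir c:\temp
cd c:\temp
python -m venv dmlgenai
dmlgenai\Scripts\activate.bat
pip install onnxruntime-genai-directml
</pre>
4. Use onnxgenairun.py to get a chat interface. It is a modified version of "https://github.com/microsoft/onnxruntime-genai/blob/main/examples/python/phi3-qa.py". The modification starts a new line after ".", ":" and ";" so the output is easier to read.
<pre>
rem Change to the directory that contains the model and script files
cd this_onnx_model_directory
python onnxgenairun.py --help
python onnxgenairun.py -m . -v -g
</pre>
5. (Optional but recommended) Device-specific optimization.
   a. Open "dml-device-specific-optim.py" in a text editor and change the file paths accordingly.
   b. Run the Python script: python dml-device-specific-optim.py
   c. Rename the original model.onnx to another file name, then rename the optimized onnx file from step 5.b to model.onnx.
   d. Rerun step 4.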
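The rename in step 5.c can also be scripted. This is a minimal sketch under stated assumptions: the helper name `swap_in_optimized_model`, the backup name `model-original.onnx`, and the optimized file name passed in by the caller are all hypothetical, because the card does not say what dml-device-specific-optim.py names its output.

```python
from pathlib import Path


def swap_in_optimized_model(model_dir, optimized_name,
                            backup_name="model-original.onnx"):
    """Back up model.onnx, then rename the optimized file to model.onnx.

    Hypothetical helper; file names other than model.onnx are placeholders.
    Returns the path of the backup file.
    """
    model_dir = Path(model_dir)
    original = model_dir / "model.onnx"
    optimized = model_dir / optimized_name
    backup = model_dir / backup_name

    # Check everything first so a failure never leaves the directory half-renamed.
    if not original.is_file():
        raise FileNotFoundError(f"missing original model: {original}")
    if not optimized.is_file():
        raise FileNotFoundError(f"missing optimized model: {optimized}")
    if backup.exists():
        raise FileExistsError(f"backup already exists: {backup}")

    original.rename(backup)      # step 5.c, first half
    optimized.rename(original)   # step 5.c, second half
    return backup
```

Keeping the backup makes it easy to revert if the optimized graph misbehaves: rename the files back and rerun step 4.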

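Speeds, Sizes, Times [optional]

15 tokens/s on a Radeon 780M with 8GB of pre-allocated RAM. With the device-specific optimized model.onnx, the speed increases to 16 tokens/s. For comparison, LM Studio running a GGUF INT4 model with VulkanML GPU acceleration reaches 13 tokens/s.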
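To put the three figures on a common scale, the short calculation below turns them into relative gains. The function name `relative_gain` is illustrative; the inputs are the tokens/s numbers reported above.

```python
def relative_gain(new_rate, old_rate):
    """Percentage by which new_rate exceeds old_rate."""
    return (new_rate - old_rate) / old_rate * 100.0


if __name__ == "__main__":
    dml_base = 15.0        # DirectML, stock model.onnx
    dml_optimized = 16.0   # DirectML, device-specific optimized model.onnx
    lm_studio = 13.0       # LM Studio, GGUF INT4

    print(f"optimized vs. stock DirectML: {relative_gain(dml_optimized, dml_base):.1f}%")
    print(f"stock DirectML vs. LM Studio: {relative_gain(dml_base, lm_studio):.1f}%")
    print(f"optimized DirectML vs. LM Studio: {relative_gain(dml_optimized, lm_studio):.1f}%")
```

The device-specific optimization is worth roughly 7% on this hardware, while the stock DirectML build is about 15% faster than the LM Studio configuration.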

Hardware

AMD Ryzen Zen4 7840U with integrated Radeon 780M GPU, 32GB RAM

Software

Microsoft DirectML on Windows 10

Model Card Authors [optional]

Mochamad Aris Zamroni

Model Card Contact

https://www.linkedin.com/in/zamroni/

onnx-community/DeepSeek-R1-Distill-Llama-8B-ONNX-DirectML-GenAI-INT4

Author: onnx-community

text-generation

Created: 2025-02-03 07:24:53+00:00

Updated: 2025-04-01 11:45:05+00:00

Files (11)

.gitattributes

README.md

dml-device-specific-optim.py

genai_config.json

model.onnx

model.onnx.data

onnxgenairun.py

special_tokens_map.json

tokenizer.json

tokenizer_config-ORIG.json

tokenizer_config.json