README

Model Card

Model Details

meta-llama/Meta-Llama-3.1-8B-Instruct quantized to ONNX GenAI INT4 with Microsoft DirectML optimization. The output is reformatted so that each sentence starts on a new line, which improves readability.
<pre>
...
vNewDecoded = tokenizer_stream.decode(new_token)
if re.fullmatch("^[\x2E\x3A\x3B]$", vPreviousDecoded) and vNewDecoded.startswith(" ") and (not vNewDecoded.startswith(" *")) :
    print("\n" + vNewDecoded.replace(" ", "", 1), end='', flush=True)
else :
    print(vNewDecoded, end='', flush=True)
vPreviousDecoded = vNewDecoded
...
</pre>
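The line-breaking rule in the snippet above can be tried without loading the model. The following is a minimal sketch that applies the same condition to a list of already decoded text pieces. The function name `reflow` and the sample pieces are illustrative assumptions and are not part of onnxgenairun.py.

```python
import re

# Matches a piece that is exactly one of ".", ":" or ";"
# (equivalent to the "^[\x2E\x3A\x3B]$" pattern in the snippet above).
_SENTENCE_END = re.compile(r"[.:;]")


def reflow(pieces):
    """Join decoded pieces, starting a new line after '.', ':' or ';'."""
    out = []
    previous = ""
    for piece in pieces:
        starts_new_sentence = (
            _SENTENCE_END.fullmatch(previous) is not None
            and piece.startswith(" ")
            # The original excludes " *"; presumably to keep Markdown
            # emphasis or bullets on the same line (an assumption).
            and not piece.startswith(" *")
        )
        if starts_new_sentence:
            # Replace the single leading space with a line break.
            out.append("\n" + piece.replace(" ", "", 1))
        else:
            out.append(piece)
        previous = piece
    return "".join(out)


if __name__ == "__main__":
    # Hypothetical pieces; a real tokenizer may split the text differently.
    demo = ["Hello", ".", " How", " are", " you", "?", " Fine", ":", " thanks", "."]
    print(reflow(demo))
```

Because the check uses a full match on the previous piece, the break only happens when the punctuation mark arrives as its own piece. A piece such as "..." or ".\n" does not trigger a new line.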

Model Description

meta-llama/Meta-Llama-3.1-8B-Instruct quantized to ONNX GenAI INT4 with Microsoft DirectML optimization: https://onnxruntime.ai/docs/genai/howto/install.html#directml

Created using ONNX Runtime GenAI's builder.py: https://raw.githubusercontent.com/microsoft/onnxruntime-genai/main/src/python/py/models/builder.py

Build options: INT4 accuracy level: FP32 (float32)

Developed by: Mochamad Aris Zamroni

Model Sources [optional]

https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct

Direct Use

This model is optimized for Microsoft Windows DirectML. It might not work with ONNX execution providers other than DmlExecutionProvider. The required Python scripts are included in this repository.
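Before running the model, it can help to confirm that the DirectML provider is visible to ONNX Runtime. The following is a minimal sketch, assuming the `onnxruntime` Python package (for example the DirectML build) is importable in the active environment. The helper names `has_directml` and `available_providers` are illustrative and are not part of this repository.

```python
def has_directml(providers):
    """Return True if the DirectML execution provider is in the list."""
    return "DmlExecutionProvider" in providers


def available_providers():
    """List the execution providers ONNX Runtime reports, or [] if it is not installed."""
    try:
        import onnxruntime as ort  # assumption: installed in this environment
    except ImportError:
        return []
    return ort.get_available_providers()


if __name__ == "__main__":
    providers = available_providers()
    print("Available providers:", providers)
    if has_directml(providers):
        print("DmlExecutionProvider is available.")
    else:
        print("DmlExecutionProvider was not found; this model might not run.")
```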

Prerequisites:

1. Install Python 3.11 from the Windows Store: https://apps.microsoft.com/search/publisher?name=Python+Software+Foundation
2. Open a command line (cmd.exe).
3. Create a Python virtual environment, activate it, then install onnxruntime-genai-directml:
   mkdir c:\temp
   cd c:\temp
   python -m venv dmlgenai
   dmlgenai\Scripts\activate.bat
   pip install onnxruntime-genai-directml
4. Use onnxgenairun.py to get a chat interface. It is a modified version of "https://github.com/microsoft/onnxruntime-genai/blob/main/examples/python/phi3-qa.py". The modification starts a new line after ".", ":" and ";" so that the output is easier to read.
   rem Change to the directory where the model and script files are stored
   cd this_onnx_model_directory
   python onnxgenairun.py --help
   python onnxgenairun.py -m . -v -g
5. (Optional but recommended) Device-specific optimization.
   a. Open "dml-device-specific-optim.py" in a text editor and change the file paths accordingly.
   b. Run the Python script: python dml-device-specific-optim.py
   c. Rename the original model.onnx to another file name, then rename the optimized ONNX file from step 5.b to model.onnx.
   d. Rerun step 4.
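The rename in the device-specific optimization step can also be scripted. The following is a minimal sketch using only the Python standard library. The function name `swap_in_optimized`, the backup name `model.original.onnx`, and the optimized file name in the usage comment are hypothetical; use whatever output path you configured in dml-device-specific-optim.py.

```python
from pathlib import Path


def swap_in_optimized(model_dir, optimized_name, backup_name="model.original.onnx"):
    """Back up model.onnx and put the optimized file in its place.

    Returns the path of the backup so the change can be undone by hand.
    """
    model_dir = Path(model_dir)
    original = model_dir / "model.onnx"
    optimized = model_dir / optimized_name
    backup = model_dir / backup_name

    if not original.is_file():
        raise FileNotFoundError(original)
    if not optimized.is_file():
        raise FileNotFoundError(optimized)
    if backup.exists():
        # Refuse to overwrite an earlier backup.
        raise FileExistsError(backup)

    original.rename(backup)      # keep the original model under a new name
    optimized.rename(original)   # the optimized file becomes model.onnx
    return backup


# Example (hypothetical file name):
# swap_in_optimized(".", "model.optimized.onnx")
```

All checks run before any file is moved, so a missing file or an existing backup leaves the directory unchanged.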

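Speeds, Sizes, Times [optional]

15 tokens/s on a Radeon 780M with 8GB of pre-allocated RAM. With the device-specific optimized model.onnx, the speed increases to 16 tokens/s. For comparison, LM Studio runs a GGUF INT4 model with VulkanML GPU acceleration at 13 tokens/s.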
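To put the reported throughput figures in relative terms, the following short sketch computes the percentage differences. It uses only the three numbers stated above; the function name `speedup_percent` is illustrative.

```python
def speedup_percent(baseline_tps, new_tps):
    """Relative throughput gain of new_tps over baseline_tps, in percent."""
    return (new_tps / baseline_tps - 1.0) * 100.0


if __name__ == "__main__":
    directml = 15.0             # tokens/s, model.onnx as shipped
    directml_optimized = 16.0   # tokens/s, device-specific optimized model.onnx
    lm_studio = 13.0            # tokens/s, LM Studio with GGUF INT4

    print(f"Optimized vs. shipped model: {speedup_percent(directml, directml_optimized):.1f}%")   # 6.7%
    print(f"Shipped model vs. LM Studio: {speedup_percent(lm_studio, directml):.1f}%")            # 15.4%
    print(f"Optimized vs. LM Studio:     {speedup_percent(lm_studio, directml_optimized):.1f}%")  # 23.1%
```

These are single reported measurements on one machine, so the percentages are indicative rather than a benchmark.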

Hardware

AMD Ryzen Zen4 7840U with integrated Radeon 780M GPU, 32GB RAM

Software

Microsoft DirectML on Windows 10

Model Card Authors [optional]

Mochamad Aris Zamroni

Model Card Contact

https://www.linkedin.com/in/zamroni/

zamroni111/Meta-Llama-3.1-8B-Instruct-ONNX-DirectML-GenAI-INT4

Author: zamroni111

text-generation


Created: 2024-09-11 11:52:06+00:00

Updated: 2025-04-01 11:44:25+00:00


Files (10)

.gitattributes

README.md

dml-device-specific-optim.py

genai_config.json

model.onnx

model.onnx.data

onnxgenairun.py

special_tokens_map.json

tokenizer.json

tokenizer_config.json