README

Llama-2-13b-chat DirectML ONNX Model

This repository hosts an optimized version of meta-llama/Llama-2-13b-chat-hf, built to accelerate inference with ONNX Runtime for DirectML.

Usage on Windows (Intel / AMD / NVIDIA / Qualcomm)

conda create -n onnx python=3.10
conda activate onnx
winget install -e --id GitHub.GitLFS
pip install huggingface-hub[cli]
huggingface-cli download EmbeddedLLM/llama-2-13b-chat-int4-onnx-directml --local-dir .\llama-2-13b-chat
pip install numpy==1.26.4
Invoke-WebRequest -Uri "https://raw.githubusercontent.com/microsoft/onnxruntime-genai/main/examples/python/phi3-qa.py" -OutFile "phi3-qa.py"
pip install onnxruntime-directml
pip install --pre onnxruntime-genai-directml
conda install conda-forge::vs2015_runtime
python phi3-qa.py -m .\llama-2-13b-chat

What is DirectML

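DirectML is a high-performance, hardware-accelerated DirectX 12 library for machine learning. It provides GPU acceleration for common machine learning tasks across a broad range of supported hardware and drivers, including all DirectX 12-capable GPUs from vendors such as AMD, Intel, NVIDIA, and Qualcomm.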
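
To check whether the installed ONNX Runtime build actually exposes DirectML, something like the following sketch may help. It relies on `onnxruntime.get_available_providers()` and on the provider name `DmlExecutionProvider`; the helper names `available_providers` and `pick_provider` are hypothetical and not part of this repository.

```python
# Name ONNX Runtime uses for its DirectML backend.
DML_PROVIDER = "DmlExecutionProvider"


def available_providers():
    """Return ONNX Runtime's execution providers, or [] if it is not installed."""
    try:
        import onnxruntime as ort
    except ImportError:
        return []
    return ort.get_available_providers()


def pick_provider(providers):
    """Prefer DirectML when it is listed; otherwise fall back to the CPU provider."""
    if DML_PROVIDER in providers:
        return DML_PROVIDER
    return "CPUExecutionProvider"


if __name__ == "__main__":
    print("Selected provider:", pick_provider(available_providers()))
```

If the script reports the CPU provider on a machine with a DirectX 12 GPU, the likely cause is that a non-DirectML onnxruntime package is installed in the active environment.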

EmbeddedLLM/llama-2-13b-chat-int4-onnx-directml

Author: EmbeddedLLM

text-generation transformers

Created: 2024-06-16 16:09:53+00:00

Updated: 2024-06-17 15:33:47+00:00

Files (9)

.gitattributes

README.md

config.json

genai_config.json

model.onnx

model.onnx.data

special_tokens_map.json

tokenizer.json

tokenizer_config.json
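
Before running the example script, it can be useful to confirm that the download finished. The sketch below checks a local directory against the files listed above. It assumes that `.gitattributes` and `README.md` are not needed at runtime and so leaves them out; the helper name `missing_model_files` and the default directory name are illustrative only.

```python
from pathlib import Path

# Files from the repository listing that the runtime is assumed to need.
EXPECTED_FILES = [
    "config.json",
    "genai_config.json",
    "model.onnx",
    "model.onnx.data",
    "special_tokens_map.json",
    "tokenizer.json",
    "tokenizer_config.json",
]


def missing_model_files(model_dir):
    """Return the expected file names that are not present in model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    # Directory name taken from the --local-dir used in the download command.
    missing = missing_model_files("llama-2-13b-chat")
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All expected model files are present.")
```

This only checks that the files exist, not that their contents are complete, so a truncated `model.onnx.data` would still pass.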