Model Card
Llama 3 8B Instruct model, one-shot compressed with SparseGPT, SmoothQuant, and GPTQ to 50% sparsity with INT8 weights and activations.
Made with SparseML + DeepSparse v1.7. Install with: pip install deepsparse~=1.7 "sparseml[transformers]"~=1.7 "numpy<2".
Below is the script used for compression with SparseML:
from datasets import load_dataset

from sparseml.transformers import (
    SparseAutoModelForCausalLM,
    SparseAutoTokenizer,
    compress,
)

# Load the base model and tokenizer
model = SparseAutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct", device_map="auto"
)
tokenizer = SparseAutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

# Calibration dataset for one-shot compression
dataset = load_dataset("garage-bAInd/Open-Platypus")

def format_data(data):
    # Wrap each instruction in the Llama 3 chat template, then append the answer
    instruction = tokenizer.apply_chat_template(
        [{"role": "user", "content": data["instruction"]}],
        tokenize=False,
        add_generation_prompt=True,
    )
    return {"text": instruction + data["output"]}

dataset = dataset.map(format_data)
recipe = """
compression_stage:
    run_type: oneshot
    oneshot_modifiers:
        QuantizationModifier:
            ignore:
                # These operations don't make sense to quantize
                - LlamaRotaryEmbedding
                - LlamaRMSNorm
                - SiLUActivation
                - QuantizableMatMul
                # Skip quantizing the layers with the most sensitive activations
                - model.layers.1.mlp.down_proj
                - model.layers.31.mlp.down_proj
                - model.layers.14.self_attn.q_proj
                - model.layers.14.self_attn.k_proj
                - model.layers.14.self_attn.v_proj
            post_oneshot_calibration: true
            scheme_overrides:
                # Enable channelwise quantization for better accuracy
                Linear:
                    weights:
                        num_bits: 8
                        symmetric: true
                        strategy: channel
                # For the embeddings, only weight-quantization makes sense
                Embedding:
                    input_activations: null
                    weights:
                        num_bits: 8
                        symmetric: false
        SparseGPTModifier:
            sparsity: 0.5
            quantize: True
            targets: ['re:model.layers.\\d*$']
"""
compress(
    model=model,
    tokenizer=tokenizer,
    dataset=dataset,
    recipe=recipe,
    output_dir="./one-shot-checkpoint",
)
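To illustrate what the recipe's scheme for Linear weights (num_bits: 8, symmetric: true, strategy: channel) means in practice, here is a minimal, self-contained sketch of symmetric channelwise INT8 quantization. This is an illustration only, not SparseML's implementation: each output channel (row) of a weight matrix gets its own scale derived from that row's maximum absolute value, so channels with small weights are not crushed by a single tensor-wide scale.

```python
def quantize_channelwise_symmetric(weights, num_bits=8):
    """Quantize each row of `weights` to signed integers with a per-row scale.

    Symmetric: zero-point is 0, and the scale maps the row's max |w| to qmax.
    Channelwise: every row (output channel) gets its own scale.
    """
    qmax = 2 ** (num_bits - 1) - 1  # 127 for INT8
    quantized, scales = [], []
    for row in weights:
        scale = max(abs(w) for w in row) / qmax or 1.0  # avoid 0 for all-zero rows
        scales.append(scale)
        quantized.append([round(w / scale) for w in row])
    return quantized, scales

weights = [
    [0.5, -1.27, 0.02],      # channel 0: scale = 1.27 / 127 = 0.01
    [0.001, 0.002, -0.004],  # channel 1: a much smaller per-channel scale
]
q, s = quantize_channelwise_symmetric(weights)
```

With a single tensor-wide scale, channel 1 here would quantize to values near zero; the per-channel scales keep its relative precision, which is the accuracy benefit the recipe comment refers to.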
mgoin/Meta-Llama-3-8B-Instruct-pruned50-quant-ds
Author: mgoin
Tags: text-generation, transformers
Downloads: 1
Likes: 0
Created: 2024-06-28 16:13:17+00:00
Updated: 2024-06-28 16:59:41+00:00
Files on Hugging Face (9):
.gitattributes
README.md
config.json
model-orig.onnx
model.data
model.onnx
special_tokens_map.json
tokenizer.json
tokenizer_config.json