SmolLM2

Model Summary
SmolLM2 is a family of compact language models available in three sizes: 135M, 360M, and 1.7B parameters. They are capable of solving a wide range of tasks while being lightweight enough to run on-device. More details in our paper: https://arxiv.org/abs/2502.02737v1

The 1.7B variant demonstrates significant advances over its predecessor SmolLM1-1.7B, particularly in instruction following, knowledge, reasoning, and mathematics. It was trained on 11 trillion tokens using a diverse combination of datasets: FineWeb-Edu, DCLM, The Stack, along with newly curated mathematics and coding datasets (to be released soon). We developed the instruct version through supervised fine-tuning (SFT) using a combination of public datasets and our own curated datasets. We then applied Direct Preference Optimization (DPO) using UltraFeedback.

The instruct model additionally supports tasks such as text rewriting, summarization, and function calling, thanks to datasets developed by Argilla such as Synth-APIGen-v0.1. You can find the SFT dataset here: https://huggingface.co/datasets/HuggingFaceTB/smoltalk.

For more details, refer to https://github.com/huggingface/smollm. You will find the code for pre-training, post-training, evaluation, and local inference.
How to use
Transformers
```bash
pip install transformers
```
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "HuggingFaceTB/SmolLM2-1.7B-Instruct"
device = "cuda"  # "cuda" for GPU usage or "cpu" for CPU usage

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
# for multiple GPUs, install accelerate and do `model = AutoModelForCausalLM.from_pretrained(checkpoint, device_map="auto")`
model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)

messages = [{"role": "user", "content": "What is the capital of France?"}]
input_text = tokenizer.apply_chat_template(messages, tokenize=False)
inputs = tokenizer.encode(input_text, return_tensors="pt").to(device)
outputs = model.generate(inputs, max_new_tokens=50, temperature=0.2, top_p=0.9, do_sample=True)
print(tokenizer.decode(outputs[0]))
```
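The `temperature=0.2, top_p=0.9` arguments control how the next token is sampled. As a rough illustration (a toy sketch, not the transformers implementation), here is what temperature scaling followed by nucleus (top-p) filtering does to a small next-token distribution:

```python
import math

def sample_filter(logits, temperature=0.2, top_p=0.9):
    """Return the renormalized probabilities of the tokens that survive
    temperature scaling and nucleus (top-p) filtering."""
    # Temperature scaling: lower temperature sharpens the distribution.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Nucleus filtering: keep the smallest set of tokens whose
    # cumulative probability reaches top_p, then renormalize.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}

# Toy logits for a 4-token vocabulary: with a low temperature,
# nearly all the probability mass lands on the top token.
print(sample_filter([2.0, 1.0, 0.5, 0.1]))  # {0: 1.0}
```

Lower temperature makes generation more deterministic; `top_p` caps how far into the tail sampling may reach.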
Chat in TRL
You can also use the TRL CLI to chat with the model from the terminal:
```bash
pip install trl
trl chat --model_name_or_path HuggingFaceTB/SmolLM2-1.7B-Instruct --device cpu
```
Transformers.js
```bash
npm i @huggingface/transformers
```
```js
import { pipeline } from "@huggingface/transformers";

// Create a text generation pipeline
const generator = await pipeline(
  "text-generation",
  "HuggingFaceTB/SmolLM2-1.7B-Instruct",
);

// Define the list of messages
const messages = [
  { role: "system", content: "You are a helpful assistant." },
  { role: "user", content: "Tell me a joke." },
];

// Generate a response
const output = await generator(messages, { max_new_tokens: 128 });
console.log(output[0].generated_text.at(-1).content);
// "Why don't scientists trust atoms?\n\nBecause they make up everything!"
```
Evaluation
In this section, we report the evaluation results for SmolLM2. All evaluations are zero-shot unless stated otherwise, and we use lighteval to run them.
Base pre-trained model
| Metric | SmolLM2-1.7B | Llama-1B | Qwen2.5-1.5B | SmolLM1-1.7B |
|---|---|---|---|---|
| HellaSwag | 68.7 | 61.2 | 66.4 | 62.9 |
| ARC (avg) | 60.5 | 49.2 | 58.5 | 59.9 |
| PIQA | 77.6 | 74.8 | 76.1 | 76.0 |
| MMLU-Pro (MCF) | 19.4 | 11.7 | 13.7 | 10.8 |
| CommonsenseQA | 43.6 | 41.2 | 34.1 | 38.0 |
| TriviaQA | 36.7 | 28.1 | 20.9 | 22.5 |
| Winogrande | 59.4 | 57.8 | 59.3 | 54.7 |
| OpenBookQA | 42.2 | 38.4 | 40.0 | 42.4 |
| GSM8K (5-shot) | 31.0 | 7.2 | 61.3 | 5.5 |
Instruction model
| Metric | SmolLM2-1.7B-Instruct | Llama-1B-Instruct | Qwen2.5-1.5B-Instruct | SmolLM1-1.7B-Instruct |
|---|---|---|---|---|
| IFEval (avg prompt/inst) | 56.7 | 53.5 | 47.4 | 23.1 |
| MT-Bench | 6.13 | 5.48 | 6.52 | 4.33 |
| OpenRewrite-Eval (micro_avg RougeL) | 44.9 | 39.2 | 46.9 | NaN |
| HellaSwag | 66.1 | 56.1 | 60.9 | 55.5 |
| ARC (avg) | 51.7 | 41.6 | 46.2 | 43.7 |
| PIQA | 74.4 | 72.3 | 73.2 | 71.6 |
| MMLU-Pro (MCF) | 19.3 | 12.7 | 24.2 | 11.7 |
| BBH (3-shot) | 32.2 | 27.6 | 35.3 | 25.7 |
| GSM8K (5-shot) | 48.2 | 26.8 | 42.8 | 4.62 |
Examples
Below are some examples of system prompts and instruction prompts for special tasks.
Text rewriting
```python
system_prompt_rewrite = "You are an AI writing assistant. Your task is to rewrite the user's email to make it more professional and approachable while maintaining its main points and key message. Do not return any text other than the rewritten message."
user_prompt_rewrite = "Rewrite the message below to make it more friendly and approachable while maintaining its main points and key message. Do not add any new information or return any text other than the rewritten message\nThe message:"

messages = [{"role": "system", "content": system_prompt_rewrite}, {"role": "user", "content": f"{user_prompt_rewrite} The CI is failing after your last commit!"}]
input_text = tokenizer.apply_chat_template(messages, tokenize=False)
inputs = tokenizer.encode(input_text, return_tensors="pt").to(device)
outputs = model.generate(inputs, max_new_tokens=50, temperature=0.2, top_p=0.9, do_sample=True)
print(tokenizer.decode(outputs[0]))
```

Hey there! I noticed that the CI isn't passing after your latest commit. Could you take a look and let me know what's going on? Thanks so much for your help!
Summarization

```python
system_prompt_summarize = "Provide a concise, objective summary of the input text in up to three sentences, focusing on key actions and intentions without using second or third person pronouns."

messages = [{"role": "system", "content": system_prompt_summarize}, {"role": "user", "content": INSERT_LONG_EMAIL}]
input_text = tokenizer.apply_chat_template(messages, tokenize=False)
inputs = tokenizer.encode(input_text, return_tensors="pt").to(device)
outputs = model.generate(inputs, max_new_tokens=50, temperature=0.2, top_p=0.9, do_sample=True)
print(tokenizer.decode(outputs[0]))
```
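Note that `tokenizer.decode(outputs[0])` prints the full sequence, prompt included. To show only the newly generated text, you can slice off the prompt tokens first, as the function-calling example later in this card does with `outputs[0][len(inputs[0]):]`. A toy illustration, with plain lists standing in for the real token-id tensors:

```python
# Hypothetical token ids: the model's output always starts with the prompt.
prompt_ids = [101, 2054, 2003]           # stand-in for inputs[0]
output_ids = prompt_ids + [7592, 999]    # stand-in for outputs[0]

# Keep only the tokens generated after the prompt.
new_ids = output_ids[len(prompt_ids):]
print(new_ids)  # [7592, 999]
```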
Function calling
SmolLM2-1.7B-Instruct can handle function calling: it scores 27% on the BFCL Leaderboard. Here's how you can leverage it:
```python
import json
import re
from typing import Any, Optional

from jinja2 import Template
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.utils import get_json_schema

system_prompt = Template("""You are an expert in composing functions. You are given a question and a set of possible functions.
Based on the question, you will need to make one or more function/tool calls to achieve the purpose.
If none of the functions can be used, point it out and refuse to answer.
If the given question lacks the parameters required by the function, also point it out.

You have access to the following tools:
<tools>{{ tools }}</tools>

The output MUST strictly adhere to the following format, and NO other text MUST be included.
The example format is as follows. Please make sure the parameter type is correct. If no function call is needed, please make the tool calls an empty list '[]'.
<tool_call>[
{"name": "func_name1", "arguments": {"argument1": "value1", "argument2": "value2"}},
... (more tool calls as required)
]</tool_call>""")


def prepare_messages(
    query: str,
    tools: Optional[dict[str, Any]] = None,
    history: Optional[list[dict[str, str]]] = None
) -> list[dict[str, str]]:
    """Prepare the system and user messages for the given query and tools.

    Args:
        query: The query to be answered.
        tools: The tools available to the model. Defaults to None, in which
            case an empty list is passed to the model.
        history: Previous exchange of messages, including the system_prompt
            from the first query. Defaults to None, i.e. the first message
            in a conversation.
    """
    if tools is None:
        tools = []
    if history:
        messages = history.copy()
        messages.append({"role": "user", "content": query})
    else:
        messages = [
            {"role": "system", "content": system_prompt.render(tools=json.dumps(tools))},
            {"role": "user", "content": query}
        ]
    return messages


def parse_response(text: str) -> str | dict[str, Any]:
    """Parse a response from the model, returning either the list of
    parsed tool calls, or the raw model response if none could be parsed.

    Args:
        text: Response from the model.
    """
    pattern = r"<tool_call>(.*?)</tool_call>"
    matches = re.findall(pattern, text, re.DOTALL)
    if matches:
        return json.loads(matches[0])
    return text
```
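As a quick, self-contained check of the format the system prompt requests (restating the extraction logic inline so it runs independently of the helpers above), parsing a sample `<tool_call>` block looks like this:

```python
import json
import re

# A sample model response in the format the system prompt requests.
sample = """<tool_call>[
{"name": "get_random_number_between", "arguments": {"min": 1, "max": 300}}
]</tool_call>"""

# Extract the JSON payload between the <tool_call> tags and parse it.
match = re.search(r"<tool_call>(.*?)</tool_call>", sample, re.DOTALL)
tool_calls = json.loads(match.group(1))
print(tool_calls[0]["name"])       # get_random_number_between
print(tool_calls[0]["arguments"])  # {'min': 1, 'max': 300}
```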
```python
model_name_smollm = "HuggingFaceTB/SmolLM2-1.7B-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_name_smollm, device_map="auto", torch_dtype="auto", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_name_smollm)

from datetime import datetime
import random


def get_current_time() -> str:
    """Returns the current time in 24-hour format.

    Returns:
        str: Current time in HH:MM:SS format.
    """
    return datetime.now().strftime("%H:%M:%S")


def get_random_number_between(min: int, max: int) -> int:
    """
    Gets a random number between min and max.

    Args:
        min: The minimum number.
        max: The maximum number.

    Returns:
        A random number between min and max.
    """
    return random.randint(min, max)


tools = [get_json_schema(get_random_number_between), get_json_schema(get_current_time)]
toolbox = {"get_random_number_between": get_random_number_between, "get_current_time": get_current_time}

query = "Give me a number between 1 and 300"
messages = prepare_messages(query, tools=tools)

inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
outputs = model.generate(inputs, max_new_tokens=512, do_sample=False, num_return_sequences=1, eos_token_id=tokenizer.eos_token_id)
result = tokenizer.decode(outputs[0][len(inputs[0]):], skip_special_tokens=True)

tool_calls = parse_response(result)
# [{'name': 'get_random_number_between', 'arguments': {'min': 1, 'max': 300}}]

# Get tool responses
tool_responses = [toolbox.get(tc["name"])(*tc["arguments"].values()) for tc in tool_calls]
# [63]

# For the second turn, rebuild the history of messages:
history = messages.copy()
# Add the "parsed response"
history.append({"role": "assistant", "content": result})
query = "Can you give me the hour?"
history.append({"role": "user", "content": query})

inputs = tokenizer.apply_chat_template(history, add_generation_prompt=True, return_tensors="pt").to(model.device)
outputs = model.generate(inputs, max_new_tokens=512, do_sample=False, num_return_sequences=1, eos_token_id=tokenizer.eos_token_id)
result = tokenizer.decode(outputs[0][len(inputs[0]):], skip_special_tokens=True)

tool_calls = parse_response(result)
tool_responses = [toolbox.get(tc["name"])(*tc["arguments"].values()) for tc in tool_calls]
# ['07:57:25']
```
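The dispatch step above is just a name-to-callable lookup in the `toolbox` dict. A minimal, self-contained sketch of the same pattern (with stand-in tool definitions so it runs without the model):

```python
import random

# Stand-in tools mirroring the ones defined above.
def get_random_number_between(min: int, max: int) -> int:
    return random.randint(min, max)

def get_current_time() -> str:
    from datetime import datetime
    return datetime.now().strftime("%H:%M:%S")

toolbox = {"get_random_number_between": get_random_number_between, "get_current_time": get_current_time}

# Parsed tool calls, in the shape parse_response returns them.
tool_calls = [{"name": "get_random_number_between", "arguments": {"min": 1, "max": 300}}]

# Dispatch: look up each function by name and apply its arguments.
tool_responses = [toolbox[tc["name"]](**tc["arguments"]) for tc in tool_calls]
print(tool_responses)  # e.g. [217] -- a random int between 1 and 300
```

Expanding the arguments as keywords (`**tc["arguments"]`) rather than positionally (`*tc["arguments"].values()`, as above) makes the call robust to the ordering of keys in the model's JSON output.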
For more details, such as parallel function calling and handling of unavailable tools, see here.
Limitations
SmolLM2 models primarily understand and generate content in English. They can produce text on a variety of topics, but the generated content may not always be factually accurate, logically consistent, or free from biases present in the training data. These models should be used as assistive tools rather than definitive sources of information. Users should always verify important information and critically evaluate any generated content.
Training
Model
- Architecture: Transformer decoder
- Pretraining tokens: 11T
- Precision: bfloat16
Hardware
- GPUs: 256 H100
Software
- Training framework: nanotron
- Alignment Handbook: alignment-handbook
License
Apache 2.0
Citation
```bibtex
@misc{allal2025smollm2smolgoesbig,
      title={SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model},
      author={Loubna Ben Allal and Anton Lozhkov and Elie Bakouch and Gabriel Martín Blázquez and Guilherme Penedo and Lewis Tunstall and Andrés Marafioti and Hynek Kydlíček and Agustín Piqueres Lajarín and Vaibhav Srivastav and Joshua Lochner and Caleb Fahlgren and Xuan-Son Nguyen and Clémentine Fourrier and Ben Burtenshaw and Hugo Larcher and Haojun Zhao and Cyril Zakka and Mathieu Morlon and Colin Raffel and Leandro von Werra and Thomas Wolf},
      year={2025},
      eprint={2502.02737},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2502.02737},
}
```
HuggingFaceTB/SmolLM2-1.7B-Instruct
Author: HuggingFaceTB
Created: 2024-10-31 13:42:06+00:00
Updated: 2025-04-21 20:51:14+00:00