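README

OpenReasoning-Nemotron-1.5B Overview

Description:

OpenReasoning-Nemotron-1.5B is a large language model (LLM) derived from Qwen2.5-1.5B-Instruct (also called the reference model). It is a reasoning model post-trained for math, code, and science solution generation. We evaluated this model with up to 64K output tokens. The OpenReasoning models are available in the following sizes: 1.5B, 7B, 14B, and 32B.

This model is ready for commercial/non-commercial research use.

License/Terms of Use:

Governing Terms: Use of the models listed above is governed by the Creative Commons Attribution 4.0 International License (CC-BY-4.0). Additional Information: Apache 2.0 License.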

Scores on Reasoning Benchmarks

Evaluation Results with pass@1

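Our models demonstrate exceptional performance across a suite of challenging reasoning benchmarks. The 7B, 14B, and 32B models consistently set new state-of-the-art records for their size classes.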

Model	ArtificialAnalysisIndex*	GPQA	MMLU-PRO	HLE	LiveCodeBench*	SciCode	AIME24	AIME25	HMMT FEB 25
1.5B	31.0	31.6	47.5	5.5	28.6	2.2	55.5	45.6	31.5
7B	54.7	61.1	71.9	8.3	63.3	16.2	84.7	78.2	63.5
14B	60.9	71.6	77.5	10.1	67.8	23.5	87.8	82.0	71.2
32B	64.3	73.1	80.0	11.9	70.2	28.5	89.2	84.0	73.8

* This is our estimate of the Artificial Analysis Intelligence Index, not an official score.

* LiveCodeBench version 6, date range 2408-2505.

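Combining the work of multiple agents

OpenReasoning-Nemotron models can be used in a "heavy" mode by starting multiple parallel generations and combining them via generative solution selection (GenSelect). To add this "skill" we follow the original GenSelect training pipeline, except that we do not train on the selection summary and instead use the full reasoning trace of DeepSeek R1 0528 671B. We only train the models to select the best solution for math problems, but surprisingly this capability generalizes directly to code and science questions! In this "heavy" GenSelect inference mode, the OpenReasoning-Nemotron-32B model surpasses O3 (High) on math and coding benchmarks.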
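
The "heavy" mode described above can be pictured as a two-stage pipeline: sample several candidates in parallel, then ask a selector to pick one. The sketch below is only an illustration under stated assumptions. `heavy_mode`, `generate`, and `select` are hypothetical names that do not come from this model card or from NeMo-Skills, and the toy stand-ins replace real model calls so that the example runs without a GPU.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def heavy_mode(
    problem: str,
    generate: Callable[[str], str],
    select: Callable[[str, List[str]], int],
    num_candidates: int = 8,
) -> str:
    """Generate candidates in parallel, then return the one the selector picks.

    `generate` and `select` are hypothetical callables. In practice both would
    wrap calls to the same model: once with a solving prompt, once with a
    selection prompt.
    """
    with ThreadPoolExecutor(max_workers=num_candidates) as pool:
        candidates = list(pool.map(generate, [problem] * num_candidates))
    chosen = select(problem, candidates)
    if not 0 <= chosen < len(candidates):
        # Fall back to the first candidate if the selector returns a bad index.
        chosen = 0
    return candidates[chosen]


# Toy stand-ins so the sketch runs without a model.
def toy_generate(problem: str) -> str:
    return f"candidate answer to: {problem}"


def toy_select(problem: str, candidates: List[str]) -> int:
    return len(candidates) - 1


print(heavy_mode("What is 2 + 2?", toy_generate, toy_select, num_candidates=4))
```

Passing the generator and selector as callables keeps the orchestration independent of the serving stack, so the same loop could sit on top of a transformers pipeline or an inference server.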

Evaluation Results with GenSelect

Model	Pass@1 (Avg@64)	Majority@64	GenSelect
1.5B
AIME24	55.5	76.7	76.7
AIME25	45.6	70.0	70.0
HMMT Feb 25	31.5	46.7	53.3
7B
AIME24	84.7	93.3	93.3
AIME25	78.2	86.7	93.3
HMMT Feb 25	63.5	83.3	90.0
LCB v6 2408-2505	63.4	n/a	67.7
14B
AIME24	87.8	93.3	93.3
AIME25	82.0	90.0	90.0
HMMT Feb 25	71.2	86.7	93.3
LCB v6 2408-2505	67.9	n/a	69.1
32B
AIME24	89.2	93.3	93.3
AIME25	84.0	90.0	93.3
HMMT Feb 25	73.8	86.7	96.7
LCB v6 2408-2505	70.2	n/a	75.3
HLE	11.8	13.4	15.5
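
For readers unfamiliar with the column names, the following sketch shows how Pass@1 (Avg@64) and Majority@64 are commonly computed for a single problem. It assumes the usual definitions (average correctness over independent samples, and correctness of the most frequent answer). The answer normalization and tie-breaking used for the numbers in the table are not described in this card, so the exact-string comparison here is an assumption.

```python
from collections import Counter
from typing import Sequence


def pass_at_1(correct: Sequence[bool]) -> float:
    """Average accuracy over independent samples of one problem, in percent."""
    return 100.0 * sum(correct) / len(correct)


def majority_at_k(answers: Sequence[str], reference: str) -> float:
    """Score 100 if the most frequent sampled answer matches the reference, else 0."""
    most_common_answer, _ = Counter(answers).most_common(1)[0]
    return 100.0 if most_common_answer == reference else 0.0


# Eight hypothetical sampled answers for one problem whose reference answer is "42".
sampled = ["42", "42", "41", "42", "7", "42", "41", "42"]
reference = "42"

print(pass_at_1([a == reference for a in sampled]))  # 62.5
print(majority_at_k(sampled, reference))             # 100.0
```

A benchmark score is then the mean of these per-problem values. Majority voting needs a single comparable answer string per sample, which may be why the LiveCodeBench rows list n/a in that column.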

How to use the models?

To run inference on coding problems:

import transformers
import torch
model_id = "nvidia/OpenReasoning-Nemotron-1.5B"
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

# Code generation prompt
prompt = """You are a helpful and harmless assistant. You should think step-by-step before responding to the instruction below.
Please use python programming language only.
You must use ```python for just the final solution code block with the following format:
```python
# Your code here
```
{user}
"""

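# Math generation prompt
# (the braces are doubled so that prompt.format(user=...) leaves a literal \boxed{})
# prompt = """Solve the following math problem. Make sure to put the answer (and only answer) inside \\boxed{{}}.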
# 
# {user}
# """

# Science generation prompt
# You can refer to the prompts here -
# https://github.com/NVIDIA/NeMo-Skills/blob/main/nemo_skills/prompt/config/generic/hle.yaml (HLE)
# https://github.com/NVIDIA/NeMo-Skills/blob/main/nemo_skills/prompt/config/eval/aai/mcq-4choices-boxed.yaml (for GPQA)
# https://github.com/NVIDIA/NeMo-Skills/blob/main/nemo_skills/prompt/config/eval/aai/mcq-10choices-boxed.yaml (MMLU-Pro)

messages = [
    {
        "role": "user",
        "content": prompt.format(user="Write a program to calculate the sum of the first $N$ fibonacci numbers")},
]
outputs = pipeline(
    messages,
    max_new_tokens=64000,
)
print(outputs[0]["generated_text"][-1]['content'])
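
The prompts above ask the model to put its final code in a python fenced block, or its final math answer inside \boxed{}. A minimal sketch of how such outputs could be parsed follows. `extract_final_code` and `extract_boxed` are hypothetical helper names, not part of transformers or NeMo-Skills, and the sketch assumes the model actually followed the requested format.

```python
import re
from typing import Optional

FENCE = "`" * 3  # three backticks, built programmatically to keep this example readable


def extract_final_code(response: str) -> Optional[str]:
    """Return the contents of the last python fenced block, or None if absent."""
    pattern = FENCE + r"python\s*\n(.*?)" + FENCE
    blocks = re.findall(pattern, response, flags=re.DOTALL)
    return blocks[-1].strip() if blocks else None


def extract_boxed(response: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...}, handling nested braces."""
    marker = "\\boxed{"
    start = response.rfind(marker)
    if start == -1:
        return None
    depth, chars = 1, []
    for ch in response[start + len(marker):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars)
        chars.append(ch)
    return None  # unbalanced braces


code_reply = "Let me think...\n" + FENCE + "python\nprint(1)\n" + FENCE + "\nDone."
print(extract_final_code(code_reply))                          # print(1)
print(extract_boxed("The answer is \\boxed{\\frac{1}{2}}."))  # \frac{1}{2}
```

Taking the last match in both helpers is a design choice: reasoning traces often contain intermediate code or boxed values before the final answer.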

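We have added a simple transformers-based script (genselect_hf.py) to this repo to illustrate GenSelect. To learn how to use the models in GenSelect mode with NeMo-Skills, see our documentation.

To use the model in GenSelect inference, we recommend following our reference implementation in NeMo-Skills. Alternatively, you can manually extract the summary from each solution and use this prompt for math problems. We will add the prompt and a reference implementation for coding problems soon!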
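
The manual route mentioned above (extract a summary from each solution, then build a selection prompt) might look roughly like the sketch below. Everything in it is an assumption made for illustration: the `</think>` delimiter, the prompt wording, and the `Judgment: <index>` convention are hypothetical and are not the reference prompt, which lives in NeMo-Skills.

```python
import re
from typing import List, Optional

THINK_END = "</think>"  # assumption: the reasoning trace ends with this tag


def extract_summary(response: str) -> str:
    """Keep only the text after the reasoning trace (assumed to end at THINK_END)."""
    _, found, tail = response.rpartition(THINK_END)
    return tail.strip() if found else response.strip()


def build_selection_prompt(problem: str, summaries: List[str]) -> str:
    """Build a hypothetical selection prompt listing the numbered candidate summaries."""
    numbered = "\n\n".join(
        f"Solution {i}:\n{summary}" for i, summary in enumerate(summaries)
    )
    return (
        "You are given a math problem and several candidate solutions.\n"
        "Pick the most likely correct one and finish with 'Judgment: <index>'.\n\n"
        f"Problem:\n{problem}\n\n{numbered}\n"
    )


def parse_judgment(selector_output: str, num_solutions: int) -> Optional[int]:
    """Return the last in-range index announced as 'Judgment: <index>', else None."""
    matches = re.findall(r"Judgment:\s*(\d+)", selector_output)
    if not matches:
        return None
    index = int(matches[-1])
    return index if index < num_solutions else None


responses = ["<think>2 + 2 is 4</think>\nThe answer is 4.", "<think>hmm</think>\nThe answer is 5."]
summaries = [extract_summary(r) for r in responses]
print(build_selection_prompt("What is 2 + 2?", summaries))
print(parse_judgment("Solution 0 checks out. Judgment: 0", len(summaries)))  # 0
```

Returning None for a missing or out-of-range index lets the caller decide on a fallback, for example majority voting over the candidates.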

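You can learn more about GenSelect in the papers listed in the Citation section below, in particular "GenSelect: A Generative Approach to Best-of-N".

Accessing training data

The training data has been released! The math and code data are available as part of Nemotron-Post-Training-Dataset-v1, and the science data is available in OpenScienceReasoning-2. See our documentation for more details.

Citation

If you find the data useful, please cite: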

@article{ahmad2025opencodereasoning,
      title={{OpenCodeReasoning: Advancing Data Distillation for Competitive Coding}}, 
      author={Wasi Uddin Ahmad and Sean Narenthiran and Somshubra Majumdar and Aleksander Ficek and Siddhartha Jain and Jocelyn Huang and Vahid Noroozi and Boris Ginsburg},
      year={2025},
      eprint={2504.01943},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2504.01943}, 
}

@misc{ahmad2025opencodereasoningiisimpletesttime,
      title={{OpenCodeReasoning-II: A Simple Test Time Scaling Approach via Self-Critique}}, 
      author={Wasi Uddin Ahmad and Somshubra Majumdar and Aleksander Ficek and Sean Narenthiran and Mehrzad Samadi and Jocelyn Huang and Siddhartha Jain and Vahid Noroozi and Boris Ginsburg},
      year={2025},
      eprint={2507.09075},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2507.09075}, 
}

@misc{moshkov2025aimo2winningsolutionbuilding,
      title={{AIMO-2 Winning Solution: Building State-of-the-Art Mathematical Reasoning Models with OpenMathReasoning dataset}}, 
      author={Ivan Moshkov and Darragh Hanley and Ivan Sorokin and Shubham Toshniwal and Christof Henkel and Benedikt Schifferer and Wei Du and Igor Gitman},
      year={2025},
      eprint={2504.16891},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2504.16891}, 
}

@inproceedings{toshniwal2025genselect,
      title={{GenSelect: A Generative Approach to Best-of-N}},
      author={Shubham Toshniwal and Ivan Sorokin and Aleksander Ficek and Ivan Moshkov and Igor Gitman},
      booktitle={2nd AI for Math Workshop @ ICML 2025},
      year={2025},
      url={https://openreview.net/forum?id=8LhnmNmUDb}
}

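Additional Information:

Deployment Geography:

Global

Use Case:

This model is intended for developers and researchers who work on competitive math, code, and science problems. It has been trained via only supervised fine-tuning to achieve strong scores on benchmarks.

Release Date:

Huggingface [07/16/2025] via https://huggingface.co/nvidia/OpenReasoning-Nemotron-1.5B/

References: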

[2504.01943] OpenCodeReasoning: Advancing Data Distillation for Competitive Coding
[2504.16891] AIMO-2 Winning Solution: Building State-of-the-Art Mathematical Reasoning Models with OpenMathReasoning dataset

Model Architecture:

Architecture Type: Dense decoder-only Transformer model
Network Architecture: Qwen2.5-1.5B-Instruct

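OpenReasoning-Nemotron-1.5B was developed based on Qwen2.5-1.5B-Instruct and has 1.5B model parameters.

OpenReasoning-Nemotron-7B was developed based on Qwen2.5-7B-Instruct and has 7B model parameters.

OpenReasoning-Nemotron-14B was developed based on Qwen2.5-14B-Instruct and has 14B model parameters.

OpenReasoning-Nemotron-32B was developed based on Qwen2.5-32B-Instruct and has 32B model parameters.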
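
As a rough sizing aid, the parameter counts above translate into weight memory when the models are loaded in bfloat16 (2 bytes per parameter), the dtype used in the inference example. The sketch below is back-of-the-envelope arithmetic only. It uses the nominal sizes from this card rather than exact parameter counts, and it ignores the KV cache and activations, which can be substantial for 64K-token generations.

```python
BYTES_PER_PARAM_BF16 = 2  # bfloat16 stores each parameter in 16 bits

# Nominal sizes as stated in this card (assumption: exact counts differ slightly).
NOMINAL_PARAMS = {
    "OpenReasoning-Nemotron-1.5B": 1.5e9,
    "OpenReasoning-Nemotron-7B": 7e9,
    "OpenReasoning-Nemotron-14B": 14e9,
    "OpenReasoning-Nemotron-32B": 32e9,
}


def weight_memory_gb(num_params: float, bytes_per_param: int = BYTES_PER_PARAM_BF16) -> float:
    """Approximate memory needed for the weights alone, in gigabytes (10^9 bytes)."""
    return num_params * bytes_per_param / 1e9


for name, num_params in NOMINAL_PARAMS.items():
    print(f"{name}: ~{weight_memory_gb(num_params):.0f} GB of weights in bfloat16")
```

Under these assumptions the four sizes need roughly 3, 14, 28, and 64 GB for weights, respectively.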

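Input:

Input Type: Text
Input Format: String
Input Parameters: One-Dimensional (1D)
Other Properties Related to Input: Trained for up to 64,000 output tokens

Output:

Output Type: Text
Output Format: String
Output Parameters: One-Dimensional (1D)
Other Properties Related to Output: Trained for up to 64,000 output tokens

Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA's hardware (e.g. GPU cores) and software frameworks (e.g. CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.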

Software Integration:

Runtime Engine: NeMo 2.3.0
Recommended Hardware Microarchitecture Compatibility: NVIDIA Ampere, NVIDIA Hopper
Preferred/Supported Operating System: Linux

Model Version(s):

1.0 (7/16/2025)
OpenReasoning-Nemotron-32B
OpenReasoning-Nemotron-14B
OpenReasoning-Nemotron-7B
OpenReasoning-Nemotron-1.5B

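Training and Evaluation Datasets:

Training Dataset:

The training corpus for OpenReasoning-Nemotron-1.5B comprises questions from the OpenCodeReasoning dataset, OpenCodeReasoning-II, and OpenMathReasoning, along with synthetic science questions from the Llama-Nemotron-Post-Training-Dataset. All responses were generated using DeepSeek-R1-0528. We also include the instruction following and tool calling data from Llama-Nemotron-Post-Training-Dataset without modification.

Data Collection Method: Hybrid: Automated, Human, Synthetic
Labeling Method: Hybrid: Automated, Human, Synthetic
Properties: 5M DeepSeek-R1-0528 generated responses from OpenCodeReasoning questions (https://huggingface.co/datasets/nvidia/OpenCodeReasoning), OpenMathReasoning, and the synthetic science questions from the Llama-Nemotron-Post-Training-Dataset. We also include the instruction following and tool calling data from Llama-Nemotron-Post-Training-Dataset without modification.

Evaluation Dataset:

We used the following benchmarks to evaluate the model holistically.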

Math

AIME 2024/2025
HMMT February 2025

Code

LiveCodeBench
SciCode

Science

GPQA
MMLU-PRO
HLE

Data Collection Method: Hybrid: Automated, Human, Synthetic
Labeling Method: Hybrid: Automated, Human, Synthetic

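Inference:

Acceleration Engine: vLLM, Tensor(RT)-LLM
Test Hardware: NVIDIA H100-80GB

Ethical Considerations:

NVIDIA believes Trustworthy AI is a shared responsibility, and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.

For more detailed information on ethical considerations for this model, please see the Model Card++ Explainability, Bias, Safety, and Privacy subcards.

Please report model quality, risk, security vulnerabilities, or NVIDIA AI concerns here.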

onnx-community/OpenReasoning-Nemotron-1.5B

Author: onnx-community

text-generation onnxruntime-genai

Created: 2025-08-04 13:06:29+00:00

Updated: 2025-08-04 13:06:55+00:00

Files (17)

.gitattributes

BIAS.md

EXPLAINABILITY.md

PRIVACY.md

README.md

SAFETY.md

added_tokens.json

chat_template.jinja

genai_config.json

genselect_hf.py

merges.txt

model.onnx ONNX

model.onnx.data

special_tokens_map.json

tokenizer.json

tokenizer_config.json

vocab.json
