README

Step-Audio-EditX

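Files required by the Vantage-Step-Audio-EditX ComfyUI node.

Original model: https://huggingface.co/stepfun-ai/Step-Audio-EditX

Follow us on YouTube: @VantageWithAI

After downloading the models, copy them into ComfyUI/models. You should end up with the following structure: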

ComfyUI/
└── models/
    └── Step-Audio-EditX/
        ├── CosyVoice-300M-25Hz/
        │   ├── campplus.onnx
        │   ├── cosyvoice.yaml
        │   ├── flow.pt
        │   └── hift.pt
        ├── dengcunqin/
        │   └── speech_paraformer-large_asr_nat-zh-cantonese-en-16k-vocab8501-online/
        │       ├── am.mvn
        │       ├── config.yaml
        │       ├── configuration.json
        │       ├── model.pt
        │       ├── seg_dict
        │       ├── tokens.json
        │       ├── tokens.txt
        │       └── write_tokens_from_txt.py
        ├── model.safetensors
        └── speech_tokenizer_v1.onnx
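
To confirm the copy step worked, the layout above can be checked with a short script. This is a minimal sketch, not part of the node itself: the function name `missing_model_files` is hypothetical, and it assumes that `model.safetensors` and `speech_tokenizer_v1.onnx` sit inside `Step-Audio-EditX/`, as the repository file list indicates.

```python
from pathlib import Path

# Sub-folder holding the ASR model, as named in the directory tree.
ASR_DIR = (
    "dengcunqin/"
    "speech_paraformer-large_asr_nat-zh-cantonese-en-16k-vocab8501-online"
)

# Paths relative to ComfyUI/models/Step-Audio-EditX, taken from the tree.
EXPECTED_FILES = [
    "CosyVoice-300M-25Hz/campplus.onnx",
    "CosyVoice-300M-25Hz/cosyvoice.yaml",
    "CosyVoice-300M-25Hz/flow.pt",
    "CosyVoice-300M-25Hz/hift.pt",
    f"{ASR_DIR}/am.mvn",
    f"{ASR_DIR}/config.yaml",
    f"{ASR_DIR}/configuration.json",
    f"{ASR_DIR}/model.pt",
    f"{ASR_DIR}/seg_dict",
    f"{ASR_DIR}/tokens.json",
    f"{ASR_DIR}/tokens.txt",
    f"{ASR_DIR}/write_tokens_from_txt.py",
    "model.safetensors",
    "speech_tokenizer_v1.onnx",
]


def missing_model_files(comfyui_root):
    """Return the expected files that are absent under the ComfyUI root.

    Hypothetical helper: it only checks that each file exists, not that
    its contents are complete or uncorrupted.
    """
    model_dir = Path(comfyui_root) / "models" / "Step-Audio-EditX"
    return [rel for rel in EXPECTED_FILES if not (model_dir / rel).is_file()]
```

Calling `missing_model_files("ComfyUI")` from the directory that contains your ComfyUI install returns an empty list when every file is in place, and otherwise lists the relative paths that still need to be copied.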

Features

Zero-Shot TTS
- Excellent zero-shot voice cloning for Mandarin, English, Sichuanese, and Cantonese.
- To use a dialect, add a [Sichuanese] or [Cantonese] tag before the text.
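
The dialect tag is just a prefix on the input text, so it can be added programmatically. The sketch below is illustrative only: `with_dialect_tag` is a hypothetical helper, and placing the tag directly before the text with no separating space is an assumption, since the source only says the tag goes before the text.

```python
# Dialect tags named in this README; Mandarin and English need no tag.
DIALECT_TAGS = {
    "sichuanese": "[Sichuanese]",
    "cantonese": "[Cantonese]",
}


def with_dialect_tag(text, dialect=None):
    """Prefix text with the tag for the given dialect (hypothetical helper).

    Assumption: the tag is placed immediately before the text, with no
    separator. Pass dialect=None to leave the text unchanged.
    """
    if dialect is None:
        return text
    key = dialect.strip().lower()
    if key not in DIALECT_TAGS:
        raise ValueError(f"unsupported dialect: {dialect!r}")
    return DIALECT_TAGS[key] + text
```

For example, `with_dialect_tag("你好", "Cantonese")` returns `"[Cantonese]你好"`, which can then be pasted or wired into the node's text input.
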
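Emotion and Speaking Style Editing
- Effective iterative control over emotion and speaking style, with dozens of editing options.
  - Emotion editing: [ Angry, Happy, Sad, Excited, Fearful, Surprised, Disgusted, etc. ]
  - Speaking style editing: [ Act_coy, Older, Child, Whisper, Serious, Generous, Exaggerated, etc. ]
  - More emotion and speaking style editing options are coming soon. Stay tuned! 🚀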
Paralinguistic Editing
- Precise control over 10 types of paralinguistic features, for more natural, human-like, and expressive synthetic audio.
- Supported tags:
  - [ Breathing, Laughter, Suprise-oh, Confirmation-en, Uhm, Suprise-ah, Suprise-wa, Sigh, Question-ei, Dissatisfaction-hnn ]
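
Because the tag names must match exactly (note the `Suprise-*` spelling used by the model), it can help to check a script for unrecognized tags before running it. The following is a rough sketch under one stated assumption: that paralinguistic tags are written inline in square brackets, for example `[Laughter]`. The source lists the tag names but does not spell out the inline syntax, and `unknown_tags` is a hypothetical helper.

```python
import re

# The 10 paralinguistic tags, spelled exactly as listed in this README.
PARALINGUISTIC_TAGS = frozenset({
    "Breathing", "Laughter", "Suprise-oh", "Confirmation-en", "Uhm",
    "Suprise-ah", "Suprise-wa", "Sigh", "Question-ei", "Dissatisfaction-hnn",
})

# Matches any non-nested [ ... ] span; assumes tags are written inline.
_TAG_PATTERN = re.compile(r"\[([^\[\]]+)\]")


def unknown_tags(text, allowed=PARALINGUISTIC_TAGS):
    """Return bracketed tags in text that are not in the allowed set.

    Tags are reported in order of appearance; matching is case-sensitive.
    """
    found = []
    for match in _TAG_PATTERN.finditer(text):
        name = match.group(1).strip()
        if name not in allowed:
            found.append(name)
    return found
```

With this sketch, `unknown_tags("Well [Laughter] that was [Surprise-oh] odd")` returns `["Surprise-oh"]`, flagging the corrected English spelling as a tag the list above does not include. To allow dialect tags as well, pass a larger `allowed` set.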

For more examples, see the demo page.

Citation

@misc{yan2025stepaudioeditxtechnicalreport,
      title={Step-Audio-EditX Technical Report}, 
      author={Chao Yan and Boyong Wu and Peng Yang and Pengfei Tan and Guoqiang Hu and Yuxin Zhang and Xiangyu Zhang and Fei Tian and Xuerui Yang and Xiangyu Zhang and Daxin Jiang and Gang Yu},
      year={2025},
      eprint={2511.03601},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2511.03601}, 
}

vantagewithai/Step-Fun-EditX-ComfyUI

Author: vantagewithai

Tags: text-to-speech, transformers

Created: 2025-11-25 13:02:17+00:00

Updated: 2025-11-26 19:16:25+00:00

Files (21)

.gitattributes

README.md

Step-Audio-EditX/.gitattributes

Step-Audio-EditX/CosyVoice-300M-25Hz/campplus.onnx

Step-Audio-EditX/CosyVoice-300M-25Hz/cosyvoice.yaml

Step-Audio-EditX/CosyVoice-300M-25Hz/flow.pt

Step-Audio-EditX/CosyVoice-300M-25Hz/hift.pt

Step-Audio-EditX/dengcunqin/speech_paraformer-large_asr_nat-zh-cantonese-en-16k-vocab8501-online/.mdl

Step-Audio-EditX/dengcunqin/speech_paraformer-large_asr_nat-zh-cantonese-en-16k-vocab8501-online/.msc

Step-Audio-EditX/dengcunqin/speech_paraformer-large_asr_nat-zh-cantonese-en-16k-vocab8501-online/.mv

Step-Audio-EditX/dengcunqin/speech_paraformer-large_asr_nat-zh-cantonese-en-16k-vocab8501-online/am.mvn

Step-Audio-EditX/dengcunqin/speech_paraformer-large_asr_nat-zh-cantonese-en-16k-vocab8501-online/config.yaml

Step-Audio-EditX/dengcunqin/speech_paraformer-large_asr_nat-zh-cantonese-en-16k-vocab8501-online/configuration.json

Step-Audio-EditX/dengcunqin/speech_paraformer-large_asr_nat-zh-cantonese-en-16k-vocab8501-online/model.pt

Step-Audio-EditX/dengcunqin/speech_paraformer-large_asr_nat-zh-cantonese-en-16k-vocab8501-online/seg_dict

Step-Audio-EditX/dengcunqin/speech_paraformer-large_asr_nat-zh-cantonese-en-16k-vocab8501-online/tokens.json

Step-Audio-EditX/dengcunqin/speech_paraformer-large_asr_nat-zh-cantonese-en-16k-vocab8501-online/tokens.txt

Step-Audio-EditX/dengcunqin/speech_paraformer-large_asr_nat-zh-cantonese-en-16k-vocab8501-online/write_tokens_from_txt.py

Step-Audio-EditX/model.safetensors

Step-Audio-EditX/speech_tokenizer_v1.onnx

Vantage-EditX-Multi-Person-Unlimited.json