
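License

This project is licensed under the Apache License 2.0 (SPDX-License-Identifier: Apache-2.0).

We used compliance-checking algorithms during training and made our best effort to ensure that the trained model is compliant. Because the data is complex and language models are used in many different scenarios, we cannot guarantee that the model is entirely free of copyright issues or inappropriate content. If you believe that any content infringes your rights or is inappropriate, please contact us and we will address it promptly.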

🗣️ Marco-Voice: Multilingual, Emotion-Controllable Speech Synthesis


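Marco-Voice is an open-source text-to-speech (TTS) framework that supports high-fidelity voice cloning and fine-grained emotion control over synthesized speech. By integrating advanced disentanglement techniques, it generates expressive, studio-quality speech.

You can try it on Hugging Face Spaces or integrate it into your own application for controllable, natural-sounding synthesized speech.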

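📌 Model Details

  • Developed by: Marco-Voice team
  • Model type: Neural text-to-speech (TTS) with speaker and emotion conditioning
  • Supported emotions: 7 (including sad, surprised, happy, and others)
  • Voice cloning: Zero-shot / few-shot speaker adaptation
  • Architecture compatibility: Compatible with the CosyVoice backbone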

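🎯 Intended Uses and Capabilities

Marco-Voice is designed for synthesized-speech applications that require emotional expressiveness and speaker consistency, for example:

  • Virtual assistants with personalized voices and emotional tone
  • Audiobook narration with dynamic prosody
  • Voice synthesis for games and animation
  • Accessibility tools (e.g., expressive screen readers)
  • Cross-lingual dubbing that preserves speaker identity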

⚠️ Limitations and Ethical Considerations

There is a trade-off between timbre similarity and emotion control.

🚀 Quick Start

Set up your environment:

conda create -n marco python=3.8
conda activate marco
pip install -r requirements.txt
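
After installation, a quick sanity check can confirm that the interpreter and key packages are in place. The sketch below is illustrative only: `check_environment` is a hypothetical helper, and the package list is an assumption based on the imports in the inference example (the actual contents of `requirements.txt` are not shown here).

```python
import importlib.util
import sys

# Packages imported by the inference example; assumed (not confirmed) to be
# installed by requirements.txt.
REQUIRED_PACKAGES = ("torch", "torchaudio")


def check_environment(packages=REQUIRED_PACKAGES):
    """Return the running Python version and whether each package is importable."""
    return {
        "python": "{}.{}".format(*sys.version_info[:2]),
        "packages": {
            name: importlib.util.find_spec(name) is not None for name in packages
        },
    }


if __name__ == "__main__":
    report = check_environment()
    print("Python", report["python"])
    for name, found in report["packages"].items():
        print(f"{name}: {'ok' if found else 'missing'}")
```

Running this inside the activated `marco` environment should report the Python version chosen at `conda create` time; a package reported as missing suggests the `pip install` step did not complete.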


🎯 Inference Example
```python
from Models.marco_voice.cosyvoice_rodis.cli.cosyvoice import CosyVoice
from Models.marco_voice.cosyvoice_emosphere.cli.cosyvoice import CosyVoice as cosy_emosphere
from Models.marco_voice.cosyvoice_rodis.utils.file_utils import load_wav
import torch
import torchaudio

# Load pre-trained models
model = CosyVoice('pretrained_models/v4', load_jit=False, load_onnx=False, fp16=False)
model_emosphere = cosy_emosphere('pretrained_models/v5', load_jit=False, load_onnx=False, fp16=False)

# Define emotion mapping (Chinese key -> English label used in the emotion table)
emo = {
    "伤心": "Sad",
    "恐惧": "Fearful",
    "快乐": "Happy",
    "惊喜": "Surprise",
    "生气": "Angry",
    "戏谑": "Jolliest"
}

# Load reference speech for voice cloning (resampled to 16 kHz)
prompt_speech_16k = load_wav("your_audio_path/exam.wav", 16000)
emo_type = "快乐"

# Load the emotion table once, then pick the embedding by speaker and emotion
emotion_table = torch.load("assets/emotion_info.pt")
if emo_type == "伤心":
    speaker = "female005"
elif emo_type == "恐惧":
    speaker = "female003"
else:
    # "生气", "惊喜", "快乐", and any other emotion use the same speaker
    speaker = "male005"
emotion_info = emotion_table[speaker][emo.get(emo_type)]

# 1. Discrete emotion control
for i, j in enumerate(model.synthesize(
    text="今天的天气真不错,我们出去散步吧!",
    prompt_text="",
    reference_speech=prompt_speech_16k,
    emo_type=emo_type,
    emotion_embedding=emotion_info
)):
    torchaudio.save(f'emotional_{emo_type}.wav', j['tts_speech'], 22050)

# 2. Continuous emotion control (Emosphere)
for i, j in enumerate(model_emosphere.synthesize(
    text="今天的天气真不错,我们出去散步吧!",
    prompt_text="",
    reference_speech=prompt_speech_16k,
    emotion_embedding=emotion_info,
    low_level_emo_embedding=[0.1, 0.4, 0.5]
)):
    torchaudio.save(f'emosphere_{emo_type}.wav', j['tts_speech'], 22050)

# 3. Cross-lingual emotion transfer
for i, j in enumerate(model.synthesize(
    text="hello, i'm a speech synthesis model, how are you today?",
    prompt_text="",
    reference_speech=prompt_speech_16k,
    emo_type=emo_type,
    emotion_embedding=emotion_info
)):
    torchaudio.save(f'cross_lingual_{emo_type}.wav', j['tts_speech'], 22050)
```
You can also run inference with the provided scripts.

For inference scripts, training recipes, documentation, and more, visit our GitHub repository: 👉 https://github.com/AIDC-AI/Marco-Voice
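
The speaker selection in the inference example follows a fixed emotion-to-speaker mapping, which can also be written as a lookup table. The following is a minimal sketch, not part of the Marco-Voice codebase: `emotion_lookup_keys` is a hypothetical helper, and unlike the original example it raises an error for an unknown emotion instead of passing `None` as the table key.

```python
# Emotion labels as used in the inference example (Chinese key -> English label).
EMOTION_LABELS = {
    "伤心": "Sad",
    "恐惧": "Fearful",
    "快乐": "Happy",
    "惊喜": "Surprise",
    "生气": "Angry",
    "戏谑": "Jolliest",
}

# Reference speaker per emotion, copied from the if/elif chain in the example.
EMOTION_SPEAKER = {"伤心": "female005", "恐惧": "female003"}
DEFAULT_SPEAKER = "male005"


def emotion_lookup_keys(emo_type):
    """Return (speaker_id, emotion_label) for indexing the emotion table."""
    if emo_type not in EMOTION_LABELS:
        raise ValueError(f"Unsupported emotion: {emo_type!r}")
    return EMOTION_SPEAKER.get(emo_type, DEFAULT_SPEAKER), EMOTION_LABELS[emo_type]


# Usage with the table from the example (requires torch and the asset file):
#   table = torch.load("assets/emotion_info.pt")
#   speaker, label = emotion_lookup_keys("快乐")
#   emotion_info = table[speaker][label]
```

Keeping the mapping in data rather than in branches makes it easier to see which speaker backs each emotion and to extend the table if further speakers are added.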

AIDC-AI/Marco-Voice

Author: AIDC-AI

text-to-speech

Created: 2025-11-20 03:14:50+00:00

Updated: 2025-12-03 08:51:21+00:00


Files (28)

.gitattributes
LICENSE
NOTICE
README.md
marco_voice/.msc
marco_voice/.mv
marco_voice/campplus.onnx ONNX
marco_voice/configuration.json
marco_voice/cosyvoice.yaml
marco_voice/flow.decoder.estimator.fp32.onnx ONNX
marco_voice/flow.encoder.fp32.zip
marco_voice/flow.pt
marco_voice/hift.pt
marco_voice/llm.llm.fp16.zip
marco_voice/llm.pt
marco_voice/llm.text_encoder.fp16.zip
marco_voice/speech_tokenizer_v1.onnx ONNX
marco_voice/spk2info.pt
marco_voice_enhenced/campplus.onnx ONNX
marco_voice_enhenced/cosyvoice.yaml
marco_voice_enhenced/flow.decoder.estimator.fp32.onnx ONNX
marco_voice_enhenced/flow.encoder.fp32.zip
marco_voice_enhenced/flow.pt
marco_voice_enhenced/hift.pt
marco_voice_enhenced/llm.llm.fp16.zip
marco_voice_enhenced/llm.pt
marco_voice_enhenced/llm.text_encoder.fp16.zip
marco_voice_enhenced/speech_tokenizer_v1.onnx ONNX