## License
This project is licensed under the Apache License 2.0 (SPDX-License-Identifier: Apache-2.0).

We applied compliance-checking algorithms during training and made every effort to ensure the trained model is compliant. However, given the complexity of the data and the diversity of language-model use cases, we cannot guarantee the model is entirely free of copyright issues or inappropriate content. If you believe any content infringes your rights or is otherwise inappropriate, please contact us and we will address it promptly.
# 🗣️ Marco-Voice: Multilingual Emotion-Controllable Speech Synthesis

Marco-Voice is an open-source text-to-speech (TTS) framework that supports high-fidelity voice cloning and fine-grained emotion control over synthesized speech. By integrating advanced disentanglement techniques, Marco-Voice can generate expressive, studio-quality speech.

You can try it on Hugging Face Spaces, or integrate it into your own applications for controllable, natural synthesized speech.
## 📌 Model Details
- Developed by: the Marco-Voice team
- Model type: neural text-to-speech (TTS) with speaker and emotion conditioning
- Supported emotions: 7 (including sad, surprised, happy, and others)
- Voice cloning: zero-shot / few-shot speaker adaptation
- Architecture compatibility: compatible with the CosyVoice backbone
## 🎯 Intended Uses and Capabilities
Marco-Voice is designed for synthetic-speech applications that require both emotional expressiveness and speaker consistency, such as:
- Virtual assistants with personalized voices and emotional tone
- Audiobook narration with dynamic prosody
- Voice synthesis for games and animation
- Accessibility tools (e.g., expressive screen readers)
- Cross-lingual voice dubbing that preserves speaker identity
## ⚠️ Limitations and Ethical Considerations
There is a trade-off between timbre similarity and emotion control: stronger emotional expression can reduce how closely the output matches the reference speaker's voice.
## 🚀 Quick Start
Set up your environment:

```shell
conda create -n marco python=3.8
conda activate marco
pip install -r requirements.txt
```
## 🎯 Inference Examples
```python
from Models.marco_voice.cosyvoice_rodis.cli.cosyvoice import CosyVoice
from Models.marco_voice.cosyvoice_emosphere.cli.cosyvoice import CosyVoice as cosy_emosphere
from Models.marco_voice.cosyvoice_rodis.utils.file_utils import load_wav
import torch
import torchaudio

# Load pre-trained models
model = CosyVoice('pretrained_models/v4', load_jit=False, load_onnx=False, fp16=False)
model_emosphere = cosy_emosphere('pretrained_models/v5', load_jit=False, load_onnx=False, fp16=False)

# Map Chinese emotion labels to the English tags used in the emotion-embedding file
emo = {
    "伤心": "Sad",
    "恐惧": "Fearful",
    "快乐": "Happy",
    "惊喜": "Surprise",
    "生气": "Angry",
    "戏谑": "Jolliest"
}

# Load reference speech for voice cloning (16 kHz)
prompt_speech_16k = load_wav("your_audio_path/exam.wav", 16000)
emo_type = "快乐"

# Load the emotion embedding: each emotion is drawn from a specific reference speaker
if emo_type in ["生气", "惊喜", "快乐"]:
    emotion_info = torch.load("assets/emotion_info.pt")["male005"][emo.get(emo_type)]
elif emo_type in ["伤心"]:
    emotion_info = torch.load("assets/emotion_info.pt")["female005"][emo.get(emo_type)]
elif emo_type in ["恐惧"]:
    emotion_info = torch.load("assets/emotion_info.pt")["female003"][emo.get(emo_type)]
else:
    emotion_info = torch.load("assets/emotion_info.pt")["male005"][emo.get(emo_type)]

# 1. Discrete emotion control
for i, j in enumerate(model.synthesize(
    text="今天的天气真不错,我们出去散步吧!",
    prompt_text="",
    reference_speech=prompt_speech_16k,
    emo_type=emo_type,
    emotion_embedding=emotion_info
)):
    torchaudio.save(f'emotional_{emo_type}.wav', j['tts_speech'], 22050)

# 2. Continuous emotion control (Emosphere)
for i, j in enumerate(model_emosphere.synthesize(
    text="今天的天气真不错,我们出去散步吧!",
    prompt_text="",
    reference_speech=prompt_speech_16k,
    emotion_embedding=emotion_info,
    low_level_emo_embedding=[0.1, 0.4, 0.5]
)):
    torchaudio.save(f'emosphere_{emo_type}.wav', j['tts_speech'], 22050)

# 3. Cross-lingual emotion transfer
for i, j in enumerate(model.synthesize(
    text="hello, i'm a speech synthesis model, how are you today?",
    prompt_text="",
    reference_speech=prompt_speech_16k,
    emo_type=emo_type,
    emotion_embedding=emotion_info
)):
    torchaudio.save(f'cross_lingual_{emo_type}.wav', j['tts_speech'], 22050)
```
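The emotion-to-reference-speaker branching above can be collected into a small helper. This is a minimal sketch, not part of the Marco-Voice API; the speaker IDs (`male005`, `female005`, `female003`) and the Chinese emotion labels are copied from the example above:

```python
# Sketch only: maps an emotion label to the reference speaker used in the
# example above. The speaker assignments are assumptions taken from that
# snippet, not an official mapping.
EMO_TO_SPEAKER = {
    "生气": "male005",    # Angry
    "惊喜": "male005",    # Surprise
    "快乐": "male005",    # Happy
    "伤心": "female005",  # Sad
    "恐惧": "female003",  # Fearful
}

def reference_speaker(emo_type: str, default: str = "male005") -> str:
    """Return the reference speaker ID for an emotion label."""
    return EMO_TO_SPEAKER.get(emo_type, default)
```

With this helper, the `if/elif` chain collapses to a single lookup: `torch.load("assets/emotion_info.pt")[reference_speaker(emo_type)][emo[emo_type]]`.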
You can also run inference with the provided scripts.

For inference scripts, training recipes, documentation, and more, visit our GitHub repository: 👉 https://github.com/AIDC-AI/Marco-Voice
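The continuous-control call above passes `low_level_emo_embedding=[0.1, 0.4, 0.5]`. This model card does not document what the three components mean; assuming they are relative intensity weights that should sum to 1 (matching the shape of the example value), a small helper for building such a vector might look like:

```python
def low_level_embedding(weights):
    """Normalize raw weights into a vector summing to 1.

    Hypothetical helper: the semantics of the three components are not
    documented here; this only reproduces the sum-to-1 shape of the
    example value [0.1, 0.4, 0.5].
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive total")
    return [w / total for w in weights]
```

If the components have fixed semantics (e.g., per-dimension emotion intensities), consult the GitHub repository's documentation before relying on this interpretation.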
---
Repository: AIDC-AI/Marco-Voice · Author: AIDC-AI · Tags: text-to-speech
Created: 2025-11-20 03:14:50+00:00 · Updated: 2025-12-03 08:51:21+00:00

Files (28):
- .gitattributes
- LICENSE
- NOTICE
- README.md
- marco_voice/.msc
- marco_voice/.mv
- marco_voice/campplus.onnx
- marco_voice/configuration.json
- marco_voice/cosyvoice.yaml
- marco_voice/flow.decoder.estimator.fp32.onnx
- marco_voice/flow.encoder.fp32.zip
- marco_voice/flow.pt
- marco_voice/hift.pt
- marco_voice/llm.llm.fp16.zip
- marco_voice/llm.pt
- marco_voice/llm.text_encoder.fp16.zip
- marco_voice/speech_tokenizer_v1.onnx
- marco_voice/spk2info.pt
- marco_voice_enhenced/campplus.onnx
- marco_voice_enhenced/cosyvoice.yaml
- marco_voice_enhenced/flow.decoder.estimator.fp32.onnx
- marco_voice_enhenced/flow.encoder.fp32.zip
- marco_voice_enhenced/flow.pt
- marco_voice_enhenced/hift.pt
- marco_voice_enhenced/llm.llm.fp16.zip
- marco_voice_enhenced/llm.pt
- marco_voice_enhenced/llm.text_encoder.fp16.zip
- marco_voice_enhenced/speech_tokenizer_v1.onnx