## License
This project is licensed under the Apache License 2.0 (SPDX-License-Identifier: Apache-2.0).

We applied compliance-checking algorithms during training and made every effort to ensure the trained model is compliant. However, given the complexity of the data and the diversity of language-model use cases, we cannot guarantee the model is entirely free of copyright issues or inappropriate content. If you believe any content infringes your rights or is otherwise inappropriate, please contact us and we will address it promptly.
# 🗣️ Marco-Voice: Multilingual Emotion-Controllable Speech Synthesis

Marco-Voice is an open-source text-to-speech (TTS) framework that supports high-fidelity voice cloning and fine-grained emotion control over synthesized speech. By integrating advanced disentanglement techniques, Marco-Voice can generate expressive, studio-quality speech.

You can try it on Hugging Face Spaces, or integrate it into your own applications for controllable, natural synthesized speech.
## 📌 Model Details
- Developed by: the Marco-Voice team
- Model type: neural text-to-speech (TTS) with speaker and emotion conditioning
- Supported emotions: 7 (including sad, surprised, happy, and others)
- Voice cloning: zero-shot / few-shot speaker adaptation
- Architecture compatibility: compatible with the CosyVoice backbone
## 🎯 Intended Uses and Capabilities
Marco-Voice is designed for synthetic-speech applications that require both emotional expressiveness and speaker consistency, such as:
- Virtual assistants with personalized voices and emotional tone
- Audiobook narration with dynamic prosody
- Voice synthesis for games and animation
- Accessibility tools (e.g., expressive screen readers)
- Cross-lingual voice dubbing that preserves speaker identity
## ⚠️ Limitations and Ethical Considerations
There is a trade-off between timbre similarity and emotion control: stronger emotional expression can reduce how closely the output matches the reference speaker's voice.
## 🚀 Quick Start
Set up your environment:

```shell
conda create -n marco python=3.8
conda activate marco
pip install -r requirements.txt
```
## 🎯 Inference Examples
```python
from Models.marco_voice.cosyvoice_rodis.cli.cosyvoice import CosyVoice
from Models.marco_voice.cosyvoice_emosphere.cli.cosyvoice import CosyVoice as cosy_emosphere
from Models.marco_voice.cosyvoice_rodis.utils.file_utils import load_wav
import torch
import torchaudio

# Load pre-trained models
model = CosyVoice('pretrained_models/v4', load_jit=False, load_onnx=False, fp16=False)
model_emosphere = cosy_emosphere('pretrained_models/v5', load_jit=False, load_onnx=False, fp16=False)

# Map Chinese emotion labels to the English tags used in the emotion-embedding file
emo = {
    "伤心": "Sad",
    "恐惧": "Fearful",
    "快乐": "Happy",
    "惊喜": "Surprise",
    "生气": "Angry",
    "戏谑": "Jolliest"
}

# Load reference speech for voice cloning (16 kHz)
prompt_speech_16k = load_wav("your_audio_path/exam.wav", 16000)
emo_type = "快乐"

# Load the emotion embedding: each emotion is drawn from a specific reference speaker
if emo_type in ["生气", "惊喜", "快乐"]:
    emotion_info = torch.load("assets/emotion_info.pt")["male005"][emo.get(emo_type)]
elif emo_type in ["伤心"]:
    emotion_info = torch.load("assets/emotion_info.pt")["female005"][emo.get(emo_type)]
elif emo_type in ["恐惧"]:
    emotion_info = torch.load("assets/emotion_info.pt")["female003"][emo.get(emo_type)]
else:
    emotion_info = torch.load("assets/emotion_info.pt")["male005"][emo.get(emo_type)]

# 1. Discrete emotion control
for i, j in enumerate(model.synthesize(
    text="今天的天气真不错,我们出去散步吧!",
    prompt_text="",
    reference_speech=prompt_speech_16k,
    emo_type=emo_type,
    emotion_embedding=emotion_info
)):
    torchaudio.save(f'emotional_{emo_type}.wav', j['tts_speech'], 22050)

# 2. Continuous emotion control (Emosphere)
for i, j in enumerate(model_emosphere.synthesize(
    text="今天的天气真不错,我们出去散步吧!",
    prompt_text="",
    reference_speech=prompt_speech_16k,
    emotion_embedding=emotion_info,
    low_level_emo_embedding=[0.1, 0.4, 0.5]
)):
    torchaudio.save(f'emosphere_{emo_type}.wav', j['tts_speech'], 22050)

# 3. Cross-lingual emotion transfer
for i, j in enumerate(model.synthesize(
    text="hello, i'm a speech synthesis model, how are you today?",
    prompt_text="",
    reference_speech=prompt_speech_16k,
    emo_type=emo_type,
    emotion_embedding=emotion_info
)):
    torchaudio.save(f'cross_lingual_{emo_type}.wav', j['tts_speech'], 22050)
```
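The emotion-to-reference-speaker branching above can be collected into a small helper. This is a minimal sketch, not part of the Marco-Voice API; the speaker IDs (`male005`, `female005`, `female003`) and the Chinese emotion labels are copied from the example above:

```python
# Sketch only: maps an emotion label to the reference speaker used in the
# example above. The speaker assignments are assumptions taken from that
# snippet, not an official mapping.
EMO_TO_SPEAKER = {
    "生气": "male005",    # Angry
    "惊喜": "male005",    # Surprise
    "快乐": "male005",    # Happy
    "伤心": "female005",  # Sad
    "恐惧": "female003",  # Fearful
}

def reference_speaker(emo_type: str, default: str = "male005") -> str:
    """Return the reference speaker ID for an emotion label."""
    return EMO_TO_SPEAKER.get(emo_type, default)
```

With this helper, the `if/elif` chain collapses to a single lookup: `torch.load("assets/emotion_info.pt")[reference_speaker(emo_type)][emo[emo_type]]`.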
You can also run inference with the provided scripts.

For inference scripts, training recipes, documentation, and more, visit our GitHub repository: 👉 https://github.com/AIDC-AI/Marco-Voice
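The continuous-control call above passes `low_level_emo_embedding=[0.1, 0.4, 0.5]`. This model card does not document what the three components mean; assuming they are relative intensity weights that should sum to 1 (matching the shape of the example value), a small helper for building such a vector might look like:

```python
def low_level_embedding(weights):
    """Normalize raw weights into a vector summing to 1.

    Hypothetical helper: the semantics of the three components are not
    documented here; this only reproduces the sum-to-1 shape of the
    example value [0.1, 0.4, 0.5].
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive total")
    return [w / total for w in weights]
```

If the components have fixed semantics (e.g., per-dimension emotion intensities), consult the GitHub repository's documentation before relying on this interpretation.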
---
Repository: AIDC-AI/Marco-Voice · Author: AIDC-AI · Tags: text-to-speech
Created: 2025-11-20 03:14:50+00:00 · Updated: 2025-12-03 08:51:21+00:00

Files (28):
- .gitattributes
- LICENSE
- NOTICE
- README.md
- marco_voice/.msc
- marco_voice/.mv
- marco_voice/campplus.onnx
- marco_voice/configuration.json
- marco_voice/cosyvoice.yaml
- marco_voice/flow.decoder.estimator.fp32.onnx
- marco_voice/flow.encoder.fp32.zip
- marco_voice/flow.pt
- marco_voice/hift.pt
- marco_voice/llm.llm.fp16.zip
- marco_voice/llm.pt
- marco_voice/llm.text_encoder.fp16.zip
- marco_voice/speech_tokenizer_v1.onnx
- marco_voice/spk2info.pt
- marco_voice_enhenced/campplus.onnx
- marco_voice_enhenced/cosyvoice.yaml
- marco_voice_enhenced/flow.decoder.estimator.fp32.onnx
- marco_voice_enhenced/flow.encoder.fp32.zip
- marco_voice_enhenced/flow.pt
- marco_voice_enhenced/hift.pt
- marco_voice_enhenced/llm.llm.fp16.zip
- marco_voice_enhenced/llm.pt
- marco_voice_enhenced/llm.text_encoder.fp16.zip
- marco_voice_enhenced/speech_tokenizer_v1.onnx