Kokoro TTS
Kokoro is a frontier TTS model with 82 million parameters (text in, audio out).
Samples
Sample text: "Life is like a box of chocolates. You never know what you're gonna get."
| Voice | Nationality | Gender | Sample |
|---|---|---|---|
| Default (af) | American | Female | <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/61b253b7ac5ecaae3d1efe0c/C0_ZUcNSAxvMwpS8QbnKv.wav"></audio> |
| Bella (af_bella) | American | Female | <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/61b253b7ac5ecaae3d1efe0c/B_q15Z_FXdgBP9-Hk9oKq.wav"></audio> |
| Nicole (af_nicole) | American | Female | <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/61b253b7ac5ecaae3d1efe0c/sS8U5lQHkhgX7rwTmy-5w.wav"></audio> |
| Sarah (af_sarah) | American | Female | <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/61b253b7ac5ecaae3d1efe0c/SokkBiqEqwxLLx_pqvf1p.wav"></audio> |
| Sky (af_sky) | American | Female | <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/61b253b7ac5ecaae3d1efe0c/IzySGHUtl5mYeFxx1oaRf.wav"></audio> |
| Adam (am_adam) | American | Male | <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/61b253b7ac5ecaae3d1efe0c/9n6myE6--ZsEuF5xDv5eC.wav"></audio> |
| Michael (am_michael) | American | Male | <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/61b253b7ac5ecaae3d1efe0c/EPFciGtTU1YUXu8MAw7DX.wav"></audio> |
| Emma (bf_emma) | British | Female | <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/61b253b7ac5ecaae3d1efe0c/AGEsXs-gyJq3dsyo7PjHo.wav"></audio> |
| Isabella (bf_isabella) | British | Female | <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/61b253b7ac5ecaae3d1efe0c/JEzrrXYJSDcmlEzI7tE0c.wav"></audio> |
| George (bm_george) | British | Male | <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/61b253b7ac5ecaae3d1efe0c/nsv4zKB4MX2TvXRxv504k.wav"></audio> |
| Lewis (bm_lewis) | British | Male | <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/61b253b7ac5ecaae3d1efe0c/g_mcBl2xTbQl0sbrpZt48.wav"></audio> |
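The voice IDs in the table appear to follow a two-letter prefix convention: the first letter gives the accent (`a` for American, `b` for British) and the second gives the gender (`f` or `m`). This is inferred only from the rows above, not from a documented specification. The sketch below decodes an ID under that assumption; `describe_voice` is a hypothetical helper, not part of kokoro-js or this repository.

```python
# Prefix convention inferred from the voice table (an assumption, not a documented spec).
ACCENTS = {"a": "American", "b": "British"}
GENDERS = {"f": "Female", "m": "Male"}


def describe_voice(voice_id: str) -> tuple[str, str, str]:
    """Split a voice ID such as 'af_bella' into (accent, gender, name).

    The default voice 'af' has no name part, so 'Default' is returned for it.
    """
    prefix, _, name = voice_id.partition("_")
    if len(prefix) != 2 or prefix[0] not in ACCENTS or prefix[1] not in GENDERS:
        raise ValueError(f"unrecognized voice ID: {voice_id!r}")
    return ACCENTS[prefix[0]], GENDERS[prefix[1]], name.capitalize() or "Default"


if __name__ == "__main__":
    for vid in ["af", "af_bella", "am_adam", "bf_emma", "bm_lewis"]:
        print(vid, describe_voice(vid))
```

Any new voice that breaks this pattern would be rejected by the sketch, which is the intended behavior for an inferred convention.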
Usage
JavaScript
First, install the kokoro-js library from NPM:
npm i kokoro-js
You can then generate speech as follows:
import { KokoroTTS } from "kokoro-js";
const model_id = "onnx-community/Kokoro-82M-ONNX";
const tts = await KokoroTTS.from_pretrained(model_id, {
  dtype: "q8", // Options: "fp32", "fp16", "q8", "q4", "q4f16"
});
const text = "Life is like a box of chocolates. You never know what you're gonna get.";
const audio = await tts.generate(text, {
  // Use `tts.list_voices()` to list all available voices
  voice: "af_bella",
});
audio.save("audio.wav");
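The `dtype` option decides which ONNX file under `onnx/` is loaded. Assuming kokoro-js follows the Transformers.js naming convention (file `onnx/model{suffix}.onnx`, with `q8` mapped to the `_quantized` suffix), the options resolve as in the Python sketch below. The mapping is an assumption worth checking against the library source; `DTYPE_SUFFIX` and `onnx_file_for` are hypothetical names.

```python
# Assumed mapping, following the Transformers.js dtype-suffix convention:
# the file loaded is onnx/model{suffix}.onnx.
DTYPE_SUFFIX = {
    "fp32": "",
    "fp16": "_fp16",
    "q8": "_quantized",
    "q4": "_q4",
    "q4f16": "_q4f16",
}


def onnx_file_for(dtype: str) -> str:
    """Return the repository path of the ONNX file assumed to back a dtype option."""
    try:
        suffix = DTYPE_SUFFIX[dtype]
    except KeyError:
        raise ValueError(f"unsupported dtype: {dtype!r}") from None
    return f"onnx/model{suffix}.onnx"
```

Under this assumption, the `dtype: "q8"` used in the example above loads `model_quantized.onnx`, the 92.4 MB 8-bit file.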
Python
import os
import numpy as np
from onnxruntime import InferenceSession
# Tokens produced by phonemize() and tokenize() in kokoro.py
tokens = [50, 157, 43, 135, 16, 53, 135, 46, 16, 43, 102, 16, 56, 156, 57, 135, 6, 16, 102, 62, 61, 16, 70, 56, 16, 138, 56, 156, 72, 56, 61, 85, 123, 83, 44, 83, 54, 16, 53, 65, 156, 86, 61, 62, 131, 83, 56, 4, 16, 54, 156, 43, 102, 53, 16, 156, 72, 61, 53, 102, 112, 16, 70, 56, 16, 138, 56, 44, 156, 76, 158, 123, 56, 16, 62, 131, 156, 43, 102, 54, 46, 16, 102, 48, 16, 81, 47, 102, 54, 16, 54, 156, 51, 158, 46, 16, 70, 16, 92, 156, 135, 46, 16, 54, 156, 43, 102, 48, 4, 16, 81, 47, 102, 16, 50, 156, 72, 64, 83, 56, 62, 16, 156, 51, 158, 64, 83, 56, 16, 44, 157, 102, 56, 16, 44, 156, 76, 158, 123, 56, 4]
# The context length is 512, but room is needed for pad token 0 at the start and end
assert len(tokens) <= 510, len(tokens)
# Style vector selected by len(tokens); ref_s has shape (1, 256).
# Paths are relative to a local copy of this repository.
voices = np.fromfile('./voices/af.bin', dtype=np.float32).reshape(-1, 1, 256)
ref_s = voices[len(tokens)]
# Add the pad ids and reshape tokens; the shape should be (1, <=512)
tokens = np.array([[0, *tokens, 0]], dtype=np.int64)
model_name = 'model.onnx' # Options: model.onnx, model_fp16.onnx, model_quantized.onnx, model_q8f16.onnx, model_uint8.onnx, model_uint8f16.onnx, model_q4.onnx, model_q4f16.onnx
sess = InferenceSession(os.path.join('onnx', model_name))
audio = sess.run(None, dict(
    input_ids=tokens,
    style=ref_s,
    speed=np.ones(1, dtype=np.float32),
))[0]
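Each voice file holds one 256-value float32 style vector per possible token count, which is why the example reshapes to `(-1, 1, 256)` and indexes with `len(tokens)`. When only one row is needed, it can be read directly by seeking to its byte offset instead of loading the whole file. The standard-library sketch below assumes the `.bin` files are headerless little-endian float32 (consistent with `np.fromfile` reading them directly on common hardware); `load_style_row` is a hypothetical helper.

```python
import struct

STYLE_DIM = 256              # values per style vector, from reshape(-1, 1, 256)
ROW_BYTES = STYLE_DIM * 4    # each float32 value takes 4 bytes


def load_style_row(path: str, num_tokens: int) -> list[float]:
    """Read the style vector for an input of `num_tokens` tokens (before padding).

    Intended to match voices[num_tokens][0] in the NumPy example, assuming the
    file is headerless little-endian float32.
    """
    with open(path, "rb") as f:
        f.seek(num_tokens * ROW_BYTES)   # jump straight to row `num_tokens`
        data = f.read(ROW_BYTES)
    if len(data) != ROW_BYTES:
        raise ValueError(f"{path} has no style row for {num_tokens} tokens")
    return list(struct.unpack(f"<{STYLE_DIM}f", data))
```

As in the original example, the row is chosen from the unpadded token count, so the length must be taken before the two pad ids are added.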
Optionally, save the audio to a file:
import scipy.io.wavfile as wavfile
wavfile.write('audio.wav', 24000, audio[0])
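The `24000` argument is the sample rate in Hz. If SciPy is not available, the standard-library `wave` module can write a similar file. Unlike `wavfile.write`, which stores a float32 array as a 32-bit floating-point WAV, `wave` writes integer PCM, so this sketch converts samples to 16-bit and clips them to [-1, 1], assuming the model output is normalized to roughly that range. `write_wav_int16` is a hypothetical helper.

```python
import struct
import wave

SAMPLE_RATE = 24000  # Hz, matching the wavfile.write call above


def write_wav_int16(path: str, samples, sample_rate: int = SAMPLE_RATE) -> float:
    """Write mono float samples (assumed in [-1, 1]) as 16-bit PCM; return duration in seconds."""
    pcm = bytearray()
    for s in samples:
        s = max(-1.0, min(1.0, float(s)))              # clip to the valid range
        pcm += struct.pack("<h", int(round(s * 32767)))
    with wave.open(path, "wb") as w:
        w.setnchannels(1)        # mono
        w.setsampwidth(2)        # 16-bit samples
        w.setframerate(sample_rate)
        w.writeframes(bytes(pcm))
    return len(samples) / sample_rate
```

With the output of the example above, this would be called as `write_wav_int16("audio.wav", audio[0])`.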
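Quantizations
The model is highly robust to quantization, delivering efficient, high-quality speech synthesis at a fraction of the original model size.
Sample text: "How could I know? It's an unanswerable question. Like asking an unborn child if they'll lead a good life. They haven't even been born."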
| Model | Size (MB) | Sample |
|---|---|---|
| model.onnx (fp32) | 326 | <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/61b253b7ac5ecaae3d1efe0c/njexBuqPzfYUvWgs9eQ-_.wav"></audio> |
| model_fp16.onnx (fp16) | 163 | <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/61b253b7ac5ecaae3d1efe0c/8Ebl44hMQonZs4MlykExt.wav"></audio> |
| model_quantized.onnx (8-bit) | 92.4 | <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/61b253b7ac5ecaae3d1efe0c/9SLOt6ETclZ4yRdlJ0VIj.wav"></audio> |
| model_q8f16.onnx (mixed precision) | 86 | <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/61b253b7ac5ecaae3d1efe0c/gNDMqb33YEmYMbAIv_Grx.wav"></audio> |
| model_uint8.onnx (8-bit & mixed precision) | 177 | <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/61b253b7ac5ecaae3d1efe0c/tpOWRHIWwEb0PJX46dCWQ.wav"></audio> |
| model_uint8f16.onnx (mixed precision) | 114 | <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/61b253b7ac5ecaae3d1efe0c/vtZhABzjP0pvGD7dRb5Vr.wav"></audio> |
| model_q4.onnx (4-bit matmul) | 305 | <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/61b253b7ac5ecaae3d1efe0c/8FVn0IJIUfccEBWq8Fnw_.wav"></audio> |
| model_q4f16.onnx (4-bit matmul & fp16 weights) | 154 | <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/61b253b7ac5ecaae3d1efe0c/7DrgWC_1q00s-wUJuG44X.wav"></audio> |
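To put "a fraction of the original model size" in numbers, the sketch below computes each file's size relative to the 326 MB fp32 model and picks the smallest variant that fits a size budget. The sizes are copied from the table rather than measured; `SIZES_MB`, `relative_size`, and `smallest_within` are hypothetical names.

```python
from typing import Optional

# Sizes in MB, copied from the quantization table.
SIZES_MB = {
    "model.onnx": 326,
    "model_fp16.onnx": 163,
    "model_quantized.onnx": 92.4,
    "model_q8f16.onnx": 86,
    "model_uint8.onnx": 177,
    "model_uint8f16.onnx": 114,
    "model_q4.onnx": 305,
    "model_q4f16.onnx": 154,
}
FP32_MB = SIZES_MB["model.onnx"]


def relative_size(name: str) -> float:
    """Size of a variant as a fraction of the fp32 model."""
    return SIZES_MB[name] / FP32_MB


def smallest_within(budget_mb: float) -> Optional[str]:
    """Return the smallest listed variant whose size is <= budget_mb, or None."""
    fitting = [n for n, mb in SIZES_MB.items() if mb <= budget_mb]
    return min(fitting, key=SIZES_MB.__getitem__, default=None)


if __name__ == "__main__":
    for name in sorted(SIZES_MB, key=SIZES_MB.__getitem__):
        print(f"{name:<22} {SIZES_MB[name]:>6} MB  {relative_size(name):.0%} of fp32")
```

By these figures, the 4-bit `model_q4.onnx` (305 MB) is larger than every 8-bit file, so bit width alone does not predict file size here; comparing the listed sizes and audio samples is more reliable than going by the name.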
onnx-community/Kokoro-82M-ONNX
Author: onnx-community
Tags: text-to-speech, transformers.js
Downloads: 39.9K
Likes: 167
Created: 2025-01-12 01:48:29+00:00
Updated: 2025-02-07 16:53:13+00:00
Repository files (24):
.gitattributes
README.md
config.json
onnx/model.onnx
onnx/model_fp16.onnx
onnx/model_q4.onnx
onnx/model_q4f16.onnx
onnx/model_q8f16.onnx
onnx/model_quantized.onnx
onnx/model_uint8.onnx
onnx/model_uint8f16.onnx
tokenizer.json
tokenizer_config.json
voices/af.bin
voices/af_bella.bin
voices/af_nicole.bin
voices/af_sarah.bin
voices/af_sky.bin
voices/am_adam.bin
voices/am_michael.bin
voices/bf_emma.bin
voices/bf_isabella.bin
voices/bm_george.bin
voices/bm_lewis.bin