
Kokoro TTS

Kokoro is a frontier TTS model with 82 million parameters (text in, audio out).


Samples

Sample text: "Life is like a box of chocolates. You never know what you're gonna get."

| Voice | Nationality | Gender | Sample |
|---|---|---|---|
| Default (af) | American | Female | <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/61b253b7ac5ecaae3d1efe0c/C0_ZUcNSAxvMwpS8QbnKv.wav"></audio> |
| Bella (af_bella) | American | Female | <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/61b253b7ac5ecaae3d1efe0c/B_q15Z_FXdgBP9-Hk9oKq.wav"></audio> |
| Nicole (af_nicole) | American | Female | <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/61b253b7ac5ecaae3d1efe0c/sS8U5lQHkhgX7rwTmy-5w.wav"></audio> |
| Sarah (af_sarah) | American | Female | <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/61b253b7ac5ecaae3d1efe0c/SokkBiqEqwxLLx_pqvf1p.wav"></audio> |
| Sky (af_sky) | American | Female | <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/61b253b7ac5ecaae3d1efe0c/IzySGHUtl5mYeFxx1oaRf.wav"></audio> |
| Adam (am_adam) | American | Male | <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/61b253b7ac5ecaae3d1efe0c/9n6myE6--ZsEuF5xDv5eC.wav"></audio> |
| Michael (am_michael) | American | Male | <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/61b253b7ac5ecaae3d1efe0c/EPFciGtTU1YUXu8MAw7DX.wav"></audio> |
| Emma (bf_emma) | British | Female | <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/61b253b7ac5ecaae3d1efe0c/AGEsXs-gyJq3dsyo7PjHo.wav"></audio> |
| Isabella (bf_isabella) | British | Female | <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/61b253b7ac5ecaae3d1efe0c/JEzrrXYJSDcmlEzI7tE0c.wav"></audio> |
| George (bm_george) | British | Male | <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/61b253b7ac5ecaae3d1efe0c/nsv4zKB4MX2TvXRxv504k.wav"></audio> |
| Lewis (bm_lewis) | British | Male | <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/61b253b7ac5ecaae3d1efe0c/g_mcBl2xTbQl0sbrpZt48.wav"></audio> |
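The voice IDs appear to follow a two-letter prefix convention: the first letter seems to encode the accent (a = American, b = British) and the second the gender (f = female, m = male). The helper below is a hypothetical sketch that decodes an ID under that assumption; `describe_voice` is not part of kokoro-js or of the model files.

```python
# Hypothetical helper (not part of kokoro-js): decodes a Kokoro voice ID such as
# "af_bella" into (name, accent, gender), ASSUMING the prefix convention
# suggested by the voice table: letter 1 = accent, letter 2 = gender.
ACCENTS = {"a": "American", "b": "British"}
GENDERS = {"f": "Female", "m": "Male"}


def describe_voice(voice_id: str) -> tuple[str, str, str]:
    """Return (name, accent, gender) for a voice ID like "bm_george"."""
    prefix, _, name = voice_id.partition("_")
    if len(prefix) != 2 or prefix[0] not in ACCENTS or prefix[1] not in GENDERS:
        raise ValueError(f"unrecognized voice prefix: {voice_id!r}")
    # The bare "af" voice has no name suffix; label it "Default" as in the table.
    return (name.capitalize() or "Default", ACCENTS[prefix[0]], GENDERS[prefix[1]])
```

For example, `describe_voice("bf_emma")` returns `("Emma", "British", "Female")`.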

Usage

JavaScript

First, install the kokoro-js library from NPM:

npm i kokoro-js

You can then generate speech as follows:

import { KokoroTTS } from "kokoro-js";

const model_id = "onnx-community/Kokoro-82M-ONNX";
const tts = await KokoroTTS.from_pretrained(model_id, {
  dtype: "q8", // Options: "fp32", "fp16", "q8", "q4", "q4f16"
});

const text = "Life is like a box of chocolates. You never know what you're gonna get.";
const audio = await tts.generate(text, {
  // Use `tts.list_voices()` to list all available voices
  voice: "af_bella",
});
audio.save("audio.wav");
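The `dtype` option chooses which ONNX file under `onnx/` is loaded. The mapping below is an assumption based on the usual transformers.js file-suffix convention (for example, `q8` corresponding to `model_quantized.onnx`); this page does not state it. The sketch is in Python so the same choice can be made when running the model directly with onnxruntime; `DTYPE_TO_FILE` and `onnx_path_for` are hypothetical names.

```python
# ASSUMED correspondence between kokoro-js `dtype` values and the ONNX files in
# this repository, based on transformers.js suffix conventions. Verify against
# the library version you use before relying on it.
DTYPE_TO_FILE = {
    "fp32": "model.onnx",
    "fp16": "model_fp16.onnx",
    "q8": "model_quantized.onnx",
    "q4": "model_q4.onnx",
    "q4f16": "model_q4f16.onnx",
}


def onnx_path_for(dtype: str, folder: str = "onnx") -> str:
    """Return the relative path of the ONNX file assumed to match `dtype`."""
    try:
        return f"{folder}/{DTYPE_TO_FILE[dtype]}"
    except KeyError:
        raise ValueError(
            f"unknown dtype {dtype!r}; expected one of {sorted(DTYPE_TO_FILE)}"
        ) from None
```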

Python

import os
import numpy as np
from onnxruntime import InferenceSession

# Tokens produced by phonemize() and tokenize() in kokoro.py
tokens = [50, 157, 43, 135, 16, 53, 135, 46, 16, 43, 102, 16, 56, 156, 57, 135, 6, 16, 102, 62, 61, 16, 70, 56, 16, 138, 56, 156, 72, 56, 61, 85, 123, 83, 44, 83, 54, 16, 53, 65, 156, 86, 61, 62, 131, 83, 56, 4, 16, 54, 156, 43, 102, 53, 16, 156, 72, 61, 53, 102, 112, 16, 70, 56, 16, 138, 56, 44, 156, 76, 158, 123, 56, 16, 62, 131, 156, 43, 102, 54, 46, 16, 102, 48, 16, 81, 47, 102, 54, 16, 54, 156, 51, 158, 46, 16, 70, 16, 92, 156, 135, 46, 16, 54, 156, 43, 102, 48, 4, 16, 81, 47, 102, 16, 50, 156, 72, 64, 83, 56, 62, 16, 156, 51, 158, 64, 83, 56, 16, 44, 157, 102, 56, 16, 44, 156, 76, 158, 123, 56, 4]

# Context length is 512, but leave room for the pad token 0 at the start and end
assert len(tokens) <= 510, len(tokens)

# Style vector selected by len(tokens); ref_s has shape (1, 256)
voices = np.fromfile('./voices/af.bin', dtype=np.float32).reshape(-1, 1, 256)
ref_s = voices[len(tokens)]

# Add the pad ids and reshape tokens; the shape should now be (1, <=512)
tokens = [[0, *tokens, 0]]

model_name = 'model.onnx' # Options: model.onnx, model_fp16.onnx, model_quantized.onnx, model_q8f16.onnx, model_uint8.onnx, model_uint8f16.onnx, model_q4.onnx, model_q4f16.onnx
sess = InferenceSession(os.path.join('onnx', model_name))

audio = sess.run(None, dict(
    input_ids=tokens,
    style=ref_s,
    speed=np.ones(1, dtype=np.float32),
))[0]
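The preprocessing steps in the Python example (checking the 510-token limit, picking the style vector by token count, and padding with token 0) can be gathered into one function. This is a sketch: `prepare_inputs` is a hypothetical helper, and the `int64` dtype for `input_ids` is an assumption (check `sess.get_inputs()` for the model's declared types).

```python
import numpy as np

MAX_TOKENS = 510  # 512-token context minus the two padding positions
STYLE_DIM = 256   # each style vector holds 256 float32 values


def prepare_inputs(tokens: list[int], voices: np.ndarray) -> dict[str, np.ndarray]:
    """Build the feed dict for the Kokoro ONNX session (hypothetical helper).

    `voices` is a voice pack reshaped to (-1, 1, STYLE_DIM), as in the example.
    """
    if len(tokens) > MAX_TOKENS:
        raise ValueError(f"{len(tokens)} tokens exceeds the limit of {MAX_TOKENS}")
    # The style vector is chosen by the *unpadded* token count; shape (1, 256).
    style = voices[len(tokens)]
    # Pad with token 0 at both ends and add a batch dimension: (1, len + 2).
    # int64 is an ASSUMED input type for input_ids.
    input_ids = np.array([[0, *tokens, 0]], dtype=np.int64)
    return {
        "input_ids": input_ids,
        "style": style,
        "speed": np.ones(1, dtype=np.float32),  # 1.0 = normal speaking rate
    }
```

With a real session, `sess.run(None, prepare_inputs(tokens, voices))[0]` would replace the inline `dict(...)` above.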

Optionally, save the audio to a file:

import scipy.io.wavfile as wavfile
wavfile.write('audio.wav', 24000, audio[0])
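If SciPy is not available, the standard-library `wave` module can write the same 24 kHz output. The sketch below assumes the model returns float samples roughly within [-1, 1] (an assumption, so it clips) and converts them to 16-bit PCM, which most players support; `write_wav_int16` is a hypothetical helper name.

```python
import wave

import numpy as np

SAMPLE_RATE = 24000  # sample rate used when saving Kokoro output above


def write_wav_int16(target, samples: np.ndarray, rate: int = SAMPLE_RATE) -> None:
    """Write mono float samples (ASSUMED to lie in [-1, 1]) as 16-bit PCM WAV.

    `target` may be a file path or a writable binary file-like object.
    """
    clipped = np.clip(np.asarray(samples, dtype=np.float32), -1.0, 1.0)
    pcm = (clipped * 32767).astype("<i2")  # little-endian int16, as WAV expects
    with wave.open(target, "wb") as wf:
        wf.setnchannels(1)   # mono
        wf.setsampwidth(2)   # 2 bytes per sample = 16-bit
        wf.setframerate(rate)
        wf.writeframes(pcm.tobytes())
```

Usage: `write_wav_int16("audio.wav", audio[0])`.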

Quantizations

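The model is resilient to quantization, enabling efficient, high-quality speech synthesis at a fraction of the original model size: for example, model_q8f16.onnx is 86 MB, versus 326 MB for the fp32 model.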

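Sample text: "How could I know? It's an unanswerable question. Like asking an unborn child if they'll lead a good life. They haven't even been born."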

| Model | Size (MB) | Sample |
|---|---|---|
| model.onnx (fp32) | 326 | <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/61b253b7ac5ecaae3d1efe0c/njexBuqPzfYUvWgs9eQ-_.wav"></audio> |
| model_fp16.onnx (fp16) | 163 | <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/61b253b7ac5ecaae3d1efe0c/8Ebl44hMQonZs4MlykExt.wav"></audio> |
| model_quantized.onnx (8-bit) | 92.4 | <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/61b253b7ac5ecaae3d1efe0c/9SLOt6ETclZ4yRdlJ0VIj.wav"></audio> |
| model_q8f16.onnx (mixed precision) | 86 | <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/61b253b7ac5ecaae3d1efe0c/gNDMqb33YEmYMbAIv_Grx.wav"></audio> |
| model_uint8.onnx (8-bit & mixed precision) | 177 | <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/61b253b7ac5ecaae3d1efe0c/tpOWRHIWwEb0PJX46dCWQ.wav"></audio> |
| model_uint8f16.onnx (mixed precision) | 114 | <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/61b253b7ac5ecaae3d1efe0c/vtZhABzjP0pvGD7dRb5Vr.wav"></audio> |
| model_q4.onnx (4-bit matmul) | 305 | <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/61b253b7ac5ecaae3d1efe0c/8FVn0IJIUfccEBWq8Fnw_.wav"></audio> |
| model_q4f16.onnx (4-bit matmul & fp16 weights) | 154 | <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/61b253b7ac5ecaae3d1efe0c/7DrgWC_1q00s-wUJuG44X.wav"></audio> |
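To put the quantization table in perspective, each file's size can be expressed as a fraction of the fp32 model. The sizes below are copied from the table; the back-of-the-envelope figure (82 million parameters at 4 bytes each, about 328 MB) is plain arithmetic and roughly matches the 326 MB listed for model.onnx.

```python
# Sizes in MB, copied from the quantization table.
SIZES_MB = {
    "model.onnx": 326,
    "model_fp16.onnx": 163,
    "model_quantized.onnx": 92.4,
    "model_q8f16.onnx": 86,
    "model_uint8.onnx": 177,
    "model_uint8f16.onnx": 114,
    "model_q4.onnx": 305,
    "model_q4f16.onnx": 154,
}

PARAMS = 82_000_000   # 82M parameters, from the model description
BYTES_PER_FP32 = 4    # one float32 weight = 4 bytes


def relative_sizes(sizes: dict[str, float], baseline: str = "model.onnx") -> dict[str, float]:
    """Return each file's size as a fraction of the baseline file's size."""
    base = sizes[baseline]
    return {name: size / base for name, size in sizes.items()}


# Estimated fp32 weight storage in MB (1 MB = 10**6 bytes).
estimated_fp32_mb = PARAMS * BYTES_PER_FP32 / 1_000_000

ratios = relative_sizes(SIZES_MB)
for name, ratio in sorted(ratios.items(), key=lambda kv: kv[1]):
    print(f"{name:<24} {ratio:6.1%}")
```

By this measure, model_q8f16.onnx is the smallest file, at about 26% of the fp32 size (86 / 326).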

onnx-community/Kokoro-82M-ONNX

Author: onnx-community

Tags: text-to-speech, transformers.js

Created: 2025-01-12 01:48:29+00:00

Updated: 2025-02-07 16:53:13+00:00


Files (24)

.gitattributes
README.md
config.json
onnx/model.onnx
onnx/model_fp16.onnx
onnx/model_q4.onnx
onnx/model_q4f16.onnx
onnx/model_q8f16.onnx
onnx/model_quantized.onnx
onnx/model_uint8.onnx
onnx/model_uint8f16.onnx
tokenizer.json
tokenizer_config.json
voices/af.bin
voices/af_bella.bin
voices/af_nicole.bin
voices/af_sarah.bin
voices/af_sky.bin
voices/am_adam.bin
voices/am_michael.bin
voices/bf_emma.bin
voices/bf_isabella.bin
voices/bm_george.bin
voices/bm_lewis.bin