Documentation

license: mit
language:
- en
- zh
base_model:
- CosyVoice3
pipeline_tag: text-to-speech
library_name: transformers
tags:
- CosyVoice3
- Speech

AXERA-TECH/CosyVoice3

Author: AXERA-TECH

text-to-speech transformers

Downloads: 4, Likes: 0

Created: 2026-01-09 02:45:55+00:00

Updated: 2026-01-16 02:23:42+00:00

Files (86)

.gitattributes

CosyVoice-BlankEN-Ax650-C64-P256-CTX512/llm.speech_embedding.float16.bin

CosyVoice-BlankEN-Ax650-C64-P256-CTX512/llm_decoder.axmodel

CosyVoice-BlankEN-Ax650-C64-P256-CTX512/model.embed_tokens.weight.bfloat16.bin

CosyVoice-BlankEN-Ax650-C64-P256-CTX512/qwen2_p64_l0_together.axmodel

CosyVoice-BlankEN-Ax650-C64-P256-CTX512/qwen2_p64_l10_together.axmodel

CosyVoice-BlankEN-Ax650-C64-P256-CTX512/qwen2_p64_l11_together.axmodel

CosyVoice-BlankEN-Ax650-C64-P256-CTX512/qwen2_p64_l12_together.axmodel

CosyVoice-BlankEN-Ax650-C64-P256-CTX512/qwen2_p64_l13_together.axmodel

CosyVoice-BlankEN-Ax650-C64-P256-CTX512/qwen2_p64_l14_together.axmodel

CosyVoice-BlankEN-Ax650-C64-P256-CTX512/qwen2_p64_l15_together.axmodel

CosyVoice-BlankEN-Ax650-C64-P256-CTX512/qwen2_p64_l16_together.axmodel

CosyVoice-BlankEN-Ax650-C64-P256-CTX512/qwen2_p64_l17_together.axmodel

CosyVoice-BlankEN-Ax650-C64-P256-CTX512/qwen2_p64_l18_together.axmodel

CosyVoice-BlankEN-Ax650-C64-P256-CTX512/qwen2_p64_l19_together.axmodel

CosyVoice-BlankEN-Ax650-C64-P256-CTX512/qwen2_p64_l1_together.axmodel

CosyVoice-BlankEN-Ax650-C64-P256-CTX512/qwen2_p64_l20_together.axmodel

CosyVoice-BlankEN-Ax650-C64-P256-CTX512/qwen2_p64_l21_together.axmodel

CosyVoice-BlankEN-Ax650-C64-P256-CTX512/qwen2_p64_l22_together.axmodel

CosyVoice-BlankEN-Ax650-C64-P256-CTX512/qwen2_p64_l23_together.axmodel

CosyVoice-BlankEN-Ax650-C64-P256-CTX512/qwen2_p64_l2_together.axmodel

CosyVoice-BlankEN-Ax650-C64-P256-CTX512/qwen2_p64_l3_together.axmodel

CosyVoice-BlankEN-Ax650-C64-P256-CTX512/qwen2_p64_l4_together.axmodel

CosyVoice-BlankEN-Ax650-C64-P256-CTX512/qwen2_p64_l5_together.axmodel

CosyVoice-BlankEN-Ax650-C64-P256-CTX512/qwen2_p64_l6_together.axmodel

CosyVoice-BlankEN-Ax650-C64-P256-CTX512/qwen2_p64_l7_together.axmodel

CosyVoice-BlankEN-Ax650-C64-P256-CTX512/qwen2_p64_l8_together.axmodel

CosyVoice-BlankEN-Ax650-C64-P256-CTX512/qwen2_p64_l9_together.axmodel

CosyVoice-BlankEN-Ax650-C64-P256-CTX512/qwen2_post.axmodel

README.md

asset/dingding.png

asset/output.wav

asset/zero_shot_prompt.wav

config.json

frontend-onnx/campplus.onnx

frontend-onnx/speech_tokenizer_v3.onnx

main_api_ax650

main_api_axcl_aarch64

main_api_axcl_x86

main_ax650

main_axcl_aarch64

main_axcl_x86

prompt_files/flow_embedding.txt

prompt_files/flow_prompt_speech_token.txt

prompt_files/llm_embedding.txt

prompt_files/llm_prompt_speech_token.txt

prompt_files/prompt_speech_feat.txt

prompt_files/prompt_text.txt

run_api_ax650.sh

run_api_axcl_aarch64.sh

run_api_axcl_x86.sh

run_ax650.sh

run_axcl_aarch64.sh

run_axcl_x86.sh

scripts/CosyVoice-BlankEN/merges.txt

scripts/CosyVoice-BlankEN/tokenizer_config.json

scripts/CosyVoice-BlankEN/vocab.json

scripts/audio.py

scripts/cosyvoice3_tokenizer.py

scripts/frontend.py

scripts/gradio_demo.py

scripts/meldataset.py

scripts/process_prompt.py

scripts/requirements.txt

scripts/tokenizer/assets/multilingual_zh_ja_yue_char_del.tiktoken

scripts/tokenizer/tokenizer.py

token2wav-axmodels/flow.input_embedding.float16.bin

token2wav-axmodels/flow.input_embedding.npy

token2wav-axmodels/flow_encoder_28.axmodel

token2wav-axmodels/flow_encoder_50_final.axmodel

token2wav-axmodels/flow_encoder_53.axmodel

token2wav-axmodels/flow_encoder_78.axmodel

token2wav-axmodels/flow_estimator_200.axmodel

token2wav-axmodels/flow_estimator_250.axmodel

token2wav-axmodels/flow_estimator_300.axmodel

token2wav-axmodels/hift_p1_100.axmodel

token2wav-axmodels/hift_p1_100_final.axmodel

token2wav-axmodels/hift_p1_150.axmodel

token2wav-axmodels/hift_p1_50.axmodel

token2wav-axmodels/hift_p2_100.axmodel

token2wav-axmodels/hift_p2_100_final.axmodel

token2wav-axmodels/hift_p2_150.axmodel

token2wav-axmodels/hift_p2_50.axmodel

token2wav-axmodels/llm_decoder.axmodel

token2wav-axmodels/rand_noise_1_80_300.txt

token2wav-axmodels/speech_window_2x8x480.txt
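
The per-layer files in the listing above (`qwen2_p64_l0_together.axmodel` through `qwen2_p64_l23_together.axmodel`) appear in lexicographic order, so `l10` comes before `l1`. The sketch below shows one way to put such paths back into numeric layer order. It assumes only the naming pattern visible in the listing. The helper names `layer_index` and `sort_layer_files` are hypothetical and are not part of this repository.

```python
import re

# Assumption: layer files follow the pattern seen in the listing,
# e.g. "qwen2_p64_l10_together.axmodel".
LAYER_RE = re.compile(r"qwen2_p64_l(\d+)_together\.axmodel$")


def layer_index(path: str) -> int:
    """Extract the numeric layer index from a layer file path (hypothetical helper)."""
    match = LAYER_RE.search(path)
    if match is None:
        raise ValueError(f"not a layer file: {path}")
    return int(match.group(1))


def sort_layer_files(paths):
    """Keep only the per-layer files and order them by numeric layer index."""
    layer_files = [p for p in paths if LAYER_RE.search(p)]
    return sorted(layer_files, key=layer_index)


# A small sample in the same lexicographic order as the listing.
prefix = "CosyVoice-BlankEN-Ax650-C64-P256-CTX512/"
listing = [f"{prefix}qwen2_p64_l{i}_together.axmodel" for i in (0, 10, 11, 1, 2, 23)]
listing.append(prefix + "qwen2_post.axmodel")  # not a per-layer file; filtered out

ordered = sort_layer_files(listing)
print([layer_index(p) for p in ordered])  # prints [0, 1, 2, 10, 11, 23]
```

Sorting on the parsed integer rather than on the raw string avoids zero-padding the file names. Whether the runtime binaries in this repository need the files in any particular order is not stated in the listing.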