Spoken-Communication.axera
A spoken-communication demo on Axera
- [x] Python example
- [ ] C++ example
Conversion toolchain links:
For users interested in model conversion, the axmodel can be exported via the original repository:
How to convert from ONNX to axmodel
Supported platforms
- AX650N
Features
Spoken communication
Pipeline components
Board deployment
- The AX650N device ships with Ubuntu 22.04 preinstalled
- Log in to the AX650N board as root
- Connect to the internet so that the AX650N device can run commands such as apt install and pip install
- Verified device: AX650N DEMO Board
Running the Python API
Verified on Python 3.10
Pipeline: ASR + LLM (Qwen) + MeloTTS
Runs both on the board and in compute-card mode
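The three pipeline stages (ASR, LLM, TTS) can be sketched as below. This is a minimal illustration of the data flow only; the function names `run_asr`, `query_llm`, and `synthesize` are hypothetical stubs, not the actual API of this repository:

```python
# Minimal sketch of the ASR -> LLM -> TTS pipeline.
# All three stage implementations are hypothetical stubs standing in
# for SenseVoice ASR, the Qwen2.5 API server, and MeloTTS.

def run_asr(audio_path: str) -> str:
    """Stub: an ASR model would transcribe the wav file here."""
    return "What is AI?"

def query_llm(prompt: str) -> str:
    """Stub: the LLM API server would answer the prompt here."""
    return f"Answer to: {prompt}"

def synthesize(text: str) -> bytes:
    """Stub: a TTS model would synthesize speech for the text here."""
    return text.encode("utf-8")

def pipeline(audio_path: str) -> bytes:
    # ASR -> LLM -> TTS, mirroring the scheme described above
    question = run_asr(audio_path)
    answer = query_llm(question)
    return synthesize(answer)

print(pipeline("input_question/Q1.wav").decode("utf-8"))
```

The real demo additionally runs VAD on the input and batches over a directory of wav files, but the stage order is the same.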
Project download
git clone https://huggingface.co/AXERA-TECH/Spoken-Communication.axera
or
hf download AXERA-TECH/Spoken-Communication.axera --local-dir Spoken-Communication.axera
cd Spoken-Communication.axera
The project directory structure is as follows:
.
|-- README.md
|-- ax_model
|-- ax_spoken_communication_demo.py
|-- config.json
|-- libaxllm
|-- libmelotts
|-- model.py
|-- requirements.txt
|-- utils
`-- input_question
Walkthrough
On-board demo
1. Install dependencies
1):
If axengine is not present in your environment, download and install it (any location will do):
hf download AXERA-TECH/PyAXEngine --local-dir PyAXEngine
cd PyAXEngine
pip3 install axengine-0.1.3-py3-none-any.whl
2):
cd Spoken-Communication.axera
pip3 install -r requirements.txt
3):
apt install espeak
or
sudo apt install espeak
2. Download the model
From the project root, run:
hf download AXERA-TECH/Qwen2.5-1.5B-Instruct --local-dir libaxllm --include qwen2.5-1.5b-ctx-ax650/*
The model is downloaded into the libaxllm folder.
3. Run the following commands on the board
1) Start the qwen API
cd libaxllm
Start the tokenizer server with context support:
python3 qwen2.5_tokenizer_uid.py
Run the API:
sh run_qwen2.5_1.5b_ctx_ax650_api.sh
2) Run the on-board pipeline demo
cd ..
python3 ax_spoken_communication_demo.py --audio_dir input_question --output_dir output_answer --api_url http://10.126.29.158:8000
Command-line arguments:
| Argument | Description |
|-------|------|
| `--audio_dir` | Path to the input audio files |
| `--api_url` | Address of the qwen API service, i.e. the server started above |
| `--output_dir` | Path where results are saved |
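The three arguments in the table map naturally onto `argparse`. The sketch below shows how such a CLI could be parsed; it is not the script's actual code, and the default values are illustrative assumptions:

```python
import argparse

# Sketch of the demo's command line, mirroring the argument table above.
# The defaults are illustrative assumptions, not the script's real ones.
parser = argparse.ArgumentParser(description="ASR + LLM + TTS pipeline demo")
parser.add_argument("--audio_dir", default="input_question",
                    help="path to the input audio files")
parser.add_argument("--api_url", default="http://127.0.0.1:8000",
                    help="address of the qwen API service")
parser.add_argument("--output_dir", default="output_answer",
                    help="path where results are saved")

# Parse an explicit argument list, as on the command line above
args = parser.parse_args(
    ["--audio_dir", "input_question", "--api_url", "http://10.126.29.158:8000"]
)
print(args.audio_dir, args.api_url, args.output_dir)
```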
Output:
1) a wav file for each input audio, and 2) the recognition summary "output_answer/processing_summary.txt", for example:
Batch processing summary
File 1: Q1.wav
Recognized text: What is the most essential difference between artificial intelligence and human intelligence?
Answer: The most essential difference between artificial intelligence and human intelligence is that AI is machine intelligence that learns and makes decisions based on algorithms and data, while human intelligence is biological intelligence that thinks and decides based on experience and intuition.
Synthesized audio: Q1_answer.wav
Processing time: 8.22 s
Audio duration: 15.19 s
RTF: 0.54
File 2: Q2.wav
Recognized text: AI has no mind, so why can it create art that moves people?
Answer: AI can create art because it can learn and analyze through algorithms and data, understand the style, emotion, and meaning of artworks, and then create through generative models. This differs from the inspiration, experience, and intuition of human artists, but in some areas AI has already shown abilities beyond humans.
Synthesized audio: Q2_answer.wav
Processing time: 9.43 s
Audio duration: 23.68 s
RTF: 0.40
File 3: Q3.wav
Recognized text: Will artificial intelligence eventually rule humanity?
Answer: The development of AI may have a major impact on human society, but so far AI has not reached the point of being able to rule humanity. AI mainly excels at specific tasks such as data analysis and image recognition, while it remains limited in decision-making, ethics, and emotional understanding.
Synthesized audio: Q3_answer.wav
Processing time: 8.86 s
Audio duration: 22.62 s
RTF: 0.39
Total: 3 files
Total processing time: 26.53 s
4. Latency
AX650N
RTF: about 0.4, as in the example above.
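RTF (real-time factor) here is processing time divided by the duration of the synthesized audio, so values below 1 mean faster than real time. The figures from the summary above reproduce the reported values:

```python
def rtf(processing_s: float, audio_s: float) -> float:
    """Real-time factor: processing time / audio duration (< 1 = faster than real time)."""
    return processing_s / audio_s

# Figures taken from the processing summary above
print(f"{rtf(8.22, 15.19):.2f}")  # Q1 -> 0.54
print(f"{rtf(9.43, 23.68):.2f}")  # Q2 -> 0.40
print(f"{rtf(8.86, 22.62):.2f}")  # Q3 -> 0.39
```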
Compute-card demo
The steps are largely the same as for the on-board demo; taking an aarch64 environment as an example:
1. Start the qwen API
cd libaxllm
Start the tokenizer server with context support:
python3 qwen2.5_tokenizer_uid.py
Run the API for the matching environment:
sh run_qwen2.5_1.5b_ctx_axcl_aarch64_api.sh
2. Run the compute-card pipeline demo
cd ..
python3 ax_spoken_communication_demo.py --audio_dir input_question --api_url http://10.126.33.13:8000 --output_dir output
The steps for an x86 environment are the same.
References
Technical discussion
- Github issues
- QQ group: 139953715
AXERA-TECH/Spoken-Communication.axera
Author: AXERA-TECH
audio-to-audio
Created: 2025-11-13 09:50:26+00:00
Updated: 2025-11-14 06:12:46+00:00