README

Thanks to AI4Bharat for training the Hindi-specific speech recognition model. It is available at: https://huggingface.co/ai4bharat/indicconformer_stt_hi_hybrid_ctc_rnnt_large

This directory is under active development. GitHub repository: https://github.com/deepanshu-yadav/Quantize_speech_Recognition_For_Hindi

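This repository aims to:

Quantize the .nemo models (both the CTC and RNNT versions).
Remove the NeMo-specific dependencies.
Use the converted ONNX models for offline (file) and online (microphone) use cases.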

Conversion is complete for both the CTC and RNNT versions.

Notebooks for converting the models to float16 are provided. The CTC notebook is onnxconversionCTC.ipynb. The RNNT notebook is onnxconversionRNNT.ipynb.
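To show what a float16 conversion involves, here is a minimal sketch. The first two helpers use only the standard library to show the rounding error and the storage saving when a float32 weight is stored as float16. The function convert_onnx_to_fp16 is a hypothetical helper, not taken from the notebooks: it assumes the onnx and onnxconverter-common packages are installed, and the notebooks in this repository may perform the conversion differently.

```python
import struct


def to_float16(value: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


def storage_bytes(num_weights: int, fmt: str) -> int:
    """Bytes needed to store num_weights values in the given struct format."""
    return num_weights * struct.calcsize(fmt)


def convert_onnx_to_fp16(src_path: str, dst_path: str) -> None:
    """Hypothetical helper: cast the float32 tensors of an ONNX model to float16.

    Assumption: `onnx` and `onnxconverter-common` are installed. They are
    imported lazily so the helpers above work without them.
    """
    import onnx
    from onnxconverter_common import float16

    model = onnx.load(src_path)
    # keep_io_types=True leaves the model inputs and outputs as float32,
    # so callers do not have to cast their features.
    model_fp16 = float16.convert_float_to_float16(model, keep_io_types=True)
    onnx.save(model_fp16, dst_path)


# Rounding error for a single weight, and size of one million weights.
weight = 0.1
rounding_error = abs(to_float16(weight) - weight)
fp32_size = storage_bytes(1_000_000, "<f")
fp16_size = storage_bytes(1_000_000, "<e")
```

Halving the storage costs some precision per weight, which is why the transcription quality of a float16 model is worth checking against the original on a few known files.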

How to run inference

Install dependencies

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu

Then install the remaining packages from the requirements file

pip install -r requirements.txt

CTC float16 (non-streaming) offline mode

Now we can run inference

python offline_ctc_float16_inference.py

Note: a sample file is provided.

Expected output:

Audio features shape: (1, 80, 1413), Length: [1413]
Transcription: शिवपाल की यह टिप्पणी फ़िल्म काल्या के डायलॉग से मिलतीजुलती है शिवपाल चाहते हैं कि मुलायम पारती के मुखिया फिर से बने फ़िलहाल सपा अध्यक्ष अखिलेश यादव हैं पिता से पार्ट की कमान छीनी थी
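A CTC model emits one score vector per output frame, and these have to be turned into text. The following is a minimal sketch of greedy CTC decoding, not the code in offline_ctc_float16_inference.py. It makes three assumptions: the blank symbol is the last index, the vocabulary uses SentencePiece-style pieces where "▁" marks the start of a word, and the toy vocabulary and scores are made up for illustration.

```python
from itertools import groupby


def ctc_greedy_decode(frame_scores, vocab, blank_id):
    """Greedy CTC decoding: argmax per frame, collapse repeats, drop blanks."""
    # 1. Pick the best-scoring token id for each frame.
    best_ids = [max(range(len(frame)), key=frame.__getitem__) for frame in frame_scores]
    # 2. Collapse runs of the same id into a single id.
    collapsed = [token_id for token_id, _ in groupby(best_ids)]
    # 3. Drop blanks and map the remaining ids to subword pieces.
    pieces = [vocab[token_id] for token_id in collapsed if token_id != blank_id]
    # Assumption: "\u2581" (the SentencePiece word-start marker) becomes a space.
    return "".join(pieces).replace("\u2581", " ").strip()


# Toy example: three hypothetical subword pieces plus a blank at index 3.
toy_vocab = ["\u2581आप", "\u2581सब", "से"]
toy_scores = [
    [0.9, 0.0, 0.0, 0.1],  # first piece
    [0.8, 0.1, 0.0, 0.1],  # first piece again (repeat, collapsed)
    [0.1, 0.0, 0.0, 0.9],  # blank
    [0.0, 0.7, 0.1, 0.2],  # second piece
    [0.0, 0.6, 0.1, 0.3],  # second piece again (repeat, collapsed)
    [0.0, 0.1, 0.0, 0.9],  # blank
]
text = ctc_greedy_decode(toy_scores, toy_vocab, blank_id=3)
```

The blank between two identical pieces matters: without it, a genuinely repeated token would be collapsed into one.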

CTC float16 (non-streaming) real-time mode

You can also transcribe in real time from a sound device.

Run

python realtime_ctc_float16_non_streaming.py

Expected output

Using cache found in C:\Users\DEEPANSHU/.cache\torch\hub\snakers4_silero-vad_master
Listening... (Speak into the microphone)
Press 'q' to stop streaming...
C:\Users\DEEPANSHU\Desktop\automation\speech\hindi\git_inference_push\realtime_ctc_float16_non_streaming.py:55: UserWarning: The given NumPy array is not writable, and PyTorch does not support non-writable tensors. This means writing to this tensor will result in undefined behavior. You may want to copy the array to protect its data or make it writable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at C:\actions-runner\_work\pytorch\pytorch\pytorch\torch\csrc\utils\tensor_numpy.cpp:209.)
  audio_tensor = torch.from_numpy(audio_float32)
Speech detected, recording...
Silence detected, transcribing...
Transcription: तो कैसे हैं आप सब
Listening...
Speech detected, recording...
Silence detected, transcribing...
Transcription: आपसे मिल के अच्छा लगा
Listening...
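The log above alternates between "Speech detected, recording..." and "Silence detected, transcribing...", which suggests that audio is buffered while a voice activity detector reports speech and is handed to the model once silence follows. Below is a minimal sketch of that segmentation logic, assuming a per-chunk speech/no-speech decision. The function segment_utterances, its max_silence_chunks parameter and the toy is_speech callback are hypothetical; the actual script uses Silero VAD and may apply different thresholds.

```python
def segment_utterances(chunks, is_speech, max_silence_chunks=3):
    """Group audio chunks into utterances separated by sustained silence.

    chunks: iterable of audio chunks (any type).
    is_speech: callable returning True when a chunk contains speech.
    max_silence_chunks: consecutive silent chunks that end an utterance.
    """
    buffer, silence_run, recording = [], 0, False
    for chunk in chunks:
        if is_speech(chunk):
            recording = True
            silence_run = 0
            buffer.append(chunk)
        elif recording:
            silence_run += 1
            buffer.append(chunk)  # keep short pauses inside the utterance
            if silence_run >= max_silence_chunks:
                # Sustained silence: emit the utterance without its trailing silence.
                yield buffer[: len(buffer) - silence_run]
                buffer, silence_run, recording = [], 0, False
    if recording:
        # The stream ended mid-utterance: emit what was recorded so far.
        yield buffer[: len(buffer) - silence_run]


# Toy stream: chunks starting with "s" are speech, "_" is silence.
stream = ["s1", "s2", "_", "_", "_", "s3", "_", "s4", "_", "_", "_"]
utterances = list(segment_utterances(stream, lambda c: c.startswith("s")))
```

Each emitted utterance would then be passed to the recognizer as one non-streaming request. The UserWarning in the log comes from calling torch.from_numpy on a read-only NumPy array; as the message itself suggests, copying the array first avoids it.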

RNNT version

Real-time (microphone) mode

This runs the float16 RNNT model in non-streaming mode.

python realtime_rnnt_float16_non_streaming.py

Offline file mode

This runs the float16 RNNT model in non-streaming mode.

python offline_rnnt_float16_non_streaming.py
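An RNNT model is split into an encoder, a decoder (prediction network) and a joiner, and decoding loops over all three. The following is a minimal sketch of greedy transducer decoding, not the code in the scripts above. The predict and join callables are stand-ins for the decoder and joiner ONNX sessions, and the toy versions below, along with max_symbols_per_frame, are assumptions for illustration only.

```python
def rnnt_greedy_decode(encoder_frames, predict, join, blank_id, max_symbols_per_frame=5):
    """Greedy transducer decoding.

    encoder_frames: sequence of encoder outputs, one per time step.
    predict(tokens): decoder state for the tokens emitted so far.
    join(frame, dec_state): list of scores over the vocabulary plus blank.
    """
    tokens = []
    dec_state = predict(tokens)
    for frame in encoder_frames:
        # Emit tokens for this frame until blank wins or the cap is reached.
        for _ in range(max_symbols_per_frame):
            scores = join(frame, dec_state)
            best = max(range(len(scores)), key=scores.__getitem__)
            if best == blank_id:
                break  # blank: move on to the next encoder frame
            tokens.append(best)
            dec_state = predict(tokens)  # the decoder only advances on non-blank
    return tokens


BLANK = 3  # toy setup: token ids 0..2 plus a blank at index 3


def toy_predict(tokens):
    """Toy decoder state: the number of tokens emitted so far."""
    return len(tokens)


def toy_join(frame, dec_state):
    """Toy joiner: each frame is (tokens_emitted_before, tokens_to_emit)."""
    offset, targets = frame
    position = dec_state - offset
    scores = [0.0] * (BLANK + 1)
    scores[targets[position] if position < len(targets) else BLANK] = 1.0
    return scores


toy_frames = [(0, [0]), (1, []), (1, [1, 2])]
decoded = rnnt_greedy_decode(toy_frames, toy_predict, toy_join, blank_id=BLANK)
```

Unlike CTC, a single frame can emit several tokens here, so a cap such as max_symbols_per_frame guards against a joiner that never predicts blank.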

pronoobie/indic_conformer_hi_float16_onnx_256_vocab

Author: pronoobie

automatic-speech-recognition


Created: 2025-06-11 13:57:48+00:00

Updated: 2025-06-18 20:12:56+00:00


Files (21)

.gitattributes

README.md

decoder_config.json

decoder_fp16.onnx

encoder_fp16.onnx

file.wav

indicconformer_stt_hi_ctc_only_fp16.onnx

inference_ctc_float16_non_streaming.py

inference_rnnt_float16_non_streaming.py

joiner_fp16.onnx

offline_ctc_float16_inference.py

offline_rnnt_float16_non_streaming.py

onnxconversionCTC.ipynb

onnxconversionRNNT.ipynb

realtime_ctc_float16_non_streaming.py

realtime_rnnt_float16_non_streaming.py

requirements.txt

tokenizer_hi.model

tokenizer_hi.vocab

tokens.txt

vocab.txt