This is a modified version of the ai-hub-apps repository, used to export this model:

```
python .\export.py --target-runtime onnx --device "Snapdragon X Elite CRD" --skip-profiling --skip-inferencing
```
Modified: whisper/model.py
```python
# The number of Mel features per audio context
# N_MELS = 80
# For Whisper V3 Turbo
N_MELS = 128

# Commented out for now, as we want to use it for Whisper V3 Turbo
# # Audio embedding length
# AUDIO_EMB_LEN = int(N_SAMPLES / N_MELS / 4)
# # Audio length per MEL feature
# MELS_AUDIO_LEN = AUDIO_EMB_LEN * 2

# Number of frames in the input mel spectrogram (e.g. 3000 for 30s audio at 160 hop_length).
# This corresponds to the 'n_frames' dimension of the mel spectrogram input to the Whisper AudioEncoder.
MELS_AUDIO_LEN = N_SAMPLES // HOP_LENGTH

# Length of the audio embedding from the encoder output (e.g. 1500).
# This corresponds to 'n_audio_ctx' in Whisper, which is MELS_AUDIO_LEN // 2
# due to the strided convolution in the encoder. This length is used for the
# cross-attention key/value cache from the encoder.
AUDIO_EMB_LEN = MELS_AUDIO_LEN // 2

WHISPER_VERSION = "large-v3-turbo"
# N_MELS_LARGE_V3_TURBO = 128
# DEFAULT_INPUT_SEQ_LEN = 3000


@CollectionModel.add_component(WhisperEncoderInf)
@CollectionModel.add_component(WhisperDecoderInf)
class WhisperV3Turbo(BaseWhisper):
    @classmethod
    def from_pretrained(cls):
        return super().from_pretrained(WHISPER_VERSION)
```
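As a sanity check on the constants above, the values follow from the standard Whisper audio settings in `whisper/audio.py` (16 kHz sample rate, 30-second chunks, hop length 160 — assumptions carried over from that file, not defined in this patch):

```python
# Standard Whisper audio settings (assumed, from whisper/audio.py).
SAMPLE_RATE = 16000
CHUNK_LENGTH = 30                       # seconds of audio per context
N_SAMPLES = CHUNK_LENGTH * SAMPLE_RATE  # 480000 samples per chunk
HOP_LENGTH = 160

MELS_AUDIO_LEN = N_SAMPLES // HOP_LENGTH  # mel frames per chunk
AUDIO_EMB_LEN = MELS_AUDIO_LEN // 2       # encoder output length (stride-2 conv)

print(MELS_AUDIO_LEN, AUDIO_EMB_LEN)  # → 3000 1500
```

These match the `n_frames = 3000` and `n_audio_ctx = 1500` values mentioned in the comments.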
You also need to apply this patch to the ai-hub library:
https://github.com/openai/whisper/blob/dd985ac4b90cafeef8712f2998d62c59c3e62d22/whisper/__init__.py#L30
bweng/whisper-v3-turbo-onnx-qnn
Author: bweng
Tags: automatic-speech-recognition
Created: 2025-05-20 02:16:47+00:00
Updated: 2025-05-20 02:37:39+00:00
Files (6):
- .gitattributes
- README.md
- WhisperDecoderInf.onnx
- WhisperEncoderInf.onnx.zip
- WhisperEncoderInf.onnx/model.data
- WhisperEncoderInf.onnx/model.onnx
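The exported encoder can be run with ONNX Runtime. A minimal sketch, assuming the encoder takes a single float32 mel input of shape `(1, N_MELS, MELS_AUDIO_LEN)` and that `WhisperEncoderInf.onnx.zip` has been unzipped in the working directory — the path and input layout are assumptions, not confirmed by this repo:

```python
import os
import numpy as np

N_MELS = 128           # mel bins for Whisper V3 Turbo (see patch above)
MELS_AUDIO_LEN = 3000  # mel frames for a 30 s chunk at hop length 160

# Dummy mel spectrogram standing in for real preprocessed audio.
mel = np.zeros((1, N_MELS, MELS_AUDIO_LEN), dtype=np.float32)

# Assumed path after unzipping WhisperEncoderInf.onnx.zip.
ENCODER_PATH = "WhisperEncoderInf.onnx/model.onnx"
if os.path.exists(ENCODER_PATH):
    import onnxruntime as ort

    session = ort.InferenceSession(ENCODER_PATH, providers=["CPUExecutionProvider"])
    input_name = session.get_inputs()[0].name
    outputs = session.run(None, {input_name: mel})
    for out, meta in zip(outputs, session.get_outputs()):
        print(meta.name, out.shape)
else:
    print("encoder not found; input shape would be", mel.shape)
```

The decoder consumes the encoder's cross-attention key/value outputs, so it must be driven in a token-by-token loop; the exact input names can be inspected via `session.get_inputs()`.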