This is a modified version of the ai-hub-apps repository, used to export this model:

```
python .\export.py --target-runtime onnx --device "Snapdragon X Elite CRD" --skip-profiling --skip-inferencing
```
Modified: whisper/model.py
```python
# The number of Mel features per audio context
# N_MELS = 80
# For Whisper V3 Turbo
N_MELS = 128

# Commented out for now, as we want to use it for Whisper V3 Turbo
# # Audio embedding length
# AUDIO_EMB_LEN = int(N_SAMPLES / N_MELS / 4)
# # Audio length per MEL feature
# MELS_AUDIO_LEN = AUDIO_EMB_LEN * 2

# Number of frames in the input mel spectrogram (e.g. 3000 for 30s audio at 160 hop_length).
# This corresponds to the 'n_frames' dimension of the mel spectrogram input to the Whisper AudioEncoder.
MELS_AUDIO_LEN = N_SAMPLES // HOP_LENGTH

# Length of the audio embedding from the encoder output (e.g. 1500).
# This corresponds to 'n_audio_ctx' in Whisper, which is MELS_AUDIO_LEN // 2
# due to the strided convolution in the encoder. This length is used for the
# cross-attention key/value cache from the encoder.
AUDIO_EMB_LEN = MELS_AUDIO_LEN // 2

WHISPER_VERSION = "large-v3-turbo"
# N_MELS_LARGE_V3_TURBO = 128
# DEFAULT_INPUT_SEQ_LEN = 3000


@CollectionModel.add_component(WhisperEncoderInf)
@CollectionModel.add_component(WhisperDecoderInf)
class WhisperV3Turbo(BaseWhisper):
    @classmethod
    def from_pretrained(cls):
        return super().from_pretrained(WHISPER_VERSION)
```
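As a sanity check on the constants above, the values follow from the standard Whisper audio settings in `whisper/audio.py` (16 kHz sample rate, 30-second chunks, hop length 160 — assumptions carried over from that file, not defined in this patch):

```python
# Standard Whisper audio settings (assumed, from whisper/audio.py).
SAMPLE_RATE = 16000
CHUNK_LENGTH = 30                       # seconds of audio per context
N_SAMPLES = CHUNK_LENGTH * SAMPLE_RATE  # 480000 samples per chunk
HOP_LENGTH = 160

MELS_AUDIO_LEN = N_SAMPLES // HOP_LENGTH  # mel frames per chunk
AUDIO_EMB_LEN = MELS_AUDIO_LEN // 2       # encoder output length (stride-2 conv)

print(MELS_AUDIO_LEN, AUDIO_EMB_LEN)  # → 3000 1500
```

These match the `n_frames = 3000` and `n_audio_ctx = 1500` values mentioned in the comments.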
You also need to apply this patch to the ai-hub library:
https://github.com/openai/whisper/blob/dd985ac4b90cafeef8712f2998d62c59c3e62d22/whisper/__init__.py#L30
bweng/whisper-v3-turbo-onnx-qnn
Author: bweng
Tags: automatic-speech-recognition
Created: 2025-05-20 02:16:47+00:00
Updated: 2025-05-20 02:37:39+00:00
Files (6):
- .gitattributes
- README.md
- WhisperDecoderInf.onnx
- WhisperEncoderInf.onnx.zip
- WhisperEncoderInf.onnx/model.data
- WhisperEncoderInf.onnx/model.onnx
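The exported encoder can be run with ONNX Runtime. A minimal sketch, assuming the encoder takes a single float32 mel input of shape `(1, N_MELS, MELS_AUDIO_LEN)` and that `WhisperEncoderInf.onnx.zip` has been unzipped in the working directory — the path and input layout are assumptions, not confirmed by this repo:

```python
import os
import numpy as np

N_MELS = 128           # mel bins for Whisper V3 Turbo (see patch above)
MELS_AUDIO_LEN = 3000  # mel frames for a 30 s chunk at hop length 160

# Dummy mel spectrogram standing in for real preprocessed audio.
mel = np.zeros((1, N_MELS, MELS_AUDIO_LEN), dtype=np.float32)

# Assumed path after unzipping WhisperEncoderInf.onnx.zip.
ENCODER_PATH = "WhisperEncoderInf.onnx/model.onnx"
if os.path.exists(ENCODER_PATH):
    import onnxruntime as ort

    session = ort.InferenceSession(ENCODER_PATH, providers=["CPUExecutionProvider"])
    input_name = session.get_inputs()[0].name
    outputs = session.run(None, {input_name: mel})
    for out, meta in zip(outputs, session.get_outputs()):
        print(meta.name, out.shape)
else:
    print("encoder not found; input shape would be", mel.shape)
```

The decoder consumes the encoder's cross-attention key/value outputs, so it must be driven in a token-by-token loop; the exact input names can be inspected via `session.get_inputs()`.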