This is an unoptimized export of https://huggingface.co/laion/CLIP-ViT-B-32-256x256-DataComp-s34B-b86K, ready to be used with DeepSparse. It achieves 95.7% zero-shot top-1 accuracy on Imagenette.
Usage Setup
First, install DeepSparse with the CLIP extension (the requirement is quoted so the shell does not expand the brackets):

```bash
pip install "deepsparse-nightly[clip]>=1.7.0.20231210"
```
Download some test images of a church, a dog, and an elephant:

```bash
wget -O basilica.jpg https://raw.githubusercontent.com/neuralmagic/deepsparse/main/src/deepsparse/yolo/sample_images/basilica.jpg
wget -O buddy.jpeg https://raw.githubusercontent.com/neuralmagic/deepsparse/main/tests/deepsparse/pipelines/sample_images/buddy.jpeg
wget -O thailand.jpg https://raw.githubusercontent.com/neuralmagic/deepsparse/main/src/deepsparse/yolact/sample_images/thailand.jpg
```
This model takes a second input, the token length, so run this input-override code before creating any text pipeline:

```python
import numpy as np
from deepsparse.clip import CLIPTextPipeline

def custom_process_inputs(self, inputs):
    if not isinstance(inputs.text, list):
        inputs.text = [inputs.text]
    if not isinstance(inputs.text[0], str):
        return inputs.text
    tokens = [np.array(t).astype(np.int32) for t in self.tokenizer(inputs.text)]
    tokens = np.stack(tokens, axis=0)
    tokens_lengths = np.array(tokens.shape[0] * [tokens.shape[1] - 1])
    return [tokens, tokens_lengths]

# This overrides the process_inputs function globally for all CLIPTextPipeline classes
CLIPTextPipeline.process_inputs = custom_process_inputs
```
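To illustrate what the override feeds the model, here is a self-contained numpy sketch of the same stacking logic; the token IDs and the 77-token context length are made-up stand-ins for real tokenizer output:

```python
import numpy as np

# Stand-in for self.tokenizer(...) output: two sequences padded to length 77
# (CLIP's usual context length). The IDs here are illustrative only.
token_lists = [
    [49406, 320, 1929, 49407] + [0] * 73,
    [49406, 550, 10055, 49407] + [0] * 73,
]

# Same stacking logic as custom_process_inputs above
tokens = np.stack([np.array(t).astype(np.int32) for t in token_lists], axis=0)
tokens_lengths = np.array(tokens.shape[0] * [tokens.shape[1] - 1])

print(tokens.shape)    # (2, 77)
print(tokens_lengths)  # [76 76]
```

The second input is simply one entry per sequence, each set to the padded length minus one.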
Text Embedding Pipeline
Here is an example of how to create and use a DeepSparse text embedding pipeline.

```python
from deepsparse import Pipeline
from huggingface_hub import snapshot_download

# Download the model from Hugging Face
model_folder = snapshot_download(repo_id="neuralmagic/CLIP-ViT-B-32-256x256-DataComp-s34B-b86K-quant-ds")

text_embed_pipeline = Pipeline.create(task="clip_text", model_path=model_folder + "/textual.onnx")

text = ["ice cream", "an elephant", "a dog", "a building", "a church"]
embeddings = text_embed_pipeline(text=text).text_embeddings
for i in range(len(embeddings)):
    print(embeddings[i].shape)
    print(embeddings[i])
```
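The returned embeddings are typically compared with cosine similarity after L2 normalization. A minimal numpy sketch, with random vectors standing in for real CLIP embeddings (512 is the ViT-B/32 embedding width):

```python
import numpy as np

rng = np.random.default_rng(0)
# Random stand-ins for two text embeddings of width 512
a = rng.standard_normal(512).astype(np.float32)
b = rng.standard_normal(512).astype(np.float32)

# L2-normalize, so the dot product becomes the cosine similarity
a_n = a / np.linalg.norm(a)
b_n = b / np.linalg.norm(b)
cosine = float(a_n @ b_n)

print(-1.0 <= cosine <= 1.0)  # True
```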
Image Embedding Pipeline
Here is an example of how to create and use a DeepSparse image embedding pipeline.

```python
from deepsparse import Pipeline
from huggingface_hub import snapshot_download

# Download the model from Hugging Face
model_folder = snapshot_download(repo_id="neuralmagic/CLIP-ViT-B-32-256x256-DataComp-s34B-b86K-quant-ds")

image_embed_pipeline = Pipeline.create(task="clip_visual", model_path=model_folder + "/visual.onnx")

images = ["basilica.jpg", "buddy.jpeg", "thailand.jpg"]
embeddings = image_embed_pipeline(images=images).image_embeddings
for i in range(len(embeddings)):
    print(embeddings[i].shape)
    print(embeddings[i])
```
Zero-Shot Image Classification Pipeline
Because CLIP trains the text and image embedding models together, we can generate embeddings for both and relate them to each other without retraining. Here is an example of how to create and use a DeepSparse zero-shot image classification pipeline.

```python
import numpy as np

from deepsparse import Pipeline
from deepsparse.clip import (
    CLIPTextInput,
    CLIPVisualInput,
    CLIPZeroShotInput
)
from huggingface_hub import snapshot_download

# Download the model from Hugging Face
model_folder = snapshot_download(repo_id="neuralmagic/CLIP-ViT-B-32-256x256-DataComp-s34B-b86K-quant-ds")

possible_classes = ["ice cream", "an elephant", "a dog", "a building", "a church"]
images = ["basilica.jpg", "buddy.jpeg", "thailand.jpg"]

# Load the model into DeepSparse
pipeline = Pipeline.create(
    task="clip_zeroshot",
    visual_model_path=model_folder + "/visual.onnx",
    text_model_path=model_folder + "/textual.onnx"
)

# Inference
output = pipeline(
    image=CLIPVisualInput(images=images),
    text=CLIPTextInput(text=possible_classes),
).text_scores
for i in range(len(output)):
    prediction = possible_classes[np.argmax(output[i])]
    print(f"Image {images[i]} is a picture of {prediction}")
"""
Image basilica.jpg is a picture of a church
Image buddy.jpeg is a picture of a dog
Image thailand.jpg is a picture of an elephant
"""
```
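For intuition, zero-shot scoring reduces to cosine similarities between normalized image and text embeddings. This numpy sketch uses random stand-in embeddings; the pipeline's actual text_scores may apply additional scaling or a softmax on top:

```python
import numpy as np

rng = np.random.default_rng(42)
# Random stand-ins: 3 image embeddings and 5 class-prompt embeddings, width 512
image_emb = rng.standard_normal((3, 512)).astype(np.float32)
text_emb = rng.standard_normal((5, 512)).astype(np.float32)

# L2-normalize rows so dot products are cosine similarities
image_emb /= np.linalg.norm(image_emb, axis=1, keepdims=True)
text_emb /= np.linalg.norm(text_emb, axis=1, keepdims=True)

scores = image_emb @ text_emb.T      # (3, 5): image-vs-class similarity
predictions = scores.argmax(axis=1)  # best class index per image

print(scores.shape)       # (3, 5)
print(predictions.shape)  # (3,)
```

Each row of `scores` ranks the candidate classes for one image, exactly the quantity `np.argmax` is applied to in the example above.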
RedHatAI/CLIP-ViT-B-32-256x256-DataComp-s34B-b86K-ds
Author: RedHatAI
zero-shot-classification
Created: 2023-12-20 20:14:31+00:00
Updated: 2023-12-20 20:26:46+00:00
Files (7): .gitattributes, README.md, open_clip_config.json, special_tokens_map.json, textual.onnx, tokenizer_config.json, visual.onnx