README

pplx-embed-v1: Diffusion-Pretrained Dense and Contextual Embeddings

pplx-embed-v1 and pplx-embed-context-v1 are state-of-the-art text embedding models optimized for real-world, web-scale retrieval tasks.

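- Use pplx-embed-v1 for independent text embeddings (queries, documents, semantic search).
- Use pplx-embed-context-v1 for document chunks in RAG systems, where the surrounding context matters.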

[!IMPORTANT] pplx-embed-v1 and pplx-embed-context-v1 natively produce unnormalized int8-quantized embeddings. Make sure you compare them with cosine similarity.
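
Because the embeddings are unnormalized int8 vectors, a raw dot product is not a reliable score. The sketch below shows one way to compute cosine similarity with NumPy. The function name and the toy vectors are illustrative assumptions, not model output.

```python
import numpy as np


def cosine_similarity_matrix(queries: np.ndarray, docs: np.ndarray) -> np.ndarray:
    """Cosine similarity between every query row and every document row."""
    # Cast to float32 first: multiplying int8 values directly would overflow.
    q = queries.astype(np.float32)
    d = docs.astype(np.float32)
    # Normalize each row; the floor guards against division by zero
    # for an all-zero vector.
    q /= np.maximum(np.linalg.norm(q, axis=1, keepdims=True), 1e-12)
    d /= np.maximum(np.linalg.norm(d, axis=1, keepdims=True), 1e-12)
    return q @ d.T


# Toy int8 vectors standing in for real embeddings (hypothetical values).
queries = np.array([[100, -50, 25, 0]], dtype=np.int8)
docs = np.array([[50, -25, 12, 0], [-100, 50, -25, 0]], dtype=np.int8)

scores = cosine_similarity_matrix(queries, docs)
print(scores.shape)  # (1, 2)
```

The first document points in nearly the same direction as the query and the second in the opposite direction, so the two scores land close to 1 and -1 respectively.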

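Models

| Model | Dimensions | Context Length | MRL | Quantization | Instruction | Pooling |
|---|---|---|---|---|---|---|
| `pplx-embed-v1-0.6B` | 1024 | 32K | Yes | INT8/BINARY | No | Mean |
| `pplx-embed-v1-4B` | 2560 | 32K | Yes | INT8/BINARY | No | Mean |
| `pplx-embed-context-v1-0.6B` | 1024 | 32K | Yes | INT8/BINARY | No | Mean |
| `pplx-embed-context-v1-4B` | 2560 | 32K | Yes | INT8/BINARY | No | Mean |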

All models are built on Qwen3 with diffusion-based continued pretraining at Perplexity AI.
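
The MRL column indicates support for Matryoshka Representation Learning, where an embedding can be shortened to save storage. A minimal sketch follows, assuming the usual Matryoshka convention of keeping the leading dimensions. The helper name, the random stand-in data, and the target size of 256 are illustrative assumptions; this document does not state which truncated sizes are recommended.

```python
import numpy as np


def truncate_embeddings(embeddings: np.ndarray, dim: int) -> np.ndarray:
    """Keep only the leading `dim` components of each embedding row."""
    if not 0 < dim <= embeddings.shape[1]:
        raise ValueError(f"dim must be in 1..{embeddings.shape[1]}, got {dim}")
    return embeddings[:, :dim]


# Random int8 data standing in for five 1024-dimensional embeddings.
rng = np.random.default_rng(0)
full = rng.integers(-128, 128, size=(5, 1024)).astype(np.int8)

small = truncate_embeddings(full, 256)
print(small.shape)  # (5, 256)
```

Truncated vectors are still unnormalized, so they should be compared with cosine similarity just like the full-size ones.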

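Many modern embedding models rely on instruction tuning, where users prepend an instruction string to the text being embedded. This can yield a 2%-3% lift on benchmarks, but it also introduces prompt-selection overhead and makes indexing pipelines brittle (a small change to the instruction can shift the embedding space). We deliberately avoid this requirement: you can embed the text you want to index directly, without choosing or maintaining an instruction prefix.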

Usage

<details> <summary>Using the API</summary>

curl -X POST https://api.perplexity.ai/v1/embeddings \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "input": [
      "Scientists explore the universe driven by curiosity.",
      "Children learn through curious exploration.",
      "Historical discoveries began with curious questions.",
      "Animals use curiosity to adapt and survive.",
      "Philosophy examines the nature of curiosity."
    ],
    "model": "pplx-embed-v1-0.6b"
  }'

</details>

<details> <summary>Using SentenceTransformers</summary>

from sentence_transformers import SentenceTransformer

model = SentenceTransformer(
    "perplexity-ai/pplx-embed-v1-0.6B",
    trust_remote_code=True
)

texts = [
    "Scientists explore the universe driven by curiosity.",
    "Children learn through curious exploration.",
    "Historical discoveries began with curious questions.",
    "Animals use curiosity to adapt and survive.",
    "Philosophy examines the nature of curiosity.",
]

embeddings = model.encode(texts)  # Shape: (5, 1024), quantized to int8
binary_embeddings = model.encode(texts, quantization="binary")  # Shape: (5, 1024), quantized to binary

</details>

<details> <summary>Using ONNX Runtime</summary>

import onnxruntime as ort
from transformers import AutoTokenizer
import numpy as np

tokenizer = AutoTokenizer.from_pretrained("perplexity-ai/pplx-embed-v1-0.6b", trust_remote_code=True)
session = ort.InferenceSession("onnx/model.onnx")


texts = [
    "Scientists explore the universe driven by curiosity.",
    "Children learn through curious exploration.",
    "Historical discoveries began with curious questions.",
    "Animals use curiosity to adapt and survive.",
    "Philosophy examines the nature of curiosity.",
]

tokenized = tokenizer(
    texts,
    padding=True,
    truncation=True,
    return_tensors="np"
)

onnx_inputs = {
    "input_ids": tokenized["input_ids"].astype(np.int64),
    "attention_mask": tokenized["attention_mask"].astype(np.int64),
}

# Run inference
onnx_embeddings = session.run([out.name for out in session.get_outputs()], onnx_inputs)

# The ONNX model produces both int8 and binary embeddings:
int8_embeddings = onnx_embeddings[2]
binary_embeddings = onnx_embeddings[3]
packed_embeddings = np.packbits(binary_embeddings != -1, axis=-1)
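
Once binary embeddings are bit-packed as above, a common way to compare them is Hamming distance (the number of differing bits). This document does not prescribe a metric for binary embeddings, so treat the sketch below as one reasonable option. It assumes the binary output holds values of -1 and 1, as the `!= -1` test above implies; the helper name and toy vectors are illustrative.

```python
import numpy as np


def hamming_distances(packed_query: np.ndarray, packed_docs: np.ndarray) -> np.ndarray:
    """Count differing bits between one packed query and each packed document."""
    # XOR leaves a 1 bit wherever the two vectors disagree.
    differing = np.bitwise_xor(packed_docs, packed_query)
    # Expand the bytes back into bits and count the set bits per row.
    return np.unpackbits(differing, axis=-1).sum(axis=-1)


# Toy -1/+1 vectors standing in for binary model output (hypothetical values).
binary = np.array(
    [
        [1, -1, 1, 1, -1, -1, 1, -1],
        [1, -1, 1, 1, -1, -1, 1, 1],    # differs from row 0 in one position
        [-1, 1, -1, -1, 1, 1, -1, 1],   # differs from row 0 in every position
    ],
    dtype=np.int8,
)
packed = np.packbits(binary != -1, axis=-1)

distances = hamming_distances(packed[0], packed)
print(distances.tolist())  # [0, 1, 8]
```

A smaller distance means more similar vectors, so results are ranked in ascending order.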

</details>

<details> <summary>Using Text Embeddings Inference (TEI)</summary>

[!NOTE] Requires Text Embeddings Inference v1.9.2 or later.

[!IMPORTANT] TEI currently provides only int8-quantized embeddings. Remember to use cosine similarity with the unnormalized int8 embeddings.

CPU with Candle:

docker run -p 8080:80 ghcr.io/huggingface/text-embeddings-inference:cpu-1.9 --model-id perplexity-ai/pplx-embed-v1-0.6B --dtype float32

CPU with ORT (ONNX Runtime):

docker run -p 8080:80 ghcr.io/huggingface/text-embeddings-inference:cpu-1.9 --model-id onnx-community/pplx-embed-v1-0.6B --dtype float32

GPU with CUDA:

docker run --gpus all --shm-size 1g -p 8080:80 ghcr.io/huggingface/text-embeddings-inference:cuda-1.9 --model-id perplexity-ai/pplx-embed-v1-0.6B --dtype float32

If you run into an OOM during warm-up, lower --max-batch-tokens and --max-client-batch-size. Set --max-batch-tokens to max_sequence_length × batch_size (for example, 2048 tokens × 8 sequences = 16384).
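
The token budget above is a simple product, sketched here so it can be recomputed for other sequence lengths and batch sizes. The function name is an illustrative assumption; only the `--max-batch-tokens` flag comes from the text above.

```python
def max_batch_tokens(max_sequence_length: int, batch_size: int) -> int:
    """Token budget for one batch: longest sequence times sequences per batch."""
    if max_sequence_length <= 0 or batch_size <= 0:
        raise ValueError("both arguments must be positive")
    return max_sequence_length * batch_size


budget = max_batch_tokens(max_sequence_length=2048, batch_size=8)

# Render the flag as it would be appended to the docker run command.
flag = f"--max-batch-tokens {budget}"
print(flag)  # --max-batch-tokens 16384
```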

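Alternatively, when running on CUDA, you can use a container built for a specific architecture or compute capability instead of cuda-1.9. The cuda-1.9 image bundles binaries for Turing, Ampere, Hopper, and Blackwell, so an architecture-specific container such as ampere-1.9 is lighter.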

You can then send requests to /embed with cURL:

curl http://0.0.0.0:8080/embed \
  -H "Content-Type: application/json" \
  -d '{
    "inputs": [
      "Scientists explore the universe driven by curiosity.",
      "Children learn through curious exploration.",
      "Historical discoveries began with curious questions.",
      "Animals use curiosity to adapt and survive.",
      "Philosophy examines the nature of curiosity."
    ],
    "normalize": false
  }'
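
The same request can be sent from Python. The sketch below uses only the standard library plus NumPy and mirrors the cURL payload above. It assumes that /embed responds with a JSON array holding one array of numbers per input; the helper names are illustrative, and the network call is left commented out so the snippet runs without a server.

```python
import json
import urllib.request

import numpy as np


def build_embed_request(texts, base_url="http://0.0.0.0:8080"):
    """Build a POST request for the /embed route with normalization disabled."""
    payload = json.dumps({"inputs": texts, "normalize": False}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/embed",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_embed_response(body: bytes) -> np.ndarray:
    """Turn the JSON response body into a (num_inputs, dim) float32 array."""
    # Assumption: the body is a JSON list with one list of numbers per input.
    return np.asarray(json.loads(body), dtype=np.float32)


request = build_embed_request(
    [
        "Scientists explore the universe driven by curiosity.",
        "Children learn through curious exploration.",
    ]
)

# To send the request against a running server:
# with urllib.request.urlopen(request) as response:
#     embeddings = parse_embed_response(response.read())
```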

</details>

Technical Details

For comprehensive technical details and evaluation results, see our paper on arXiv: https://arxiv.org/abs/2602.11151

perplexity-ai/pplx-embed-v1-0.6b

Author: perplexity-ai

feature-extraction sentence-transformers

Created: 2026-01-14 15:05:25+00:00

Updated: 2026-03-03 08:17:37+00:00

Files (24)

.gitattributes

1_Pooling/config.json

README.md

added_tokens.json

assets/diag.png

assets/logo.svg

config.json

configuration.py

merges.txt

model.safetensors

modeling.py

modules.json

onnx/model.onnx

onnx/model.onnx_data

onnx/model.onnx_data_1

onnx/model_q4.onnx

onnx/model_q4.onnx_data

onnx/model_quantized.onnx

onnx/model_quantized.onnx_data

special_tokens_map.json

st_quantize.py

tokenizer.json

tokenizer_config.json

vocab.json