
<p align="center"> <img src="assets/logo.svg" alt="Perplexity Logo" width="400"> </p>

<p align="center">pplx-embed-v1: Diffusion-pretrained dense and contextual embeddings</p>

pplx-embed-v1 and pplx-embed-context-v1 are industry-leading text embedding models, optimized for real-world, web-scale retrieval tasks.

  • Use pplx-embed-v1 for standalone text embeddings (queries, documents, semantic search)
  • Use pplx-embed-context-v1 for document chunks in RAG systems where the surrounding context matters

> [!IMPORTANT]
> pplx-embed-v1 and pplx-embed-context-v1 natively output non-normalized int8-quantized embeddings. Make sure to compare them via cosine similarity.
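Because the embeddings are int8 and not unit-length, a raw dot product is not meaningful on its own; normalize inside the comparison. A minimal numpy sketch (the 4-dimensional toy vectors below are made up for illustration):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity for non-normalized (e.g. int8-quantized) embeddings."""
    a = a.astype(np.float32)
    b = b.astype(np.float32)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy int8 "embeddings" standing in for real model output
query = np.array([90, -17, 55, -120], dtype=np.int8)
doc = np.array([80, -10, 60, -110], dtype=np.int8)

score = cosine_similarity(query, doc)
```

The cast to float32 before the dot product avoids int8 overflow; real embeddings are 1024- or 2560-dimensional, but the arithmetic is identical.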

![diag](assets/diag.png)

Models

| Model | Dimensions | Context | MRL | Quantization | Instruction | Pooling |
|-------|------------|---------|-----|--------------|-------------|---------|
| pplx-embed-v1-0.6B | 1024 | 32K | | INT8/BINARY | | Mean |
| pplx-embed-v1-4B | 2560 | 32K | | INT8/BINARY | | Mean |
| pplx-embed-context-v1-0.6B | 1024 | 32K | | INT8/BINARY | | Mean |
| pplx-embed-context-v1-4B | 2560 | 32K | | INT8/BINARY | | Mean |

<sub>All models are built on Qwen3 with diffusion continued pre-training by Perplexity AI.</sub>

<sub>Many modern embedding models rely on instruction tuning, requiring users to prepend an instruction string to the text before embedding. This can yield a 2%-3% gain on benchmarks, but it adds prompt-selection overhead and can make indexing pipelines brittle (small instruction changes may shift the embedding space). We deliberately avoid this requirement: you can embed the text you want to index directly, with no instruction prefix to choose or maintain.</sub>
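The INT8/BINARY quantization modes listed in the table can be sketched with simple elementwise operations. The tanh-then-round int8 scheme below mirrors the helper used in the ONNX example; the toy embedding values are made up for illustration:

```python
import numpy as np

def quantize_int8_tanh(x: np.ndarray) -> np.ndarray:
    """Squash raw float embeddings with tanh, then round into the int8 range."""
    return np.clip(np.round(np.tanh(x) * 127), -128, 127).astype(np.int8)

def quantize_binary(x: np.ndarray) -> np.ndarray:
    """Sign-binarize: +1 for non-negative values, -1 otherwise."""
    return np.where(x >= 0, 1.0, -1.0).astype(np.float32)

# Toy float embedding standing in for real model output
emb = np.array([2.1, -0.3, 0.0, -1.7], dtype=np.float32)

int8_emb = quantize_int8_tanh(emb)   # int8 values in [-128, 127]
bin_emb = quantize_binary(emb)       # ±1 values
```

Binary embeddings trade some accuracy for much smaller storage and faster distance computation; the int8 form keeps more precision while still being 4x smaller than float32.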

Usage

<details> <summary>Via API (contextual embeddings)</summary>

```bash
curl -X POST https://api.perplexity.ai/v1/contextualizedembeddings \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "input": [
      [
        "Curiosity begins in childhood with endless questions about the world.",
        "As we grow, curiosity drives us to explore new ideas and challenge assumptions.",
        "Scientific breakthroughs often start with a simple curious question."
      ],
      [
        "The curiosity rover explores Mars, searching for signs of ancient life.",
        "Each discovery on Mars sparks new questions about our place in the universe."
      ]
    ],
    "model": "pplx-embed-context-v1-0.6b"
  }'
```

</details>

<details> <summary>With Transformers</summary>

```python
from transformers import AutoModel

model_ctx = AutoModel.from_pretrained(
    "perplexity-ai/pplx-embed-context-v1-0.6B",
    trust_remote_code=True
)

doc_chunks = [
    [
        "Curiosity begins in childhood with endless questions about the world.",
        "As we grow, curiosity drives us to explore new ideas.",
        "Scientific breakthroughs often start with a curious question."
    ],
    [
        "The curiosity rover explores Mars searching for ancient life.",
        "Each discovery on Mars sparks new questions about the universe."
    ]
]
# Returns a list of numpy arrays (one per document):
# embeddings[0].shape = (3, 1024), embeddings[1].shape = (2, 1024)
embeddings = model_ctx.encode(doc_chunks)
```

</details> <details> <summary>With ONNX</summary>


```python
import onnxruntime as ort
from transformers import AutoTokenizer
import numpy as np
import torch


def quantize_int8_tanh(x):
    """Squash with tanh, scale to the int8 range, and round."""
    normalized = torch.tanh(x)
    rounded = torch.round(normalized * 127)
    clamped = torch.clamp(rounded, -128, 127)
    return clamped


def quantize_binary(x):
    """Sign-binarize embeddings to ±1."""
    return torch.where(x >= 0, 1.0, -1.0)


def mean_pooling(
    token_embeddings: torch.Tensor, attention_mask: torch.Tensor
) -> torch.Tensor:
    """Apply mean pooling to token embeddings."""
    input_mask_expanded = (
        attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    )
    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(
        input_mask_expanded.sum(1), min=1e-9
    )


def extract_chunks_from_concatenated(
    input_ids: torch.Tensor,
    token_embeddings: torch.Tensor,
    attention_mask: torch.Tensor,
    tokenizer,
) -> list[list[torch.Tensor]]:
    """
    Extract individual chunk embeddings from a concatenated sequence using late chunking.

    This splits concatenated sequences like "[chunk1][SEP][chunk2][SEP]..."
    back into individual chunk embeddings by finding SEP token positions.

    Args:
        input_ids: Token IDs (batch_size, seq_len)
        token_embeddings: Token embeddings (batch_size, seq_len, hidden_dim)
        attention_mask: Attention mask (batch_size, seq_len)

    Returns:
        list[list[torch.Tensor]]: List of documents, each containing a list of chunk embeddings

    Note:
        The sep_token_id is retrieved from tokenizer.sep_token_id.
        Common values: pplx-embed-1=151643, BERT=102, varies by tokenizer.
    """
    sep_token_id = tokenizer.sep_token_id
    batch_size = input_ids.shape[0]

    all_doc_chunks = []

    for batch_idx in range(batch_size):
        # Positions of non-padding SEP tokens
        valid_positions = attention_mask[batch_idx].bool()
        sep_positions = (
            (input_ids[batch_idx] == sep_token_id) & valid_positions
        ).nonzero(as_tuple=True)[0]

        chunk_embeddings = []
        start_pos = 0

        for sep_pos in sep_positions:
            chunk_tokens = token_embeddings[batch_idx, start_pos:sep_pos]
            chunk_mask = attention_mask[batch_idx, start_pos:sep_pos]

            chunk_emb = mean_pooling(
                chunk_tokens.unsqueeze(0), chunk_mask.unsqueeze(0)
            ).squeeze(0)

            chunk_embeddings.append(chunk_emb)

            start_pos = sep_pos + 1

        # Handle the last chunk (after the last SEP token)
        last_valid_pos = attention_mask[batch_idx].sum().item()

        chunk_tokens = token_embeddings[batch_idx, start_pos:last_valid_pos]
        chunk_mask = attention_mask[batch_idx, start_pos:last_valid_pos]

        if chunk_mask.sum() > 0:
            chunk_emb = mean_pooling(
                chunk_tokens.unsqueeze(0), chunk_mask.unsqueeze(0)
            ).squeeze(0)
        else:
            # Empty chunk - create a zero embedding
            chunk_emb = torch.zeros(
                token_embeddings.shape[-1],
                device=token_embeddings.device,
                dtype=token_embeddings.dtype,
            )

        chunk_embeddings.append(chunk_emb)

        all_doc_chunks.append(chunk_embeddings)

    return all_doc_chunks


hf_path = "perplexity-ai/pplx-embed-context-v1-0.6b"
onnx_path = "onnx/model.onnx"

tokenizer = AutoTokenizer.from_pretrained(hf_path, trust_remote_code=True)
session = ort.InferenceSession(onnx_path)

texts = [
    [
        "Curiosity begins in childhood with endless questions about the world.",
        "As we grow, curiosity drives us to explore new ideas.",
        "Scientific breakthroughs often start with a curious question."
    ],
    [
        "The curiosity rover explores Mars searching for ancient life.",
        "Each discovery on Mars sparks new questions about the universe."
    ]
]
# Join each document's chunks into one sequence separated by the SEP token
doc_strings = [
    tokenizer.sep_token.join(chunks) for chunks in texts
]

tokenized = tokenizer(
    doc_strings,
    padding=True,
    truncation=True,
    return_tensors="np",
)
onnx_inputs = {
    "input_ids": tokenized["input_ids"].astype(np.int64),
    "attention_mask": tokenized["attention_mask"].astype(np.int64),
}

# Run inference
onnx_outputs = session.run([out.name for out in session.get_outputs()], onnx_inputs)
# onnx_outputs is a list with one element: [last_hidden_state]
last_hidden_state = onnx_outputs[0]

batch_chunk_embeddings = extract_chunks_from_concatenated(
    input_ids=torch.tensor(onnx_inputs["input_ids"]),
    token_embeddings=torch.tensor(last_hidden_state),
    attention_mask=torch.tensor(onnx_inputs["attention_mask"]),
    tokenizer=tokenizer,
)

# Stack each document's chunk embeddings into one (num_chunks, hidden_dim) tensor
batch_chunk_embeddings = [
    torch.stack(doc_chunks, dim=0)
    for doc_chunks in batch_chunk_embeddings
]

int8_embeddings = [quantize_int8_tanh(x) for x in batch_chunk_embeddings]
binary_embeddings = [quantize_binary(x) for x in batch_chunk_embeddings]

# Pack the ±1 binary embeddings into bits for compact storage
bits = [np.where(doc.numpy() >= 0, True, False) for doc in binary_embeddings]
packed_embeddings = [np.packbits(b, axis=-1) for b in bits]
```

</details>
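Once binary embeddings have been packed with np.packbits as in the ONNX snippet, nearest neighbors can be found via Hamming distance (XOR, then popcount). A minimal sketch with random stand-in bit vectors in place of real model output:

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 1024  # embedding dimension of the 0.6B models

# Toy stand-ins for binarized embeddings: one query, four document chunks
query_bits = rng.integers(0, 2, size=dim, dtype=np.uint8).astype(bool)
doc_bits = rng.integers(0, 2, size=(4, dim), dtype=np.uint8).astype(bool)

# Pack 1024 bits into 128 bytes each, as in the snippet above
query_packed = np.packbits(query_bits)
docs_packed = np.packbits(doc_bits, axis=-1)

# Hamming distance: XOR the packed bytes, then count the set bits
xor = np.bitwise_xor(docs_packed, query_packed)
hamming = np.unpackbits(xor, axis=-1).sum(axis=-1)

best = int(np.argmin(hamming))  # index of the closest document chunk
```

XOR on the packed byte arrays avoids unpacking the index at query time in optimized implementations; here np.unpackbits is used as a simple popcount.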

Technical Details

For comprehensive technical details and evaluation results, please refer to our arXiv paper: https://arxiv.org/abs/2602.11151.

perplexity-ai/pplx-embed-context-v1-0.6b

Author: perplexity-ai

feature-extraction transformers

Created: 2026-01-20 11:43:34+00:00

Updated: 2026-03-02 09:28:54+00:00


Files (24)

.gitattributes
1_Pooling/config.json
README.md
added_tokens.json
assets/diag.png
assets/logo.svg
config.json
configuration.py
merges.txt
model.safetensors
modeling.py
modules.json
onnx/model.onnx ONNX
onnx/model.onnx_data
onnx/model.onnx_data_1
onnx/model_q4.onnx ONNX
onnx/model_q4.onnx_data
onnx/model_quantized.onnx ONNX
onnx/model_quantized.onnx_data
special_tokens_map.json
st_quantize.py
tokenizer.json
tokenizer_config.json
vocab.json