# SPLADE CoCondenser EnsembleDistil

A SPLADE model for passage retrieval. For more details, see:
- Paper: https://arxiv.org/abs/2205.04733
- Code: https://github.com/naver/splade
| Model | MRR@10 (MS MARCO dev) | R@1000 (MS MARCO dev) |
|---|---|---|
| splade-cocondenser-ensembledistil | 38.3 | 98.3 |
## Model Details

This is a SPLADE sparse-encoder model. It maps sentences and passages to a 30522-dimensional sparse vector space and can be used for semantic search and sparse retrieval.

### Model Description
- Model type: SPLADE sparse encoder
- Base model: Luyu/co-condenser-marco
- Maximum sequence length: 512 tokens (256 when reproducing the evaluation)
- Output dimensionality: 30522 dimensions
- Similarity function: dot product
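Because the similarity function is a plain dot product over vocabulary-sized sparse vectors, relevance comes only from the dimensions (vocabulary terms) that the query and the document both activate. A minimal sketch with toy indices and weights (all values here are invented for illustration):

```python
import numpy as np

# Toy sparse vectors in the 30522-dimensional vocabulary space.
dim = 30522
query = np.zeros(dim)
doc = np.zeros(dim)
query[[101, 2054, 5754]] = [0.9, 1.4, 2.1]  # a few active vocabulary dims
doc[[2054, 5754, 7592]] = [1.1, 1.8, 0.5]

# Relevance is the dot product over the shared non-zero dimensions:
# 1.4 * 1.1 + 2.1 * 1.8 = 5.32
score = float(query @ doc)
print(round(score, 2))  # 5.32
```

Only dimensions 2054 and 5754 overlap, so the score depends on just those two terms; this is what makes inverted-index retrieval efficient for SPLADE vectors.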
### Full Model Architecture

```
SparseEncoder(
  (0): MLMTransformer({'max_seq_length': 512, 'do_lower_case': False}) with MLMTransformer model: BertForMaskedLM
  (1): SpladePooling({'pooling_strategy': 'max', 'activation_function': 'relu', 'word_embedding_dimension': 30522})
)
```
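The SpladePooling module turns per-token MLM logits into a single sparse vector: following the SPLADE paper, each vocabulary dimension takes the maximum, over sequence positions, of a log-saturated ReLU of the logits. A minimal NumPy sketch of that pooling step (random logits stand in for real model output; shapes match the architecture above):

```python
import numpy as np

# Hypothetical sequence length; the vocabulary size matches the model's
# 30522-dimensional output space.
seq_len, vocab_size = 8, 30522
rng = np.random.default_rng(0)
logits = rng.normal(size=(seq_len, vocab_size))  # per-token MLM logits

# SPLADE pooling: log(1 + ReLU(logits)), then max over the sequence axis.
activated = np.log1p(np.maximum(logits, 0.0))
sparse_vec = activated.max(axis=0)

print(sparse_vec.shape)  # (30522,)
```

The ReLU and log saturation keep every weight non-negative and dampen very large logits, which encourages the sparsity the model relies on.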
## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```
pip install -U sentence-transformers
```

Then you can load this model and run inference:
```python
from sentence_transformers import SparseEncoder

# Download from the 🤗 Hub
model = SparseEncoder("naver/splade-cocondenser-ensembledistil")

# Run inference
queries = ["what causes aging fast"]
documents = [
    "UV-A light, specifically, is what mainly causes tanning, skin aging, and cataracts, UV-B causes sunburn, skin aging and skin cancer, and UV-C is the strongest, and therefore most effective at killing microorganisms. Again – single words and multiple bullets.",
    "Answers from Ronald Petersen, M.D. Yes, Alzheimer's disease usually worsens slowly. But its speed of progression varies, depending on a person's genetic makeup, environmental factors, age at diagnosis and other medical conditions. Still, anyone diagnosed with Alzheimer's whose symptoms seem to be progressing quickly — or who experiences a sudden decline — should see his or her doctor.",
    "Bell's palsy and Extreme tiredness and Extreme fatigue (2 causes) Bell's palsy and Extreme tiredness and Hepatitis (2 causes) Bell's palsy and Extreme tiredness and Liver pain (2 causes) Bell's palsy and Extreme tiredness and Lymph node swelling in children (2 causes)",
]
query_embeddings = model.encode_query(queries)
document_embeddings = model.encode_document(documents)
print(query_embeddings.shape, document_embeddings.shape)
# [1, 30522] [3, 30522]

# Compute similarity scores for the embeddings
similarities = model.similarity(query_embeddings, document_embeddings)
print(similarities)
# tensor([[ 9.9933, 10.8691,  3.4265]])
```
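Because each dimension of the embedding corresponds to one vocabulary token, a sparse embedding can also be read directly as a bag of weighted expansion terms. A toy sketch of that inspection step (the vocabulary and weights below are invented; the real model uses the 30522-entry BERT vocabulary):

```python
# Toy stand-ins for the vocabulary and a SPLADE embedding.
vocab = ["[PAD]", "aging", "causes", "skin", "sun", "fast"]
embedding = [0.0, 2.3, 1.1, 0.0, 0.7, 1.6]

# Keep only the active (non-zero) dimensions, highest weight first.
top_terms = sorted(
    ((vocab[i], w) for i, w in enumerate(embedding) if w > 0),
    key=lambda term: -term[1],
)
print(top_terms)  # [('aging', 2.3), ('fast', 1.6), ('causes', 1.1), ('sun', 0.7)]
```

This kind of readout is useful for debugging retrieval behavior, since it shows which terms the model expanded a query or passage into.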
## Citation

If you use our checkpoints, please cite our work:
```bibtex
@misc{https://doi.org/10.48550/arxiv.2205.04733,
  doi = {10.48550/ARXIV.2205.04733},
  url = {https://arxiv.org/abs/2205.04733},
  author = {Formal, Thibault and Lassance, Carlos and Piwowarski, Benjamin and Clinchant, Stéphane},
  keywords = {Information Retrieval (cs.IR), Computation and Language (cs.CL), FOS: Computer and information sciences},
  title = {From Distillation to Hard Negative Sampling: Making Sparse Neural IR Models More Effective},
  publisher = {arXiv},
  year = {2022},
  copyright = {Creative Commons Attribution Non Commercial Share Alike 4.0 International}
}
```
---

Repository: xuanan2001/splade-cocondenser-ensembledistil-onnx
Author: xuanan2001
Tags: feature-extraction, sentence-transformers
Created: 2025-11-16 05:02:00+00:00
Updated: 2025-11-16 05:10:25+00:00