mxbai-rerank-xsmall-v1
This is the smallest model in our family of powerful reranker models. You can learn more about the models in our blog post.
We have three models: mxbai-rerank-xsmall-v1 (this model), mxbai-rerank-base-v1, and mxbai-rerank-large-v1.
Quickstart
Currently, the best way to use our models is with the most recent version of sentence-transformers.
pip install -U sentence-transformers
Let's say you have a query and want to rerank a set of documents. A single call to model.rank does it:
from sentence_transformers import CrossEncoder
# Load the model; here we use the xsmall model
model = CrossEncoder("mixedbread-ai/mxbai-rerank-xsmall-v1")
# Example query and documents
query = "Who wrote 'To Kill a Mockingbird'?"
documents = [
"'To Kill a Mockingbird' is a novel by Harper Lee published in 1960. It was immediately successful, winning the Pulitzer Prize, and has become a classic of modern American literature.",
"The novel 'Moby-Dick' was written by Herman Melville and first published in 1851. It is considered a masterpiece of American literature and deals with complex themes of obsession, revenge, and the conflict between good and evil.",
"Harper Lee, an American novelist widely known for her novel 'To Kill a Mockingbird', was born in 1926 in Monroeville, Alabama. She received the Pulitzer Prize for Fiction in 1961.",
"Jane Austen was an English novelist known primarily for her six major novels, which interpret, critique and comment upon the British landed gentry at the end of the 18th century.",
"The 'Harry Potter' series, which consists of seven fantasy novels written by British author J.K. Rowling, is among the most popular and critically acclaimed books of the modern era.",
"'The Great Gatsby', a novel written by American author F. Scott Fitzgerald, was published in 1925. The story is set in the Jazz Age and follows the life of millionaire Jay Gatsby and his pursuit of Daisy Buchanan."
]
# Score every document against the query and keep the 3 best
results = model.rank(query, documents, return_documents=True, top_k=3)
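With return_documents=True, the rank call above returns a list of dictionaries sorted by descending score, each carrying corpus_id (the index into documents), score, and text. A minimal sketch of consuming that list follows; format_hits and the RankedHit type are hypothetical helpers, and the scores in example_hits are placeholders, not real model output.

```python
from typing import TypedDict


class RankedHit(TypedDict):
    corpus_id: int  # index of the document in the input list
    score: float    # relevance score; higher means more relevant
    text: str       # present because return_documents=True


def format_hits(hits: list[RankedHit], max_chars: int = 60) -> list[str]:
    """Render each hit as 'rank. [corpus_id] score  text-preview'."""
    lines = []
    for rank, hit in enumerate(hits, start=1):
        preview = hit["text"][:max_chars]
        lines.append(f"{rank}. [{hit['corpus_id']}] {hit['score']:.4f}  {preview}")
    return lines


# Placeholder data in the shape described above. The scores are made up
# for illustration and are NOT real model output.
example_hits: list[RankedHit] = [
    {"corpus_id": 0, "score": 0.9, "text": "'To Kill a Mockingbird' is a novel by Harper Lee."},
    {"corpus_id": 2, "score": 0.5, "text": "Harper Lee, an American novelist."},
]

for line in format_hits(example_hits):
    print(line)
```

In real use, pass the results variable from the snippet above instead of example_hits.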
<details> <summary>JavaScript Example</summary>
npm i @xenova/transformers
Let's say you have a query and want to rerank a set of documents. In JavaScript, you need to add a helper function:
import { AutoTokenizer, AutoModelForSequenceClassification } from '@xenova/transformers';
const model_id = 'mixedbread-ai/mxbai-rerank-xsmall-v1';
const model = await AutoModelForSequenceClassification.from_pretrained(model_id);
const tokenizer = await AutoTokenizer.from_pretrained(model_id);
/**
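* Ranks the given documents against the query with the CrossEncoder. Returns a list of document indices and scores, sorted by descending score.
* @param {string} query A single query
* @param {string[]} documents A list of documents
* @param {Object} options Options for ranking
* @param {number} [options.top_k=undefined] Return the top-k documents. If undefined, all documents are returned.
* @param {boolean} [options.return_documents=false] If true, also returns the document text. If false, only returns the indices and scores.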
*/
async function rank(query, documents, {
top_k = undefined,
return_documents = false,
} = {}) {
const inputs = tokenizer(
new Array(documents.length).fill(query),
{
text_pair: documents,
padding: true,
truncation: true,
}
)
const { logits } = await model(inputs);
return logits
.sigmoid()
.tolist()
.map(([score], i) => ({
corpus_id: i,
score,
...(return_documents ? { text: documents[i] } : {})
}))
.sort((a, b) => b.score - a.score)
.slice(0, top_k);
}
// Example usage:
const query = "Who wrote 'To Kill a Mockingbird'?"
const documents = [
"'To Kill a Mockingbird' is a novel by Harper Lee published in 1960. It was immediately successful, winning the Pulitzer Prize, and has become a classic of modern American literature.",
"The novel 'Moby-Dick' was written by Herman Melville and first published in 1851. It is considered a masterpiece of American literature and deals with complex themes of obsession, revenge, and the conflict between good and evil.",
"Harper Lee, an American novelist widely known for her novel 'To Kill a Mockingbird', was born in 1926 in Monroeville, Alabama. She received the Pulitzer Prize for Fiction in 1961.",
"Jane Austen was an English novelist known primarily for her six major novels, which interpret, critique and comment upon the British landed gentry at the end of the 18th century.",
"The 'Harry Potter' series, which consists of seven fantasy novels written by British author J.K. Rowling, is among the most popular and critically acclaimed books of the modern era.",
"'The Great Gatsby', a novel written by American author F. Scott Fitzgerald, was published in 1925. The story is set in the Jazz Age and follows the life of millionaire Jay Gatsby and his pursuit of Daisy Buchanan."
]
const results = await rank(query, documents, { return_documents: true, top_k: 3 });
console.log(results);
</details>
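The JavaScript helper above applies a sigmoid to one raw logit per document, sorts by descending score, and slices to top_k. For readers who want the same post-processing in Python without the sentence-transformers wrapper, here is a minimal sketch. rank_from_logits is a hypothetical helper, not part of any library, and the logits in the usage lines are illustrative values rather than model output.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def rank_from_logits(logits, documents, top_k=None, return_documents=False):
    """Turn one raw logit per document into a list of hits, best first."""
    if len(logits) != len(documents):
        raise ValueError("expected exactly one logit per document")
    hits = []
    for i, logit in enumerate(logits):
        hit = {"corpus_id": i, "score": sigmoid(logit)}
        if return_documents:
            hit["text"] = documents[i]
        hits.append(hit)
    hits.sort(key=lambda h: h["score"], reverse=True)
    return hits[:top_k]  # top_k=None keeps every hit


# Illustrative logits only; a real model would produce these.
demo_hits = rank_from_logits([2.0, -1.0, 0.5], ["doc a", "doc b", "doc c"],
                             top_k=2, return_documents=True)
for hit in demo_hits:
    print(hit["corpus_id"], round(hit["score"], 4), hit["text"])
```

Because the sigmoid is monotonic, it changes the scale of the scores but not the ordering; sorting the raw logits would give the same ranking.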
Using API
You can use the large model via our API as follows:
from mixedbread_ai.client import MixedbreadAI
mxbai = MixedbreadAI(api_key="{MIXEDBREAD_API_KEY}")
res = mxbai.reranking(
model="mixedbread-ai/mxbai-rerank-large-v1",
query="Who is the author of To Kill a Mockingbird?",
input=[
"To Kill a Mockingbird is a novel by Harper Lee published in 1960. It was immediately successful, winning the Pulitzer Prize, and has become a classic of modern American literature.",
"The novel Moby-Dick was written by Herman Melville and first published in 1851. It is considered a masterpiece of American literature and deals with complex themes of obsession, revenge, and the conflict between good and evil.",
"Harper Lee, an American novelist widely known for her novel To Kill a Mockingbird, was born in 1926 in Monroeville, Alabama. She received the Pulitzer Prize for Fiction in 1961.",
"Jane Austen was an English novelist known primarily for her six major novels, which interpret, critique and comment upon the British landed gentry at the end of the 18th century.",
"The Harry Potter series, which consists of seven fantasy novels written by British author J.K. Rowling, is among the most popular and critically acclaimed books of the modern era.",
"The Great Gatsby, a novel written by American author F. Scott Fitzgerald, was published in 1925. The story is set in the Jazz Age and follows the life of millionaire Jay Gatsby and his pursuit of Daisy Buchanan."
],
top_k=3,
return_input=False
)
print(res.data)
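The API comes with additional features, such as a continuously trained reranker! Check out the docs for more information.
Evaluation
Our reranker models are designed to elevate your search. They work extremely well in combination with keyword search and can even outperform semantic search systems in many cases.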
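To make the keyword-search pairing concrete, here is a minimal two-stage sketch: a toy lexical first stage picks candidates, then a pluggable scoring function reorders them. Everything here is an assumption for illustration: keyword_candidates is a simple token-overlap stand-in (not Lucene or BM25), retrieve_then_rerank is a hypothetical helper, and jaccard_scorer only stands in for the model so the example runs offline.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase and split into alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def keyword_candidates(query, documents, k):
    """First stage: toy lexical retrieval by counting shared tokens."""
    q = tokenize(query)
    overlaps = [(len(q & tokenize(doc)), i) for i, doc in enumerate(documents)]
    overlaps = [(n, i) for n, i in overlaps if n > 0]  # drop non-matching docs
    overlaps.sort(key=lambda p: (-p[0], p[1]))
    return [i for _, i in overlaps[:k]]


def retrieve_then_rerank(query, documents, score_fn, first_stage_k=10, top_k=3):
    """Second stage: rescore the candidates and return (doc index, score) pairs."""
    candidates = keyword_candidates(query, documents, first_stage_k)
    scores = score_fn(query, [documents[i] for i in candidates])
    ranked = sorted(zip(candidates, scores), key=lambda p: p[1], reverse=True)
    return ranked[:top_k]


def jaccard_scorer(query, docs):
    """Stand-in scorer so the sketch runs without downloading a model."""
    q = tokenize(query)
    return [len(q & tokenize(d)) / len(q | tokenize(d)) for d in docs]


docs = [
    "Harper Lee wrote To Kill a Mockingbird.",
    "Moby-Dick was written by Herman Melville.",
    "A mockingbird is a songbird found in North America.",
]
query = "Who wrote To Kill a Mockingbird?"
print(retrieve_then_rerank(query, docs, jaccard_scorer, first_stage_k=2, top_k=2))
```

With the real model, score_fn could wrap the CrossEncoder, for example by calling its predict method on (query, document) pairs. The first stage keeps the candidate set small so the more expensive reranker only scores a handful of documents.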
| Model | NDCG@10 | Accuracy@3 |
|---|---|---|
| Lexical Search (Lucene) | 38.0 | 66.4 |
| BAAI/bge-reranker-base | 41.6 | 66.9 |
| BAAI/bge-reranker-large | 45.2 | 70.6 |
| cohere-embed-v3 (semantic search) | 47.5 | 70.9 |
| mxbai-rerank-xsmall-v1 | 43.9 | 70.0 |
| mxbai-rerank-base-v1 | 46.9 | 72.3 |
| mxbai-rerank-large-v1 | 48.8 | 74.9 |
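The reported results are aggregated from 11 datasets of BEIR. We used Pyserini to evaluate the models. Find more in our blog post and on this spreadsheet.
Community
Please join our Discord community and share your feedback and thoughts! We are here to help and also always happy to chat.
Citation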
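For readers unfamiliar with the two metrics in the table, the sketch below computes them for a single query. It makes two assumptions the source does not spell out: NDCG uses the common linear-gain form with a log2 discount, and Accuracy@k counts a query as correct when at least one relevant document appears in the top k. The function names are hypothetical, and a benchmark figure would average these per-query values over all queries and datasets.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k positions (linear gain)."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances[:k], start=1))


def ndcg_at_k(ranked_relevances, all_relevances, k):
    """DCG of the produced ranking divided by the DCG of an ideal ranking."""
    ideal = dcg_at_k(sorted(all_relevances, reverse=True), k)
    return dcg_at_k(ranked_relevances, k) / ideal if ideal > 0 else 0.0


def accuracy_at_k(ranked_relevances, k):
    """1.0 if any relevant document is in the top k, else 0.0 (assumed definition)."""
    return 1.0 if any(rel > 0 for rel in ranked_relevances[:k]) else 0.0


# Relevance labels of the documents in the order a reranker returned them.
ranked = [0, 1, 0, 1]
print(ndcg_at_k(ranked, ranked, k=3))
print(accuracy_at_k(ranked, k=3))
```

ranked_relevances holds the labels in ranked order, while all_relevances holds the labels of every judged document for the query, which is what the ideal ranking is built from.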
@online{rerank2024mxbai,
title={Boost Your Search With The Crispy Mixedbread Rerank Models},
author={Aamir Shakir and Darius Koenig and Julius Lipp and Sean Lee},
year={2024},
url={https://www.mixedbread.ai/blog/mxbai-rerank-v1},
}
License
Apache 2.0
mixedbread-ai/mxbai-rerank-xsmall-v1
Author: mixedbread-ai
text-ranking
transformers
Created: 2024-02-29 10:31:57+00:00
Updated: 2025-04-02 14:42:01+00:00