Documentation
<p align="center"> <b>The crispy rerank family from <a href="https://mixedbread.ai"><b>Mixedbread</b></a>.</b> </p>
<p align="center"> <sup> 🍞 Looking for a simple end-to-end retrieval solution? Meet Omni, our multimodal and multilingual model. <a href="https://mixedbread.com"><b>Get in touch for access.</b></a> </sup> </p>
mxbai-rerank-base-v1
This is the base model of our powerful reranker family. You can learn more about the models in our blog post.
We have three models: mxbai-rerank-xsmall-v1, mxbai-rerank-base-v1 (this model), and mxbai-rerank-large-v1.
Quickstart
Currently, the best way to use our models is with the most recent version of sentence-transformers.
```bash
pip install -U sentence-transformers
```
Let's say you have a query and you want to rerank a set of documents. You can do that with only one line of code:
```python
from sentence_transformers import CrossEncoder

# Load the model, here we use our base sized model
model = CrossEncoder("mixedbread-ai/mxbai-rerank-base-v1")

# Example query and documents
query = "Who wrote 'To Kill a Mockingbird'?"
documents = [
    "'To Kill a Mockingbird' is a novel by Harper Lee published in 1960. It was immediately successful, winning the Pulitzer Prize, and has become a classic of modern American literature.",
    "The novel 'Moby-Dick' was written by Herman Melville and first published in 1851. It is considered a masterpiece of American literature and deals with complex themes of obsession, revenge, and the conflict between good and evil.",
    "Harper Lee, an American novelist widely known for her novel 'To Kill a Mockingbird', was born in 1926 in Monroeville, Alabama. She received the Pulitzer Prize for Fiction in 1961.",
    "Jane Austen was an English novelist known primarily for her six major novels, which interpret, critique and comment upon the British landed gentry at the end of the 18th century.",
    "The 'Harry Potter' series, which consists of seven fantasy novels written by British author J.K. Rowling, is among the most popular and critically acclaimed books of the modern era.",
    "'The Great Gatsby', a novel written by American author F. Scott Fitzgerald, was published in 1925. The story is set in the Jazz Age and follows the life of millionaire Jay Gatsby and his pursuit of Daisy Buchanan."
]

# Let's get the scores
results = model.rank(query, documents, return_documents=True, top_k=3)
print(results)
```
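A cross-encoder reads the query and each document together and outputs one relevance score per pair. `rank` then sorts the documents by that score. The sketch below reproduces only the sorting step in plain Python, so it runs without downloading the model. `rank_by_scores` is a hypothetical helper, and the scores in the usage example are made-up numbers standing in for something like `model.predict([(query, doc) for doc in documents])`. The returned dictionaries mimic the `corpus_id` / `score` / `text` keys that `rank` returns.

```python
from typing import Dict, List, Optional, Sequence, Union


def rank_by_scores(
    scores: Sequence[float],
    documents: Sequence[str],
    top_k: Optional[int] = None,
    return_documents: bool = False,
) -> List[Dict[str, Union[int, float, str]]]:
    """Sort documents by relevance score, highest first (hypothetical helper)."""
    if len(scores) != len(documents):
        raise ValueError("need exactly one score per document")
    hits = []
    for corpus_id, score in enumerate(scores):
        hit = {"corpus_id": corpus_id, "score": float(score)}
        if return_documents:
            hit["text"] = documents[corpus_id]
        hits.append(hit)
    # list.sort is stable even with reverse=True, so ties keep their input order
    hits.sort(key=lambda h: h["score"], reverse=True)
    # Slicing with None keeps every hit, mirroring top_k=None
    return hits[:top_k]


# Usage with made-up scores for three short documents
docs = ["Harper Lee wrote it.", "Moby-Dick is by Melville.", "Lee was born in 1926."]
fake_scores = [0.98, 0.02, 0.64]
for hit in rank_by_scores(fake_scores, docs, top_k=2, return_documents=True):
    print(hit["corpus_id"], round(hit["score"], 2), hit["text"])
```

Keeping the sort separate from the model call makes the ranking logic easy to inspect and test on its own.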
<details> <summary>JavaScript Example</summary>
```bash
npm i @xenova/transformers
```
Let's say you have a query and you want to rerank a set of documents. In JavaScript, you need to add a function:
```javascript
// Top-level await requires running this file as an ES module
import { AutoTokenizer, AutoModelForSequenceClassification } from '@xenova/transformers';

const model_id = 'mixedbread-ai/mxbai-rerank-base-v1';
const model = await AutoModelForSequenceClassification.from_pretrained(model_id);
const tokenizer = await AutoTokenizer.from_pretrained(model_id);

/**
 * Performs ranking with the CrossEncoder on the given query and documents. Returns a sorted list with the document indices and scores.
 * @param {string} query A single query
 * @param {string[]} documents A list of documents
 * @param {Object} options Options for ranking
 * @param {number} [options.top_k=undefined] Return the top-k documents. If undefined, all documents are returned.
 * @param {boolean} [options.return_documents=false] If true, also returns the documents. If false, only returns the indices and scores.
 */
async function rank(query, documents, {
    top_k = undefined,
    return_documents = false,
} = {}) {
    // Pair the query with every document and tokenize all pairs as one batch
    const inputs = tokenizer(
        new Array(documents.length).fill(query),
        {
            text_pair: documents,
            padding: true,
            truncation: true,
        }
    );
    const { logits } = await model(inputs);
    // Map each logit to a 0-1 relevance score, then sort from most to least relevant
    return logits
        .sigmoid()
        .tolist()
        .map(([score], i) => ({
            corpus_id: i,
            score,
            ...(return_documents ? { text: documents[i] } : {})
        }))
        .sort((a, b) => b.score - a.score)
        .slice(0, top_k);
}

// Example usage:
const query = "Who wrote 'To Kill a Mockingbird'?";
const documents = [
    "'To Kill a Mockingbird' is a novel by Harper Lee published in 1960. It was immediately successful, winning the Pulitzer Prize, and has become a classic of modern American literature.",
    "The novel 'Moby-Dick' was written by Herman Melville and first published in 1851. It is considered a masterpiece of American literature and deals with complex themes of obsession, revenge, and the conflict between good and evil.",
    "Harper Lee, an American novelist widely known for her novel 'To Kill a Mockingbird', was born in 1926 in Monroeville, Alabama. She received the Pulitzer Prize for Fiction in 1961.",
    "Jane Austen was an English novelist known primarily for her six major novels, which interpret, critique and comment upon the British landed gentry at the end of the 18th century.",
    "The 'Harry Potter' series, which consists of seven fantasy novels written by British author J.K. Rowling, is among the most popular and critically acclaimed books of the modern era.",
    "'The Great Gatsby', a novel written by American author F. Scott Fitzgerald, was published in 1925. The story is set in the Jazz Age and follows the life of millionaire Jay Gatsby and his pursuit of Daisy Buchanan."
];

const results = await rank(query, documents, { return_documents: true, top_k: 3 });
console.log(results);
```
</details>
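The JavaScript `rank` function turns the model's raw logits into scores with a sigmoid before sorting. Because the sigmoid is strictly increasing, it changes the scale of the scores (squeezing them into the range 0 to 1) but never their order, so ranking by logits and ranking by sigmoid scores give the same result. A minimal Python sketch of that step, using made-up logits rather than real model output:

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    """Logistic function 1 / (1 + e^-x), written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def order_by(values: List[float]) -> List[int]:
    """Indices of values, highest value first."""
    return sorted(range(len(values)), key=lambda i: values[i], reverse=True)


# Made-up logits for four query-document pairs (not real model output)
logits = [4.2, -3.1, 1.7, -0.4]
scores = [sigmoid(x) for x in logits]

print([round(s, 3) for s in scores])
# The sigmoid is monotonic, so this comparison holds for any logits
print(order_by(logits) == order_by(scores))
```

The two-branch form only calls `math.exp` on non-positive arguments, so very large or very small logits cannot overflow.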
Using the API
You can use the large model through our API as follows:
```python
from mixedbread_ai.client import MixedbreadAI

# Replace the placeholder with your own API key
mxbai = MixedbreadAI(api_key="{MIXEDBREAD_API_KEY}")

res = mxbai.reranking(
    model="mixedbread-ai/mxbai-rerank-large-v1",
    query="Who is the author of To Kill a Mockingbird?",
    input=[
        "To Kill a Mockingbird is a novel by Harper Lee published in 1960. It was immediately successful, winning the Pulitzer Prize, and has become a classic of modern American literature.",
        "The novel Moby-Dick was written by Herman Melville and first published in 1851. It is considered a masterpiece of American literature and deals with complex themes of obsession, revenge, and the conflict between good and evil.",
        "Harper Lee, an American novelist widely known for her novel To Kill a Mockingbird, was born in 1926 in Monroeville, Alabama. She received the Pulitzer Prize for Fiction in 1961.",
        "Jane Austen was an English novelist known primarily for her six major novels, which interpret, critique and comment upon the British landed gentry at the end of the 18th century.",
        "The Harry Potter series, which consists of seven fantasy novels written by British author J.K. Rowling, is among the most popular and critically acclaimed books of the modern era.",
        "The Great Gatsby, a novel written by American author F. Scott Fitzgerald, was published in 1925. The story is set in the Jazz Age and follows the life of millionaire Jay Gatsby and his pursuit of Daisy Buchanan."
    ],
    top_k=3,
    return_input=False
)

print(res.data)
```
The API comes with additional features, such as a continuously trained reranker! Check out the docs for more information.
Evaluation
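Our reranker models are designed to elevate your search. They work extremely well in combination with keyword search and in many cases can even outperform semantic search systems: in the table below, mxbai-rerank-large-v1 reaches 48.8 NDCG@10 versus 47.5 for cohere-embed-v3 semantic search.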
| Model | NDCG@10 | Accuracy@3 |
|---|---|---|
| Lexical Search (Lucene) | 38.0 | 66.4 |
| BAAI/bge-reranker-base | 41.6 | 66.9 |
| BAAI/bge-reranker-large | 45.2 | 70.6 |
| cohere-embed-v3 (semantic search) | 47.5 | 70.9 |
| mxbai-rerank-xsmall-v1 | 43.9 | 70.0 |
| mxbai-rerank-base-v1 | 46.9 | 72.3 |
| mxbai-rerank-large-v1 | 48.8 | 74.9 |
The reported results are aggregated from 11 datasets of BEIR. We used Pyserini to evaluate the models. Find out more in our blog post and in this spreadsheet.
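NDCG@10 rewards placing relevant documents near the top of the first 10 results: each hit's relevance is discounted by the logarithm of its rank, and the sum is normalized by the best possible ordering. Accuracy@3 is read here as the fraction of queries with at least one relevant document in the top 3. That reading, the linear gain, and the toy relevance labels below are assumptions for illustration; this is a minimal sketch, not a reproduction of the Pyserini setup behind the table.

```python
import math
from typing import Dict, Sequence


def dcg_at_k(gains: Sequence[float], k: int) -> float:
    """Discounted cumulative gain: sum of gain_i / log2(i + 1) for ranks i = 1..k."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains[:k], start=1))


def ndcg_at_k(ranked_ids: Sequence[str], qrels: Dict[str, float], k: int = 10) -> float:
    """NDCG@k for one query. qrels maps doc id -> graded relevance; unjudged docs count as 0."""
    gains = [qrels.get(doc_id, 0.0) for doc_id in ranked_ids]
    # The ideal ranking puts every judged document in order of decreasing relevance
    idcg = dcg_at_k(sorted(qrels.values(), reverse=True), k)
    return dcg_at_k(gains, k) / idcg if idcg > 0 else 0.0


def hit_at_k(ranked_ids: Sequence[str], qrels: Dict[str, float], k: int = 3) -> float:
    """1.0 if any relevant doc is in the top k, else 0.0 (assumed reading of Accuracy@k)."""
    return 1.0 if any(qrels.get(doc_id, 0.0) > 0 for doc_id in ranked_ids[:k]) else 0.0


# Toy data: hypothetical rankings and relevance labels for two queries
queries = [
    {"ranked": ["d3", "d1", "d7", "d2"], "qrels": {"d1": 2.0, "d2": 1.0}},
    {"ranked": ["d9", "d8", "d5", "d4"], "qrels": {"d4": 1.0}},
]

# Average the per-query metrics and report them on a 0-100 scale like the table
mean_ndcg = sum(ndcg_at_k(q["ranked"], q["qrels"]) for q in queries) / len(queries)
mean_acc = sum(hit_at_k(q["ranked"], q["qrels"]) for q in queries) / len(queries)
print(f"NDCG@10 = {100 * mean_ndcg:.1f}, Accuracy@3 = {100 * mean_acc:.1f}")
```

A reranker improves both numbers by moving the relevant documents that first-stage search already retrieved closer to rank 1.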
Community
Please join our Discord community and share your feedback and thoughts! We are here to help and are always happy to chat.
Citation
```bibtex
@online{rerank2024mxbai,
  title={Boost Your Search With The Crispy Mixedbread Rerank Models},
  author={Aamir Shakir and Darius Koenig and Julius Lipp and Sean Lee},
  year={2024},
  url={https://www.mixedbread.ai/blog/mxbai-rerank-v1},
}
```
License
Apache 2.0
mixedbread-ai/mxbai-rerank-base-v1
Author: mixedbread-ai
Created: 2024-02-29 14:36:24+00:00
Updated: 2025-04-02 14:40:30+00:00