---
tags:
- sentence-transformers
- feature-extraction
- sentence-similarity
- transformers
license: mit
language:
- zh
---
<h1 align="center">FlagEmbedding</h1>
<h4 align="center">
<p>
<a href=#model-list>Model List</a> |
<a href=#frequently-asked-questions>FAQ</a> |
<a href=#usage>Usage</a> |
<a href="#evaluation">Evaluation</a> |
<a href="#train">Train</a> |
<a href="#contact">Contact</a> |
<a href="#citation">Citation</a> |
<a href="#license">License</a>
</p>
</h4>
For more details, please refer to our GitHub repository: [FlagEmbedding](https://github.com/FlagOpen/FlagEmbedding).
[English](README.md) | [中文](https://github.com/FlagOpen/FlagEmbedding/blob/master/README_zh.md)
FlagEmbedding can map any text to a low-dimensional dense vector, which can be used for tasks such as retrieval, classification, clustering, and semantic search.
It can also be used in vector databases for LLMs.
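
As a minimal sketch of how such dense vectors support retrieval, the example below ranks passages by cosine similarity to a query vector. The toy 3-dimensional vectors and the helper names `cosine_similarity` and `rank_passages` are illustrative assumptions, not part of FlagEmbedding; in practice the vectors would come from an embedding model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_passages(query_vec, passage_vecs):
    """Return passage indices ordered from most to least similar to the query."""
    scores = [cosine_similarity(query_vec, p) for p in passage_vecs]
    return sorted(range(len(passage_vecs)), key=lambda i: scores[i], reverse=True)


# Toy vectors standing in for real model embeddings (assumption for illustration).
query_vec = [1.0, 0.0, 0.0]
passage_vecs = [
    [0.0, 1.0, 0.0],  # orthogonal to the query
    [0.9, 0.1, 0.0],  # nearly parallel to the query
    [0.5, 0.5, 0.0],  # in between
]

ranking = rank_passages(query_vec, passage_vecs)
print(ranking)
```

A vector database performs the same kind of nearest-neighbour lookup, typically with an approximate index instead of the exhaustive scan used here.
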
************* 🌟**Updates**🌟 *************
- 10/12/2023: Release [LLM-Embedder](./FlagEmbedding/llm_embedder/README.md), a unified embedding model to support diverse retrieval augmentation needs for LLMs. [Paper](https://arxiv.org/pdf/2310.07554.pdf) :fire:
- 09/15/2023: The [technical report](https://arxiv.org/pdf/2309.07597.pdf) of BGE has been released.
- 09/15/2023: The [massive training data](https://data.baai.ac.cn/details/BAAI-MTP) of BGE has been released.
- 09/12/2023: New models:
    - **New reranker models**: release the cross-encoder models `BAAI/bge-reranker-base` and `BAAI/bge-reranker-large`, which are more powerful than the embedding models. We recommend using or fine-tuning them to re-rank the top-k documents returned by embedding models.
    - **Updated embedding models**: release the `bge-*-v1.5` embedding models to alleviate the issue with the similarity distribution and to improve retrieval ability without an instruction.
<details>
<summary>More</summary>
<!-- ### More -->
- 09/07/2023: Update [fine-tune code](https://github.com/FlagOpen/FlagEmbedding/blob/master/FlagEmbedding/baai_general_embedding/README.md): Add script to mine hard negatives and support adding instruction during fine-tuning.
- 08/09/2023: BGE models are integrated into **LangChain**; you can use them like [this](#using-langchain). The C-MTEB **leaderboard** is [available](https://huggingface.co/spaces/mteb/leaderboard).
- 08/05/2023: Release base-scale and small-scale models, **best performance among the models of the same size 🤗**
- 08/02/2023: Release `bge-large-*` (short for BAAI General Embedding) models, **ranked 1st on the MTEB and C-MTEB benchmarks!** :tada: :tada:
- 08/01/2023: We release the [Chinese Massive Text Embedding Benchmark](https://github.com/FlagOpen/FlagEmbedding/blob/master/C_MTEB) (**C-MTEB**), consisting of 31 test datasets.
</details>
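
The 09/12/2023 note above recommends re-ranking the top-k documents returned by an embedding model. A minimal sketch of that two-stage pattern follows. The scoring functions `word_overlap` and `exact_phrase_bonus` are hypothetical stand-ins for an embedding model and a cross-encoder reranker (they are not FlagEmbedding APIs), chosen only so the example runs without downloading any model.

```python
def retrieve_then_rerank(query, docs, cheap_score, precise_score, top_k=2):
    """Two-stage retrieval: cheap scoring over all docs, precise scoring over top_k."""
    # Stage 1: score every document with the fast scorer and keep the best top_k.
    candidates = sorted(docs, key=lambda d: cheap_score(query, d), reverse=True)[:top_k]
    # Stage 2: re-order only those candidates with the slower, more accurate scorer.
    return sorted(candidates, key=lambda d: precise_score(query, d), reverse=True)


def word_overlap(query, doc):
    """Hypothetical stand-in for embedding similarity: count of shared words."""
    return len(set(query.lower().split()) & set(doc.lower().split()))


def exact_phrase_bonus(query, doc):
    """Hypothetical stand-in for a reranker: word overlap plus a bonus for the exact phrase."""
    bonus = 10 if query.lower() in doc.lower() else 0
    return word_overlap(query, doc) + bonus


docs = [
    "panda statue of giant size",
    "the giant panda eats bamboo",
    "stock prices fell today",
]

result = retrieve_then_rerank("giant panda", docs, word_overlap, exact_phrase_bonus, top_k=2)
print(result)
```

The first stage cannot tell the two panda documents apart, while the second stage promotes the one that actually matches the query phrase. Restricting the expensive scorer to `top_k` candidates is what keeps the overall cost manageable.
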
## Model List
`bge` is short for `BAAI general embedding`.
|              Model              | Language | Inference / Fine-tune | Description | Query instruction for retrieval [1] |
|:-------------------------------|:--------:| :--------:| :--------:|:--------:|
| [BAAI/llm-embedder](https://huggingface.co/BAAI/llm-embedder) | English | [Inference](./FlagEmbedding/llm_embedder/README.md) [Fine-tune](./FlagEmbedding/llm_embedder/README.md) | a unified embedding model to support diverse retrieval augmentation needs for LLMs | See [README](./FlagEmbedding/llm_embedder/README.md) |
| [BAAI/bge-reranker-large](https://huggingface.co/BAAI/bge-reranker-large) | Chinese and English | [Inference](#usage-for-reranker) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/reranker) | a cross-encoder model which is more accurate but less efficient [2] | |
| [BAAI/bge-reranker-base](https://huggingface.co/BAAI/bge-reranker-base) | Chinese and English | [Inference](#usage-for-reranker) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/reranker) | a cross-encoder model which is more accurate but less efficient [2] | |
| [BAAI/bge-large-en-v1.5](https://huggingface.co/BAAI/bge-large-en-v1.5) | English | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | version 1.5 with more reasonable similarity distribution | `Represent this sentence for searching relevant passages: ` |
| [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5) | English | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | version 1.5 with more reasonable similarity distribution | `Represent this sentence for searching relevant passages: ` |
| [BAAI/bge-small-en-v1.5](https://huggingface.co/BAAI/bge-small-en-v1.5) | English | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | version 1.5 with more reasonable similarity distribution | `Represent this sentence for searching relevant passages: ` |
| [BAAI/bge-large-zh-v1.5](https://huggingface.co/BAAI/bge-large-zh-v1.5) | Chinese | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | version 1.5 with more reasonable similarity distribution | `为这个句子生成表示以用于检索相关文章:` |
| [BAAI/bge-base-zh-v1.5](https://huggingface.co/BAAI/bge-base-zh-v1.5) | Chinese | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | version 1.5 with more reasonable similarity distribution | `为这个句子生成表示以用于检索相关文章:` |
| [BAAI/bge-small-zh-v1.5](https://huggingface.co/BAAI/bge-small-zh-v1.5) | Chinese | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | version 1.5 with more reasonable similarity distribution | `为这个句子生成表示以用于检索相关文章:` |
| [BAAI/bge-large-en](https://huggingface.co/BAAI/bge-large-en) | English | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | :trophy: rank **1st** in [MTEB](https://huggingface.co/spaces/mteb/leaderboard) leaderboard | `Represent this sentence for searching relevant passages: ` |
| [BAAI/bge-base-en](https://huggingface.co/BAAI/bge-base-en) | English | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | a base-scale model but with similar ability to `bge-large-en` | `Represent this sentence for searching relevant passages: ` |
| [BAAI/bge-small-en](https://huggingface.co/BAAI/bge-small-en) | English | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) |a small-scale model but with competitive performance | `Represent this sentence for searching relevant passages: ` |
| [BAAI/bge-large-zh](https://huggingface.co/BAAI/bge-large-zh) | Chinese | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | :trophy: rank **1st** in [C-MTEB](https://github.com/FlagOpen/FlagEmbedding/tree/master/C_MTEB) benchmark | `为这个句子生成表示以用于检索相关文章:` |
| [BAAI/bge-base-zh](https://huggingface.co/BAAI/bge-base-zh) | Chinese | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | a base-scale model but with similar ability to `bge-large-zh` | `为这个句子生成表示以用于检索相关文章:` |
| [BAAI/bge-small-zh](https://huggingface.co/BAAI/bge-small-zh) | Chinese | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | a small-scale model but with compe
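
The last column of the model list gives a query instruction for retrieval. A minimal sketch of applying it is shown below, assuming the instruction is prepended to the query text only and passages are left unchanged. The mapping `QUERY_INSTRUCTIONS` and the helper `prepare_query` are illustrative names, not part of FlagEmbedding; the instruction strings are copied from the table.

```python
# Instruction strings copied from the model list; only two models shown for brevity.
QUERY_INSTRUCTIONS = {
    "BAAI/bge-large-en-v1.5": "Represent this sentence for searching relevant passages: ",
    "BAAI/bge-large-zh-v1.5": "为这个句子生成表示以用于检索相关文章:",
}


def prepare_query(model_name, query):
    """Prepend the model's retrieval instruction to a query, if one is listed."""
    return QUERY_INSTRUCTIONS.get(model_name, "") + query


query_text = prepare_query("BAAI/bge-large-en-v1.5", "what is a panda?")
passage_text = "The giant panda is a bear species endemic to China."  # passages are not modified

print(query_text)
```

Models without a listed instruction, such as the rerankers, fall through to the unmodified query.
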
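
Switching the embedding model to bge-base-zh-v1.5 gives better document matching. In a langchat + chatGLM setup that uses a large text-parsing model, bge-base-zh-v1.5 can parse documents quickly on a GPU. The model has a moderate number of parameters, runs on a relatively small GPU, and can be placed into a langchat project and run there.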