bge-base-zh-v1.5模型_bge-large-zh资源-CSDN文库

共66个文件

sample：12个

json：9个

head：4个

需积分: 5 80 浏览量 2024-03-20 11:07:31 上传评论收藏 395KB ZIP 举报

资源推荐

资源详情

资源评论

收起资源包目录

bge-base-zh-v1.5.zip （66个子文件）

bge-base-zh-v1.5

1_Pooling

config.json 190B

configuration.json 47B

.gitattributes 2KB

config_sentence_transformers.json 124B

tokenizer.json 429KB

.git

index 1KB

HEAD 23B

refs

heads

master 41B

tags

remotes

origin

HEAD 32B

objects

4f9781030019ab9b253c6dcb8c7878b6dc87a5 58KB

5bfd57e34ca45582e4bdbaa3e6deb9efffa08d 114B

c225d392b90484281c55c3e4f931eb5958e7be 233B

b0c0d97d09b930d13600b1a773ddb27e441aab 107B

95978aec2dcc0ffd85ad452343f2a8b5dc1994 169B

a95fc47eb40094c780f6b2f14d4bcc512809fb 92B

d80bc9aa3e350fe8cbc96fdf46747aa485c5d8 480B

d8df15b640ce0cd22d8c9da630fcaafd46fb76 9KB

df791881f91e7963c9e0d704de2f2037baa002 127KB

91a9301bed8aded9b96f0694907806cc85a9f6 9KB

fca74771bc76a8e01178ce3a6055a0995f8093 222B

291c34499303a5103dea21c45e605bb20981bb 62B

83a929d7e78569d1618d5b9a30420ffd654820 126B

c503243f0a2042b78a061ba35122e96bdb995c 136B

b34fda6c2a054b5cd534549b43d47f833aec3e 234B

2a9b81c0bfd99800fabf352f69c7ccd46c5e43 155B

5942d271b3c8f34d33b9bd3b9452d6be70ce9d 56B

2521d37a596a265e6a173224ea2c66735eadfa 135B

4cfecde428485b1fa412594a9b07141696cb9e 59B

b3208c2884c4efb86e49300fdd3dc877220cdf 87B

pack

info

858cf8b0f6c30351bbe641ad29ea3fdfa5e9e9 170B

8c7dc9558386fde78276571db2a1e9901b00eb 166B

b308816fd665c3a503d17be1769123a61a015e 135B

85692bff64b0d1917833c31ddbca8ab10f5455 64B

c54746c3f3962af0bd82a309dc5a3bc7fcf507 155B

62fc3104c9813ec0f4c1b8a188fe5b8c264bfd 482B

7e25348c11403abb4fb997e76fed92c8cd1e14 480B

2944b6bd79162c139659069780629165927b21 491B

ffc3bc817e46a60c018fc7f5ac947a1ff2fe57 61B

3165edef8898aad1a98051d6cbfbdb4868e912 162B

description 73B

packed-refs 114B

info

exclude 240B

logs

HEAD 214B

refs

heads

master 214B

remotes

origin

HEAD 214B

hooks

post-update.sample 189B

prepare-commit-msg.sample 1KB

commit-msg.sample 896B

pre-receive.sample 544B

update.sample 4KB

pre-commit.sample 2KB

pre-rebase.sample 5KB

applypatch-msg.sample 478B

fsmonitor-watchman.sample 3KB

pre-applypatch.sample 424B

pre-push.sample 1KB

pre-merge-commit.sample 416B

config 285B

branches

pytorch_model.bin 134B

sentence_bert_config.json 52B

config.json 998B

tokenizer_config.json 366B

modules.json 349B

special_tokens_map.json 125B

README.md 28KB

vocab.txt 107KB

--- tags: - sentence-transformers - feature-extraction - sentence-similarity - transformers license: mit language: - zh --- <h1 align="center">FlagEmbedding</h1> <h4 align="center"> <p> <a href=#model-list>Model List</a> | <a href=#frequently-asked-questions>FAQ</a> | <a href=#usage>Usage</a> | <a href="#evaluation">Evaluation</a> | <a href="#train">Train</a> | <a href="#contact">Contact</a> | <a href="#citation">Citation</a> | <a href="#license">License</a> <p> </h4> More details please refer to our Github: [FlagEmbedding](https://github.com/FlagOpen/FlagEmbedding). [English](README.md) | [中文](https://github.com/FlagOpen/FlagEmbedding/blob/master/README_zh.md) FlagEmbedding can map any text to a low-dimensional dense vector which can be used for tasks like retrieval, classification, clustering, or semantic search. And it also can be used in vector databases for LLMs. ************* 🌟**Updates**🌟 ************* - 10/12/2023: Release [LLM-Embedder](./FlagEmbedding/llm_embedder/README.md), a unified embedding model to support diverse retrieval augmentation needs for LLMs. [Paper](https://arxiv.org/pdf/2310.07554.pdf) :fire: - 09/15/2023: The [technical report](https://arxiv.org/pdf/2309.07597.pdf) of BGE has been released - 09/15/2023: The [masive training data](https://data.baai.ac.cn/details/BAAI-MTP) of BGE has been released - 09/12/2023: New models: - **New reranker model**: release cross-encoder models `BAAI/bge-reranker-base` and `BAAI/bge-reranker-large`, which are more powerful than embedding model. We recommend to use/fine-tune them to re-rank top-k documents returned by embedding models. - **update embedding model**: release `bge-*-v1.5` embedding model to alleviate the issue of the similarity distribution, and enhance its retrieval ability without instruction. <details> <summary>More</summary>  - 09/07/2023: Update [fine-tune code](https://github.com/FlagOpen/FlagEmbedding/blob/master/FlagEmbedding/baai_general_embedding/README.md): Add script to mine hard negatives and support adding instruction during fine-tuning. - 08/09/2023: BGE Models are integrated into **Langchain**, you can use it like [this](#using-langchain); C-MTEB **leaderboard** is [available](https://huggingface.co/spaces/mteb/leaderboard). - 08/05/2023: Release base-scale and small-scale models, **best performance among the models of the same size 🤗** - 08/02/2023: Release `bge-large-*`(short for BAAI General Embedding) Models, **rank 1st on MTEB and C-MTEB benchmark!** :tada: :tada: - 08/01/2023: We release the [Chinese Massive Text Embedding Benchmark](https://github.com/FlagOpen/FlagEmbedding/blob/master/C_MTEB) (**C-MTEB**), consisting of 31 test dataset. </details> ## Model List `bge` is short for `BAAI general embedding`. | Model | Language | | Description | query instruction for retrieval [1] | |:-------------------------------|:--------:| :--------:| :--------:|:--------:| | [BAAI/llm-embedder](https://huggingface.co/BAAI/llm-embedder) | English | [Inference](./FlagEmbedding/llm_embedder/README.md) [Fine-tune](./FlagEmbedding/llm_embedder/README.md) | a unified embedding model to support diverse retrieval augmentation needs for LLMs | See [README](./FlagEmbedding/llm_embedder/README.md) | | [BAAI/bge-reranker-large](https://huggingface.co/BAAI/bge-reranker-large) | Chinese and English | [Inference](#usage-for-reranker) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/reranker) | a cross-encoder model which is more accurate but less efficient [2] | | | [BAAI/bge-reranker-base](https://huggingface.co/BAAI/bge-reranker-base) | Chinese and English | [Inference](#usage-for-reranker) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/reranker) | a cross-encoder model which is more accurate but less efficient [2] | | | [BAAI/bge-large-en-v1.5](https://huggingface.co/BAAI/bge-large-en-v1.5) | English | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | version 1.5 with more reasonable similarity distribution | `Represent this sentence for searching relevant passages: ` | | [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5) | English | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | version 1.5 with more reasonable similarity distribution | `Represent this sentence for searching relevant passages: ` | | [BAAI/bge-small-en-v1.5](https://huggingface.co/BAAI/bge-small-en-v1.5) | English | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | version 1.5 with more reasonable similarity distribution | `Represent this sentence for searching relevant passages: ` | | [BAAI/bge-large-zh-v1.5](https://huggingface.co/BAAI/bge-large-zh-v1.5) | Chinese | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | version 1.5 with more reasonable similarity distribution | `为这个句子生成表示以用于检索相关文章：` | | [BAAI/bge-base-zh-v1.5](https://huggingface.co/BAAI/bge-base-zh-v1.5) | Chinese | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | version 1.5 with more reasonable similarity distribution | `为这个句子生成表示以用于检索相关文章：` | | [BAAI/bge-small-zh-v1.5](https://huggingface.co/BAAI/bge-small-zh-v1.5) | Chinese | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | version 1.5 with more reasonable similarity distribution | `为这个句子生成表示以用于检索相关文章：` | | [BAAI/bge-large-en](https://huggingface.co/BAAI/bge-large-en) | English | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | :trophy: rank **1st** in [MTEB](https://huggingface.co/spaces/mteb/leaderboard) leaderboard | `Represent this sentence for searching relevant passages: ` | | [BAAI/bge-base-en](https://huggingface.co/BAAI/bge-base-en) | English | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | a base-scale model but with similar ability to `bge-large-en` | `Represent this sentence for searching relevant passages: ` | | [BAAI/bge-small-en](https://huggingface.co/BAAI/bge-small-en) | English | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) |a small-scale model but with competitive performance | `Represent this sentence for searching relevant passages: ` | | [BAAI/bge-large-zh](https://huggingface.co/BAAI/bge-large-zh) | Chinese | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | :trophy: rank **1st** in [C-MTEB](https://github.com/FlagOpen/FlagEmbedding/tree/master/C_MTEB) benchmark | `为这个句子生成表示以用于检索相关文章：` | | [BAAI/bge-base-zh](https://huggingface.co/BAAI/bge-base-zh) | Chinese | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | a base-scale model but with similar ability to `bge-large-zh` | `为这个句子生成表示以用于检索相关文章：` | | [BAAI/bge-small-zh](https://huggingface.co/BAAI/bge-small-zh) | Chinese | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | a small-scale model but with compe

评论收藏

内容反馈