<p align="center">
<a href="https://trychroma.com"><img src="https://user-images.githubusercontent.com/891664/227103090-6624bf7d-9524-4e05-9d2c-c28d5d451481.png" alt="Chroma logo"></a>
</p>
<p align="center">
<b>Chroma - the open-source embedding database</b>. <br />
The fastest way to build Python or JavaScript LLM apps with memory!
</p>
<p align="center">
<a href="https://discord.gg/MMeYNTmh3x" target="_blank">
<img src="https://img.shields.io/discord/1073293645303795742" alt="Discord">
</a> |
<a href="https://github.com/chroma-core/chroma/blob/master/LICENSE" target="_blank">
<img src="https://img.shields.io/static/v1?label=license&message=Apache 2.0&color=white" alt="License">
</a> |
<a href="https://docs.trychroma.com/" target="_blank">
Docs
</a> |
<a href="https://www.trychroma.com/" target="_blank">
Homepage
</a>
</p>
<p align="center">
<a href="https://github.com/chroma-core/chroma/actions/workflows/chroma-integration-test.yml" target="_blank">
<img src="https://github.com/chroma-core/chroma/actions/workflows/chroma-integration-test.yml/badge.svg?branch=main" alt="Integration Tests">
</a> |
<a href="https://github.com/chroma-core/chroma/actions/workflows/chroma-test.yml" target="_blank">
<img src="https://github.com/chroma-core/chroma/actions/workflows/chroma-test.yml/badge.svg?branch=main" alt="Tests">
</a>
</p>
```bash
pip install chromadb # python client
# for javascript, npm install chromadb!
# for client-server mode, chroma run --path /chroma_db_path
```
The core API is only 4 functions (run our [💡 Google Colab](https://colab.research.google.com/drive/1QEzFyqnoFxq7LUGyP1vzR4iLt9PpCDXv?usp=sharing) or [Replit template](https://replit.com/@swyx/BasicChromaStarter?v=1)):
```python
import chromadb
# setup Chroma in-memory, for easy prototyping. Can add persistence easily!
client = chromadb.Client()
# Create collection. get_collection, get_or_create_collection, delete_collection also available!
collection = client.create_collection("all-my-documents")
# Add docs to the collection. Can also update and delete. Row-based API coming soon!
collection.add(
    # Chroma handles tokenization, embedding, and indexing automatically.
    # You can skip that and add your own embeddings as well.
    documents=["This is document1", "This is document2"],
    metadatas=[{"source": "notion"}, {"source": "google-docs"}],  # filter on these!
    ids=["doc1", "doc2"],  # unique for each doc
)
# Query/search 2 most similar results. You can also .get by id
results = collection.query(
    query_texts=["This is a query document"],
    n_results=2,
    # where={"metadata_field": "is_equal_to_this"},  # optional filter
    # where_document={"$contains": "search_string"}  # optional filter
)
```
## Features
- __Simple__: Fully-typed, fully-tested, fully-documented == happiness
- __Integrations__: [`🦜️🔗 LangChain`](https://blog.langchain.dev/langchain-chroma/) (Python and JS), [`🦙 LlamaIndex`](https://twitter.com/atroyn/status/1628557389762007040) and more soon
- __Dev, Test, Prod__: the same API that runs in your Python notebook scales to your cluster
- __Feature-rich__: Queries, filtering, density estimation and more
- __Free & Open Source__: Apache 2.0 Licensed
## Use case: ChatGPT for ______
For example, the `"Chat your data"` use case:
1. Add documents to your database. You can pass in your own embeddings, embedding function, or let Chroma embed them for you.
2. Query relevant documents with natural language.
3. Compose documents into the context window of an LLM like `GPT3` for additional summarization or analysis.
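Step 3 can be sketched in plain Python. This is a minimal illustration, not part of Chroma: `build_prompt`, `max_docs`, and the prompt wording are hypothetical, and `sample_results` is a hand-written stand-in that assumes the nested-list shape returned by `collection.query` (one list of documents per query text).

```python
def build_prompt(question, results, max_docs=3):
    """Compose retrieved documents into a single prompt string (hypothetical helper)."""
    # results["documents"] holds one list of documents per query text;
    # take the list for the first (and here, only) query.
    docs = results["documents"][0][:max_docs]
    # Number each document so the LLM can refer back to its sources.
    context = "\n\n".join(f"[{i + 1}] {doc}" for i, doc in enumerate(docs))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
    )


# Hand-written stand-in for the value returned by collection.query(...)
sample_results = {
    "ids": [["doc1", "doc2"]],
    "documents": [["This is document1", "This is document2"]],
    "metadatas": [[{"source": "notion"}, {"source": "google-docs"}]],
}

prompt = build_prompt("What do the documents say?", sample_results)
print(prompt)
```

The resulting string can then be sent to whichever LLM you use; how much context fits depends on that model's context window.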
## Embeddings?
What are embeddings?
- [Read the guide from OpenAI](https://platform.openai.com/docs/guides/embeddings/what-are-embeddings)
- __Literal__: Embedding something turns it from image/text/audio into a list of numbers. 🖼️ or 📄 => `[1.2, 2.1, ....]`. This process makes documents "understandable" to a machine learning model.
- __By analogy__: An embedding represents the essence of a document. This enables documents and queries with the same essence to be "near" each other and therefore easy to find.
- __Technical__: An embedding is the latent-space position of a document at a layer of a deep neural network. For models trained specifically to embed data, this is the last layer.
- __A small example__: Suppose you search your photos for "famous bridge in San Francisco". Embedding this query and comparing it to the embeddings of your photos and their metadata should return photos of the Golden Gate Bridge.
Embedding databases (also known as **vector databases**) store embeddings and let you search by nearest neighbors rather than by substrings, as a traditional database would. By default, Chroma uses [Sentence Transformers](https://docs.trychroma.com/embeddings#sentence-transformers) to embed for you, but you can also use OpenAI embeddings, Cohere (multilingual) embeddings, or your own.
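To make "search by nearest neighbors" concrete, here is a toy brute-force version in plain Python. It is only a sketch: the three-dimensional vectors, the ids, and the helper names (`cosine_similarity`, `nearest`) are made up for illustration, and it does not show how Chroma itself indexes or scores embeddings. Real vector databases typically use approximate indexes rather than scanning every vector.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query, embeddings, n_results=2):
    """Return the ids of the n_results embeddings most similar to the query."""
    scored = [
        (cosine_similarity(query, vec), doc_id)
        for doc_id, vec in embeddings.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scored[:n_results]]


# Made-up embeddings; a real model would produce hundreds of dimensions.
toy_embeddings = {
    "golden-gate-photo": [0.9, 0.1, 0.0],
    "bay-bridge-photo": [0.7, 0.3, 0.1],
    "cat-photo": [0.0, 0.2, 0.9],
}
query_embedding = [1.0, 0.0, 0.0]  # stand-in for "famous bridge in San Francisco"

top = nearest(query_embedding, toy_embeddings)
print(top)
```

The two bridge photos rank above the cat photo because their vectors point in nearly the same direction as the query, which is the sense in which documents with the same "essence" end up near each other.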
## Get involved
Chroma is a rapidly developing project. We welcome PR contributors and ideas for how to improve the project.
- [Join the conversation on Discord](https://discord.gg/MMeYNTmh3x) - `#contributing` channel
- [Review the 🛣️ Roadmap and contribute your ideas](https://docs.trychroma.com/roadmap)
- [Grab an issue and open a PR](https://github.com/chroma-core/chroma/issues) - [`Good first issue tag`](https://github.com/chroma-core/chroma/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22)
- [Read our contributing guide](https://docs.trychroma.com/contributing)
**Release Cadence**
We currently release new tagged versions of the `pypi` and `npm` packages on Mondays. Hotfixes go out at any time during the week.
## License
[Apache 2.0](./LICENSE)