# ChatGPT Retrieval Plugin
Build Custom GPTs with a Retrieval Plugin backend to give ChatGPT access to personal documents.
![Example Custom GPT Screenshot](/assets/example.png)
## Introduction
The ChatGPT Retrieval Plugin repository provides a flexible solution for semantic search and retrieval of personal or organizational documents using natural language queries. It is a standalone retrieval backend, and can be used with [ChatGPT custom GPTs](https://chat.openai.com/gpts/discovery), [function calling](https://platform.openai.com/docs/guides/function-calling) with the [chat completions](https://platform.openai.com/docs/guides/text-generation) or [assistants APIs](https://platform.openai.com/docs/assistants/overview), or with the [ChatGPT plugins model (deprecated)](https://chat.openai.com/?model=gpt-4-plugins). ChatGPT and the Assistants API both natively support retrieval from uploaded files, so you should use the Retrieval Plugin as a backend only if you want more granular control of your retrieval system (e.g. document text chunk length, embedding model / size, etc.).
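Once deployed, the plugin is just an HTTP API, so any client can retrieve documents from it. A minimal sketch (not the plugin's own code) of building and sending a request to the `/query` endpoint described under [API Endpoints](#api-endpoints) — the host and bearer token here are placeholders:

```python
# Sketch: calling a deployed Retrieval Plugin's /query endpoint.
# The host and bearer token are placeholders for your own deployment.
import json
import urllib.request


def build_query_payload(question: str, top_k: int = 3) -> dict:
    # /query accepts a list of queries, each with a query string,
    # an optional metadata filter, and a top_k result count.
    return {"queries": [{"query": question, "top_k": top_k}]}


def query_plugin(host: str, token: str, question: str) -> dict:
    req = urllib.request.Request(
        f"{host}/query",
        data=json.dumps(build_query_payload(question)).encode(),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


payload = build_query_payload("What is our vacation policy?")
print(json.dumps(payload))
```

The same request body works whether the caller is a custom GPT, a function-calling tool definition, or your own application code.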
The repository is organized into several directories:
| Directory | Description |
| ------------------------------- | -------------------------------------------------------------------------------------------------------------------------- |
| [`datastore`](/datastore) | Contains the core logic for storing and querying document embeddings using various vector database providers. |
| [`docs`](/docs) | Includes documentation for setting up and using each vector database provider, webhooks, and removing unused dependencies. |
| [`examples`](/examples) | Provides example configurations, authentication methods, and provider-specific examples. |
| [`local_server`](/local_server) | Contains an implementation of the Retrieval Plugin configured for localhost testing. |
| [`models`](/models) | Contains the data models used by the plugin, such as document and metadata models. |
| [`scripts`](/scripts) | Offers scripts for processing and uploading documents from different data sources. |
| [`server`](/server) | Houses the main FastAPI server implementation. |
| [`services`](/services) | Contains utility services for tasks like chunking, metadata extraction, and PII detection. |
| [`tests`](/tests) | Includes integration tests for various vector database providers. |
| [`.well-known`](/.well-known) | Stores the plugin manifest file and OpenAPI schema, which define the plugin configuration and API specification. |
This README provides detailed information on how to set up, develop, and deploy the ChatGPT Retrieval Plugin (stand-alone retrieval backend).
## Table of Contents
- [Quickstart](#quickstart)
- [About](#about)
- [Retrieval Plugin](#retrieval-plugin)
- [Retrieval Plugin with custom GPTs](#retrieval-plugin-with-custom-gpts)
- [Retrieval Plugin with function calling](#retrieval-plugin-with-function-calling)
- [Retrieval Plugin with the plugins model (deprecated)](#chatgpt-plugins-model)
- [API Endpoints](#api-endpoints)
- [Memory Feature](#memory-feature)
- [Security](#security)
- [Choosing an Embeddings Model](#choosing-an-embeddings-model)
- [Development](#development)
- [Setup](#setup)
- [General Environment Variables](#general-environment-variables)
- [Choosing a Vector Database](#choosing-a-vector-database)
- [Pinecone](#pinecone)
- [Elasticsearch](#elasticsearch)
- [Weaviate](#weaviate)
- [Zilliz](#zilliz)
- [Milvus](#milvus)
- [Qdrant](#qdrant)
- [Redis](#redis)
  - [LlamaIndex](#llamaindex)
- [Chroma](#chroma)
- [Azure Cognitive Search](#azure-cognitive-search)
- [Azure CosmosDB Mongo vCore](#azure-cosmosdb-mongo-vcore)
- [Supabase](#supabase)
- [Postgres](#postgres)
- [AnalyticDB](#analyticdb)
- [Running the API Locally](#running-the-api-locally)
- [Personalization](#personalization)
- [Authentication Methods](#authentication-methods)
- [Deployment](#deployment)
- [Webhooks](#webhooks)
- [Scripts](#scripts)
- [Limitations](#limitations)
- [Contributors](#contributors)
- [Future Directions](#future-directions)
## Quickstart
Follow these steps to quickly set up and run the ChatGPT Retrieval Plugin:
1. Install Python 3.10, if not already installed.
2. Clone the repository: `git clone https://github.com/openai/chatgpt-retrieval-plugin.git`
3. Navigate to the cloned repository directory: `cd /path/to/chatgpt-retrieval-plugin`
4. Install poetry: `pip install poetry`
5. Create a new virtual environment with Python 3.10: `poetry env use python3.10`
6. Activate the virtual environment: `poetry shell`
7. Install app dependencies: `poetry install`
8. Create a [bearer token](#general-environment-variables)
9. Set the required environment variables:
```
export DATASTORE=<your_datastore>
export BEARER_TOKEN=<your_bearer_token>
export OPENAI_API_KEY=<your_openai_api_key>
export EMBEDDING_DIMENSION=256 # edit this value based on the dimension of the embeddings you want to use
export EMBEDDING_MODEL=text-embedding-3-large # edit this based on your model preference, e.g. text-embedding-3-small, text-embedding-ada-002
# Optional environment variables used when running Azure OpenAI
export OPENAI_API_BASE=https://<AzureOpenAIName>.openai.azure.com/
export OPENAI_API_TYPE=azure
export OPENAI_EMBEDDINGMODEL_DEPLOYMENTID=<Name of embedding model deployment>
export OPENAI_METADATA_EXTRACTIONMODEL_DEPLOYMENTID=<Name of deployment of model for metadata>
export OPENAI_COMPLETIONMODEL_DEPLOYMENTID=<Name of general model deployment used for completion>
export OPENAI_EMBEDDING_BATCH_SIZE=<Embedding batch size; for Azure OpenAI, this value needs to be set to 1>
# Add the environment variables for your chosen vector DB.
# Some of these are optional; read the provider's setup docs in /docs/providers for more information.
# Pinecone
export PINECONE_API_KEY=<your_pinecone_api_key>
export PINECONE_ENVIRONMENT=<your_pinecone_environment>
export PINECONE_INDEX=<your_pinecone_index>
# Weaviate
export WEAVIATE_URL=<your_weaviate_instance_url>
export WEAVIATE_API_KEY=<your_api_key_for_WCS>
export WEAVIATE_CLASS=<your_optional_weaviate_class>
# Zilliz
export ZILLIZ_COLLECTION=<your_zilliz_collection>
export ZILLIZ_URI=<your_zilliz_uri>
export ZILLIZ_USER=<your_zilliz_username>
export ZILLIZ_PASSWORD=<your_zilliz_password>
# Milvus
export MILVUS_COLLECTION=<your_milvus_collection>
export MILVUS_HOST=<your_milvus_host>
export MILVUS_PORT=<your_milvus_port>
export MILVUS_USER=<your_milvus_username>
export MILVUS_PASSWORD=<your_milvus_password>
# Qdrant
export QDRANT_URL=<your_qdrant_url>
export QDRANT_PORT=<your_qdrant_port>
export QDRANT_GRPC_PORT=<your_qdrant_grpc_port>
export QDRANT_API_KEY=<your_qdrant_api_key>
export QDRANT_COLLECTION=<your_qdrant_collection>
# AnalyticDB
export PG_HOST=<your_analyticdb_host>
export PG_PORT=<your_analyticdb_port>
export PG_USER=<your_analyticdb_username>
export PG_PASSWORD=<your_analyticdb_password>
export PG_DATABASE=<your_analyticdb_database>
export PG_COLLECTION=<your_analyticdb_collection>
# Redis
export REDIS_HOST=<your_redis_host>
export REDIS_PORT=<your_redis_port>
export REDIS_PASSWORD=<your_redis_password>
export REDIS_INDEX_NAME=<your_redis_index_name>
export REDIS_D
```
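With the environment variables set and the server running locally (the repository defines a `start` script, so `poetry run start` should serve the API, by default on port 8000), the `/upsert` and `/query` endpoints can be smoke-tested from the command line. A hedged sketch, assuming the default port and the `BEARER_TOKEN` exported above:

```shell
# Smoke test against a locally running Retrieval Plugin
# (assumes the server is up on port 8000 and BEARER_TOKEN is exported).
BASE_URL="${BASE_URL:-http://localhost:8000}"

# Upsert a document; the plugin chunks and embeds it server-side.
curl -s -X POST "$BASE_URL/upsert" \
  -H "Authorization: Bearer $BEARER_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"documents": [{"text": "Our vacation policy allows 20 days per year."}]}'

# Query it back with natural language.
curl -s -X POST "$BASE_URL/query" \
  -H "Authorization: Bearer $BEARER_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"queries": [{"query": "vacation policy", "top_k": 1}]}'
```

The `/query` response should contain the upserted text chunk among its results; if it does not, check the server logs and the vector database environment variables first.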