# ChatGPT Retrieval Plugin
Build Custom GPTs with a Retrieval Plugin backend to give ChatGPT access to personal documents.
![Example Custom GPT Screenshot](/assets/example.png)
## Introduction
The ChatGPT Retrieval Plugin repository provides a flexible solution for semantic search and retrieval of personal or organizational documents using natural language queries. It is a standalone retrieval backend, and can be used with [ChatGPT custom GPTs](https://chat.openai.com/gpts/discovery), [function calling](https://platform.openai.com/docs/guides/function-calling) with the [chat completions](https://platform.openai.com/docs/guides/text-generation) or [assistants APIs](https://platform.openai.com/docs/assistants/overview), or with the [ChatGPT plugins model (deprecated)](https://chat.openai.com/?model=gpt-4-plugins). ChatGPT and the Assistants API both natively support retrieval from uploaded files, so you should use the Retrieval Plugin as a backend only if you want more granular control of your retrieval system (e.g. document text chunk length, embedding model / size, etc.).
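Once deployed, the plugin is just an HTTP API, so any client can retrieve documents from it. A minimal sketch (not the plugin's own code) of building and sending a request to the `/query` endpoint described under [API Endpoints](#api-endpoints) — the host and bearer token here are placeholders:

```python
# Sketch: calling a deployed Retrieval Plugin's /query endpoint.
# The host and bearer token are placeholders for your own deployment.
import json
import urllib.request


def build_query_payload(question: str, top_k: int = 3) -> dict:
    # /query accepts a list of queries, each with a query string,
    # an optional metadata filter, and a top_k result count.
    return {"queries": [{"query": question, "top_k": top_k}]}


def query_plugin(host: str, token: str, question: str) -> dict:
    req = urllib.request.Request(
        f"{host}/query",
        data=json.dumps(build_query_payload(question)).encode(),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


payload = build_query_payload("What is our vacation policy?")
print(json.dumps(payload))
```

The same request body works whether the caller is a custom GPT, a function-calling tool definition, or your own application code.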
The repository is organized into several directories:
| Directory | Description |
| ------------------------------- | -------------------------------------------------------------------------------------------------------------------------- |
| [`datastore`](/datastore) | Contains the core logic for storing and querying document embeddings using various vector database providers. |
| [`docs`](/docs) | Includes documentation for setting up and using each vector database provider, webhooks, and removing unused dependencies. |
| [`examples`](/examples) | Provides example configurations, authentication methods, and provider-specific examples. |
| [`local_server`](/local_server) | Contains an implementation of the Retrieval Plugin configured for localhost testing. |
| [`models`](/models) | Contains the data models used by the plugin, such as document and metadata models. |
| [`scripts`](/scripts) | Offers scripts for processing and uploading documents from different data sources. |
| [`server`](/server) | Houses the main FastAPI server implementation. |
| [`services`](/services) | Contains utility services for tasks like chunking, metadata extraction, and PII detection. |
| [`tests`](/tests) | Includes integration tests for various vector database providers. |
| [`.well-known`](/.well-known) | Stores the plugin manifest file and OpenAPI schema, which define the plugin configuration and API specification. |
This README provides detailed information on how to set up, develop, and deploy the ChatGPT Retrieval Plugin (stand-alone retrieval backend).
## Table of Contents
- [Quickstart](#quickstart)
- [About](#about)
- [Retrieval Plugin](#retrieval-plugin)
- [Retrieval Plugin with custom GPTs](#retrieval-plugin-with-custom-gpts)
- [Retrieval Plugin with function calling](#retrieval-plugin-with-function-calling)
- [Retrieval Plugin with the plugins model (deprecated)](#chatgpt-plugins-model)
- [API Endpoints](#api-endpoints)
- [Memory Feature](#memory-feature)
- [Security](#security)
- [Choosing an Embeddings Model](#choosing-an-embeddings-model)
- [Development](#development)
- [Setup](#setup)
- [General Environment Variables](#general-environment-variables)
- [Choosing a Vector Database](#choosing-a-vector-database)
- [Pinecone](#pinecone)
- [Elasticsearch](#elasticsearch)
- [Weaviate](#weaviate)
- [Zilliz](#zilliz)
- [Milvus](#milvus)
- [Qdrant](#qdrant)
- [Redis](#redis)
  - [LlamaIndex](#llamaindex)
- [Chroma](#chroma)
- [Azure Cognitive Search](#azure-cognitive-search)
- [Azure CosmosDB Mongo vCore](#azure-cosmosdb-mongo-vcore)
- [Supabase](#supabase)
- [Postgres](#postgres)
- [AnalyticDB](#analyticdb)
- [Running the API Locally](#running-the-api-locally)
- [Personalization](#personalization)
- [Authentication Methods](#authentication-methods)
- [Deployment](#deployment)
- [Webhooks](#webhooks)
- [Scripts](#scripts)
- [Limitations](#limitations)
- [Contributors](#contributors)
- [Future Directions](#future-directions)
## Quickstart
Follow these steps to quickly set up and run the ChatGPT Retrieval Plugin:
1. Install Python 3.10, if not already installed.
2. Clone the repository: `git clone https://github.com/openai/chatgpt-retrieval-plugin.git`
3. Navigate to the cloned repository directory: `cd /path/to/chatgpt-retrieval-plugin`
4. Install poetry: `pip install poetry`
5. Create a new virtual environment with Python 3.10: `poetry env use python3.10`
6. Activate the virtual environment: `poetry shell`
7. Install app dependencies: `poetry install`
8. Create a [bearer token](#general-environment-variables)
9. Set the required environment variables:
```
export DATASTORE=<your_datastore>
export BEARER_TOKEN=<your_bearer_token>
export OPENAI_API_KEY=<your_openai_api_key>
export EMBEDDING_DIMENSION=256 # edit this value based on the dimension of the embeddings you want to use
export EMBEDDING_MODEL=text-embedding-3-large # edit this based on your model preference, e.g. text-embedding-3-small, text-embedding-ada-002
# Optional environment variables used when running Azure OpenAI
export OPENAI_API_BASE=https://<AzureOpenAIName>.openai.azure.com/
export OPENAI_API_TYPE=azure
export OPENAI_EMBEDDINGMODEL_DEPLOYMENTID=<Name of embedding model deployment>
export OPENAI_METADATA_EXTRACTIONMODEL_DEPLOYMENTID=<Name of deployment of model for metadata>
export OPENAI_COMPLETIONMODEL_DEPLOYMENTID=<Name of general model deployment used for completion>
export OPENAI_EMBEDDING_BATCH_SIZE=<Embedding batch size; for Azure OpenAI, this value needs to be set to 1>
# Add the environment variables for your chosen vector DB.
# Some of these are optional; read the provider's setup docs in /docs/providers for more information.
# Pinecone
export PINECONE_API_KEY=<your_pinecone_api_key>
export PINECONE_ENVIRONMENT=<your_pinecone_environment>
export PINECONE_INDEX=<your_pinecone_index>
# Weaviate
export WEAVIATE_URL=<your_weaviate_instance_url>
export WEAVIATE_API_KEY=<your_api_key_for_WCS>
export WEAVIATE_CLASS=<your_optional_weaviate_class>
# Zilliz
export ZILLIZ_COLLECTION=<your_zilliz_collection>
export ZILLIZ_URI=<your_zilliz_uri>
export ZILLIZ_USER=<your_zilliz_username>
export ZILLIZ_PASSWORD=<your_zilliz_password>
# Milvus
export MILVUS_COLLECTION=<your_milvus_collection>
export MILVUS_HOST=<your_milvus_host>
export MILVUS_PORT=<your_milvus_port>
export MILVUS_USER=<your_milvus_username>
export MILVUS_PASSWORD=<your_milvus_password>
# Qdrant
export QDRANT_URL=<your_qdrant_url>
export QDRANT_PORT=<your_qdrant_port>
export QDRANT_GRPC_PORT=<your_qdrant_grpc_port>
export QDRANT_API_KEY=<your_qdrant_api_key>
export QDRANT_COLLECTION=<your_qdrant_collection>
# AnalyticDB
export PG_HOST=<your_analyticdb_host>
export PG_PORT=<your_analyticdb_port>
export PG_USER=<your_analyticdb_username>
export PG_PASSWORD=<your_analyticdb_password>
export PG_DATABASE=<your_analyticdb_database>
export PG_COLLECTION=<your_analyticdb_collection>
# Redis
export REDIS_HOST=<your_redis_host>
export REDIS_PORT=<your_redis_port>
export REDIS_PASSWORD=<your_redis_password>
export REDIS_INDEX_NAME=<your_redis_index_name>
export REDIS_D
```
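With the environment variables set and the server running locally (the repository defines a `start` script, so `poetry run start` should serve the API, by default on port 8000), the `/upsert` and `/query` endpoints can be smoke-tested from the command line. A hedged sketch, assuming the default port and the `BEARER_TOKEN` exported above:

```shell
# Smoke test against a locally running Retrieval Plugin
# (assumes the server is up on port 8000 and BEARER_TOKEN is exported).
BASE_URL="${BASE_URL:-http://localhost:8000}"

# Upsert a document; the plugin chunks and embeds it server-side.
curl -s -X POST "$BASE_URL/upsert" \
  -H "Authorization: Bearer $BEARER_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"documents": [{"text": "Our vacation policy allows 20 days per year."}]}'

# Query it back with natural language.
curl -s -X POST "$BASE_URL/query" \
  -H "Authorization: Bearer $BEARER_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"queries": [{"query": "vacation policy", "top_k": 1}]}'
```

The `/query` response should contain the upserted text chunk among its results; if it does not, check the server logs and the vector database environment variables first.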