# text2vec-service
BERT model to vector service.
**text2vec-service** builds an efficient text-to-vector (Text-To-Vector) service.
# Feature
BERT embedding service with a client/server (C/S) architecture.
# Install
```shell
pip install torch # conda install pytorch
pip install -U text2vec-service
```
or
```shell
pip install torch # conda install pytorch
cd text2vec-service
pip install -r requirements.txt
pip install --no-deps .
```
# Usage
#### 1. Start the BERT service
After installing the server, you should be able to use the `service-server-start` CLI as follows:
```bash
service-server-start -model_dir shibing624/text2vec-base-chinese
```
This will start a service with a single worker by default; set `-num_worker` to run more workers, in which case the service can handle up to `num_worker` **concurrent** requests.
Additional concurrent requests are queued in a load balancer.
<details>
<summary>Alternatively, one can start the BERT Service in a Docker Container (click to expand...)</summary>
```bash
docker build -t text2vec-service -f ./docker/Dockerfile .
NUM_WORKER=1
PATH_MODEL=/PATH_TO/_YOUR_MODEL/
docker run --runtime nvidia -dit -p 5555:5555 -p 5556:5556 -v $PATH_MODEL:/model -t text2vec-service $NUM_WORKER
```
</details>
#### 2. Use Client to Get Sentence Encodes
Now you can encode sentences simply as follows:
```python
from service.client import BertClient
bc = BertClient()
bc.encode(['如何更换花呗绑定银行卡', '花呗更改绑定银行卡'])
```
It will return an `ndarray` (or a `List[List[float]]` if you wish), in which each row is a fixed-length vector
representing a sentence. Having thousands of sentences? Just `encode`! *Don't even bother to batch*,
the server will take care of it.
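For instance, if plain Python lists are preferred over a NumPy array, this can be requested through the `output_fmt` argument documented in the Client API below. A minimal sketch (the sentences are just placeholders):

```python
from service.client import BertClient

# ask the client for List[List[float]] instead of an ndarray
bc = BertClient(output_fmt='list')
vectors = bc.encode(['如何更换花呗绑定银行卡', '花呗更改绑定银行卡'])
print(len(vectors), len(vectors[0]))  # number of sentences, embedding dimension
bc.close()
```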
#### 3. Use BERT Service Remotely
One may also start the service on one (GPU) machine and call it from another (CPU) machine as follows:
```python
# on another CPU machine
from service.client import BertClient
bc = BertClient(ip='xx.xx.xx.xx') # ip address of the GPU machine
bc.encode(['如何更换花呗绑定银行卡', '花呗更改绑定银行卡'])
```
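To verify that a remote connection is set up correctly, the client can print the server configuration on first connect and expose the server status; a minimal sketch using the `show_server_config` flag and the `server_status` property documented below:

```python
from service.client import BertClient

# print the remote server's configuration when the client first connects
bc = BertClient(ip='xx.xx.xx.xx', show_server_config=True)
print(bc.server_status)  # server status in JSON format
bc.close()
```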
<h2 align="center">Server and Client API</h2>
<p align="right"><a href="#text2vec-service"><sup>▴ Back to top</sup></a></p>
### Server API
```bash
service-server-start --help
service-server-terminate --help
service-server-benchmark --help
```
| Argument | Type | Default | Description |
|--------------------|------|-------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `model_dir` | str | *Required* | folder path of the pre-trained BERT model. |
| `max_seq_len` | int | `25` | maximum sequence length; longer sequences are trimmed on the right side. Set it to `NONE` to dynamically use the longest sequence in a (mini-)batch. |
| `cased_tokenization` | bool | False | whether the tokenizer should skip the default lowercasing and accent removal. Should be used e.g. for the multilingual cased pretrained BERT model. |
| `num_worker` | int | `1` | number of (GPU/CPU) workers running the BERT model, each in a separate process. |
| `max_batch_size` | int | `256` | maximum number of sequences handled by each worker; larger batches are partitioned into smaller ones. |
| `priority_batch_size` | int | `16` | batches smaller than this size are labeled as high priority and jump forward in the job queue for faster results |
| `port` | int | `5555` | port for pushing data from client to server |
| `port_out` | int | `5556`| port for publishing results from server to client |
| `http_port` | int | None | server port for receiving HTTP requests |
| `cors` | str | `*` | setting "Access-Control-Allow-Origin" for HTTP requests |
| `gpu_memory_fraction` | float | `0.5` | the fraction of overall GPU memory that each worker is allowed to allocate |
| `cpu` | bool | False | run on CPU instead of GPU |
| `xla` | bool | False | enable [XLA compiler](https://www.tensorflow.org/xla/jit) for graph optimization (*experimental!*) |
| `fp16` | bool | False | use float16 precision (experimental) |
| `device_map` | list | `[]` | specify the list of GPU device ids that will be used (id starts from 0)|
### Client API
| Argument | Type | Default | Description |
|----------------------|------|-----------|-------------------------------------------------------------------------------|
| `ip` | str | `localhost` | IP address of the server |
| `port` | int | `5555` | port for pushing data from client to server, *must be consistent with the server side config* |
| `port_out` | int | `5556`| port for publishing results from server to client, *must be consistent with the server side config* |
| `output_fmt` | str | `ndarray` | the output format of the sentence encodings, either a numpy array (`ndarray`) or a Python list of lists of floats (`list`) |
| `show_server_config` | bool | `False` | whether to show server configs when first connected |
| `check_version` | bool | `True` | whether to force client and server to have the same version |
| `identity` | str | `None` | a UUID that identifies the client, useful in multi-casting |
| `timeout` | int | `-1` | set the timeout (in milliseconds) for the receive operation on the client |
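As a quick illustration, several of the arguments above can be combined when the server uses non-default ports or a receive timeout is wanted. A sketch (the IP address is a placeholder, and the ports must match the server-side config):

```python
from service.client import BertClient

# ports must be consistent with the server-side config; timeout is in milliseconds
bc = BertClient(ip='xx.xx.xx.xx', port=5555, port_out=5556, timeout=10000)
print(bc.encode(['如何更换花呗绑定银行卡']).shape)  # (1, embedding dimension)
bc.close()
```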
A `BertClient` implements the following methods and properties:
| Method | Description |
|--------|------|
|`.encode()`|Encode a list of strings to a list of vectors|
|`.encode_async()`|Asynchronously encode batches from a generator|
|`.fetch()`|Fetch all encoded vectors from server and return them in a generator, use it with `.encode_async()` or `.encode(blocking=False)`. Sending order is NOT preserved.|
|`.fetch_all()`|Fetch all encoded vectors from server and return them in a list, use it with `.encode_async()` or `.encode(blocking=False)`. Sending order is preserved.|
|`.close()`|Gracefully close the connection between the client and the server|
|`.status`|Get the client status in JSON format|
|`.server_status`|Get the server status in JSON format|
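For example, the non-blocking pieces fit together as follows: send a few batches with `encode(blocking=False)` and collect the results later with `fetch_all()`. This is only a minimal sketch based on the method descriptions above:

```python
from service.client import BertClient

bc = BertClient()
# push a few batches without waiting for their results
for _ in range(3):
    bc.encode(['如何更换花呗绑定银行卡', '花呗更改绑定银行卡'], blocking=False)

# collect everything that was sent; fetch_all() preserves the sending order
vectors = bc.fetch_all()
print(len(vectors))
bc.close()
```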
<h2 align="center">:book: Tutorial</h2>
<p align="right"><a href="#text2vec-service"><sup>▴ Back to top</sup></a></p>
The full list of examples can be found in [`examples/`](examples). You can run each of them directly, e.g. `python examples/base-demo.py`.
### Serving a fine-tuned BERT model
Pretrained BERT models often show rather "okayish" performance on many tasks. However, to unleash the true power of
BERT, fine-tuning on the downstream task (or on domain-specific data) is necessary.
In this example, we serve a fine-tuned BERT model:
```bash
service-server-start -model_dir shibing624/bert-base-chinese
```
### Asynchronous encoding
> The complete example can be found in [examples/async_demo.py](examples/async_demo.py).
`BertClient.encode()` offers a nice synchronous way to get sentence encodings.
However, sometimes we want to do it in an asynchronous manner: feed all textual data to the server first,
and fetch the encoded results later. This can be easily done as follows:
```python
from service.client import BertClient

lst_str = ['如何更换花呗绑定银行卡', '花呗更改绑定银行卡']  # a batch of text lines

# an endless data stream, generating data at extremely high speed
def text_gen():
    while True:
        yield lst_str  # yield a batch of text lines

bc = BertClient()
# get encoded vectors as they become available
for j in bc.encode_async(text_gen(), max_num_batch=10):
    print('received %d x %d' % (j.shape[0], j.shape[1]))
```
### Broadcasting to multiple clients
> example: [examples/multicast_demo.py](examples/multicast_demo.py).
The encoded result is routed to a client according to its identity. If you have multiple clients with the
same identity, they all receive the results! You can use this *multicast* feature to do some cool things,
e.g. training multiple different models (some using `scikit-learn`, some using `pytorch`) in multiple
separate processes while only calling `BertServer` once. In the example below, `bc` and its two clones will
all receive the encoded vectors.
```python
from service.client import BertClient

# clone a client by reusing the identity
def client_clone(id, idx):
    bc = BertClient(identity=id)
    for j in bc.listen():
        print('clone-client-%d: received %d x %d' % (idx, j.shape[0], j.shape[1]))
```
See [examples/multicast_demo.py](examples/multicast_demo.py) for the complete script, which starts the two clones and then calls `encode()` on the original client.