# text2vec-service
BERT model to vector service.
**text2vec-service** builds an efficient text-to-vector (Text-To-Vector) service.
# Feature
BERT embedding service with a client/server (C/S) architecture.
# Install
```shell
pip install torch # conda install pytorch
pip install -U text2vec-service
```
or
```shell
pip install torch # conda install pytorch
cd text2vec-service
pip install -r requirements.txt
pip install --no-deps .
```
# Usage
#### 1. Start the BERT service
After installing the server, you should be able to use the `service-server-start` CLI as follows:
```bash
service-server-start -model_dir shibing624/text2vec-base-chinese
```
This will start a service with a single worker by default; set `-num_worker` to run more workers, in which case the service can handle up to `num_worker` **concurrent** requests.
Additional concurrent requests are queued in a load balancer.
<details>
<summary>Alternatively, one can start the BERT Service in a Docker Container (click to expand...)</summary>
```bash
docker build -t text2vec-service -f ./docker/Dockerfile .
NUM_WORKER=1
PATH_MODEL=/PATH_TO/_YOUR_MODEL/
docker run --runtime nvidia -dit -p 5555:5555 -p 5556:5556 -v $PATH_MODEL:/model -t text2vec-service $NUM_WORKER
```
</details>
#### 2. Use Client to Get Sentence Encodes
Now you can encode sentences simply as follows:
```python
from service.client import BertClient
bc = BertClient()
bc.encode(['如何更换花呗绑定银行卡', '花呗更改绑定银行卡'])
```
It will return an `ndarray` (or a `List[List[float]]` if you wish), in which each row is a fixed-length vector
representing a sentence. Having thousands of sentences? Just `encode`! *Don't even bother to batch*,
the server will take care of it.
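For instance, if plain Python lists are preferred over a NumPy array, this can be requested through the `output_fmt` argument documented in the Client API below. A minimal sketch (the sentences are just placeholders):

```python
from service.client import BertClient

# ask the client for List[List[float]] instead of an ndarray
bc = BertClient(output_fmt='list')
vectors = bc.encode(['如何更换花呗绑定银行卡', '花呗更改绑定银行卡'])
print(len(vectors), len(vectors[0]))  # number of sentences, embedding dimension
bc.close()
```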
#### 3. Use BERT Service Remotely
One may also start the service on one (GPU) machine and call it from another (CPU) machine as follows:
```python
# on another CPU machine
from service.client import BertClient
bc = BertClient(ip='xx.xx.xx.xx') # ip address of the GPU machine
bc.encode(['如何更换花呗绑定银行卡', '花呗更改绑定银行卡'])
```
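To verify that a remote connection is set up correctly, the client can print the server configuration on first connect and expose the server status; a minimal sketch using the `show_server_config` flag and the `server_status` property documented below:

```python
from service.client import BertClient

# print the remote server's configuration when the client first connects
bc = BertClient(ip='xx.xx.xx.xx', show_server_config=True)
print(bc.server_status)  # server status in JSON format
bc.close()
```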
<h2 align="center">Server and Client API</h2>
<p align="right"><a href="#text2vec-service"><sup>▴ Back to top</sup></a></p>
### Server API
```bash
service-server-start --help
service-server-terminate --help
service-server-benchmark --help
```
| Argument | Type | Default | Description |
|--------------------|------|-------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `model_dir` | str | *Required* | folder path of the pre-trained BERT model. |
| `max_seq_len` | int | `25` | maximum sequence length; longer sequences are trimmed on the right side. Set it to `NONE` to dynamically use the longest sequence in a (mini-)batch. |
| `cased_tokenization` | bool | False | whether the tokenizer should skip the default lowercasing and accent removal. Should be used e.g. for the multilingual cased pretrained BERT model. |
| `num_worker` | int | `1` | number of (GPU/CPU) workers running the BERT model, each in a separate process. |
| `max_batch_size` | int | `256` | maximum number of sequences handled by each worker; larger batches are partitioned into smaller ones. |
| `priority_batch_size` | int | `16` | batches smaller than this size are labeled as high priority and jump forward in the job queue for faster results |
| `port` | int | `5555` | port for pushing data from client to server |
| `port_out` | int | `5556`| port for publishing results from server to client |
| `http_port` | int | None | server port for receiving HTTP requests |
| `cors` | str | `*` | setting "Access-Control-Allow-Origin" for HTTP requests |
| `gpu_memory_fraction` | float | `0.5` | the fraction of overall GPU memory that each worker is allowed to allocate |
| `cpu` | bool | False | run on CPU instead of GPU |
| `xla` | bool | False | enable [XLA compiler](https://www.tensorflow.org/xla/jit) for graph optimization (*experimental!*) |
| `fp16` | bool | False | use float16 precision (experimental) |
| `device_map` | list | `[]` | specify the list of GPU device ids that will be used (id starts from 0)|
### Client API
| Argument | Type | Default | Description |
|----------------------|------|-----------|-------------------------------------------------------------------------------|
| `ip` | str | `localhost` | IP address of the server |
| `port` | int | `5555` | port for pushing data from client to server, *must be consistent with the server side config* |
| `port_out` | int | `5556`| port for publishing results from server to client, *must be consistent with the server side config* |
| `output_fmt` | str | `ndarray` | the output format of the sentence encodings, either a numpy array (`ndarray`) or a Python list of lists of floats (`list`) |
| `show_server_config` | bool | `False` | whether to show server configs when first connected |
| `check_version` | bool | `True` | whether to force client and server to have the same version |
| `identity` | str | `None` | a UUID that identifies the client, useful in multi-casting |
| `timeout` | int | `-1` | set the timeout (in milliseconds) for the receive operation on the client |
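As a quick illustration, several of the arguments above can be combined when the server uses non-default ports or a receive timeout is wanted. A sketch (the IP address is a placeholder, and the ports must match the server-side config):

```python
from service.client import BertClient

# ports must be consistent with the server-side config; timeout is in milliseconds
bc = BertClient(ip='xx.xx.xx.xx', port=5555, port_out=5556, timeout=10000)
print(bc.encode(['如何更换花呗绑定银行卡']).shape)  # (1, embedding dimension)
bc.close()
```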
A `BertClient` implements the following methods and properties:
| Method | Description |
|--------|------|
|`.encode()`|Encode a list of strings to a list of vectors|
|`.encode_async()`|Asynchronously encode batches from a generator|
|`.fetch()`|Fetch all encoded vectors from server and return them in a generator, use it with `.encode_async()` or `.encode(blocking=False)`. Sending order is NOT preserved.|
|`.fetch_all()`|Fetch all encoded vectors from server and return them in a list, use it with `.encode_async()` or `.encode(blocking=False)`. Sending order is preserved.|
|`.close()`|Gracefully close the connection between the client and the server|
|`.status`|Get the client status in JSON format|
|`.server_status`|Get the server status in JSON format|
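For example, the non-blocking pieces fit together as follows: send a few batches with `encode(blocking=False)` and collect the results later with `fetch_all()`. This is only a minimal sketch based on the method descriptions above:

```python
from service.client import BertClient

bc = BertClient()
# push a few batches without waiting for their results
for _ in range(3):
    bc.encode(['如何更换花呗绑定银行卡', '花呗更改绑定银行卡'], blocking=False)

# collect everything that was sent; fetch_all() preserves the sending order
vectors = bc.fetch_all()
print(len(vectors))
bc.close()
```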
<h2 align="center">:book: Tutorial</h2>
<p align="right"><a href="#text2vec-service"><sup>▴ Back to top</sup></a></p>
The full list of examples can be found in [`examples/`](examples). You can run each of them directly, e.g. `python examples/base-demo.py`.
### Serving a fine-tuned BERT model
Pretrained BERT models often show rather "okayish" performance on many tasks. However, to unleash the true power of
BERT, fine-tuning on the downstream task (or on domain-specific data) is necessary.
In this example, we serve a fine-tuned BERT model:
```bash
service-server-start -model_dir shibing624/bert-base-chinese
```
### Asynchronous encoding
> The complete example can be found in [examples/async_demo.py](examples/async_demo.py).
`BertClient.encode()` offers a nice synchronous way to get sentence encodings.
However, sometimes we want to do it in an asynchronous manner: feed all textual data to the server first,
and fetch the encoded results later. This can be easily done as follows:
```python
from service.client import BertClient

lst_str = ['如何更换花呗绑定银行卡', '花呗更改绑定银行卡']  # a batch of text lines

# an endless data stream, generating data at extremely high speed
def text_gen():
    while True:
        yield lst_str  # yield a batch of text lines

bc = BertClient()
# get encoded vectors as they become available
for j in bc.encode_async(text_gen(), max_num_batch=10):
    print('received %d x %d' % (j.shape[0], j.shape[1]))
```
### Broadcasting to multiple clients
> example: [examples/multicast_demo.py](examples/multicast_demo.py).
The encoded result is routed to a client according to its identity. If you have multiple clients with the
same identity, they all receive the results! You can use this *multicast* feature to do some cool things,
e.g. training multiple different models (some using `scikit-learn`, some using `pytorch`) in multiple
separate processes while only calling `BertServer` once. In the example below, `bc` and its two clones will
all receive the encoded vectors.
```python
from service.client import BertClient

# clone a client by reusing the identity
def client_clone(id, idx):
    bc = BertClient(identity=id)
    for j in bc.listen():
        print('clone-client-%d: received %d x %d' % (idx, j.shape[0], j.shape[1]))
```
See [examples/multicast_demo.py](examples/multicast_demo.py) for the complete script, which starts the two clones and then calls `encode()` on the original client.