# Text-Classification-using-Paddle
![example](imgs/example.gif)
## Language
* [English](/README.md)
* [中文](/README-zh.md)
## Description
This project uses `Paddle` to do `text classification`.
The text can be Chinese or English, but each model is `monolingual`: you must know the language of the input text and choose the matching model for prediction.
The monolingual models used:
* for Chinese text: `hfl/roberta-wwm-ext-large`
* for English text: `ernie-2.0-large-en`
Of course, you can also train with a `multilingual` model, though I suspect its accuracy will not beat the monolingual ones.
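As a sketch of how these two checkpoints might be loaded (the helper names `checkpoint_for` and `load_model` are illustrative, not part of this repo; the `*ForSequenceClassification.from_pretrained` calls assume PaddleNLP's 2.x transformers API):

```python
# Pick the right monolingual checkpoint for the input language.
# The checkpoint names come from the list above.

MODEL_BY_LANG = {
    "zh": "hfl/roberta-wwm-ext-large",  # Chinese
    "en": "ernie-2.0-large-en",         # English
}

def checkpoint_for(lang: str) -> str:
    """Return the pretrained checkpoint name for a language code."""
    try:
        return MODEL_BY_LANG[lang]
    except KeyError:
        raise ValueError(f"unsupported language: {lang!r}") from None

def load_model(lang: str, num_classes: int):
    """Load the sequence-classification model for `lang` (zh or en)."""
    # Imported lazily so the name mapping works without PaddleNLP installed.
    from paddlenlp.transformers import (
        RobertaForSequenceClassification,
        ErnieForSequenceClassification,
    )
    cls = (RobertaForSequenceClassification if lang == "zh"
           else ErnieForSequenceClassification)
    return cls.from_pretrained(checkpoint_for(lang), num_classes=num_classes)
```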
## Environment
* The model was trained on `1 * NVIDIA Tesla V100 32G` (recommended). Ensure that CUDA is installed.
* Of course, you can also train the model on CPU.
## Requirements
* Python 3.9
* paddlepaddle 2.1.3
    * If you need a `CPU only` version, install that version.
    * If you need a `GPU` version, install the build that matches your GPU and CUDA version.
      For example: `paddlepaddle-gpu==2.1.3.post101`
* paddlenlp 2.1.0
If you want to deploy models, you also need to install:
* fastapi 0.79
* uvicorn 0.18.2
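A minimal install sketch for the versions above (the GPU wheel tag depends on your CUDA version; check the PaddlePaddle install page for the right one):

```shell
# CPU-only training
pip install paddlepaddle==2.1.3 paddlenlp==2.1.0

# or GPU training (the post-tag must match your CUDA, e.g. post101 for CUDA 10.1)
pip install paddlepaddle-gpu==2.1.3.post101 paddlenlp==2.1.0

# extra packages for deployment
pip install fastapi==0.79.0 uvicorn==0.18.2
```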
## Files
* There are two folders:
    * `1-train`: only for training a model that can classify text
    * `2-deploy`: only for deploying the trained model on a VPS and setting up an API
* `Jupyter Notebook` is used so you can get started easily. Of course, you can convert the `.ipynb` files to `.py`.
* The number at the start of each file name is the step order.
    * For example, first run the file starting with `1-xxx.ipynb`, then the one starting with `2-xxx.ipynb`.
* The files in the `1-train/checkpoint` and `2-deploy/models` folders are placeholders! You will get the real files after running the notebooks.
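The notebook-to-script conversion mentioned above can be done with Jupyter's built-in converter, for example:

```shell
# Produces 1.1-train_Chinese.py next to the notebook
jupyter nbconvert --to script 1.1-train_Chinese.ipynb
```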
## Data
* There are only a few example texts in the data folder.
* You need to convert your data into a `csv` file whose columns are separated by `\t`.
* Example data:
| text_a | label |
| ----------------------------------------------------- | ----- |
| Do you ever get a little bit tired of life | A |
| Like you're not really happy but you don't wanna die | B |
| ... | ... |
| Like you're hangin' by a thread but you gotta survive | B |
| 'Cause you gotta survive | C |
* Make sure there is no `\t` inside the texts or labels. `Important!!!`
* Split the data into train (80%) and test (20%); of course, you can set the ratio yourself.
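As a minimal sketch of the data preparation above (helper names are illustrative), using Python's standard `csv` module to read and write the tab-separated file, reject fields containing `\t`, and do the 80/20 split:

```python
import csv
import random

def load_rows(path):
    """Read a tab-separated file with columns text_a and label."""
    with open(path, newline="", encoding="utf-8") as f:
        return [(r["text_a"], r["label"])
                for r in csv.DictReader(f, delimiter="\t")]

def write_rows(path, rows):
    """Write (text, label) rows, refusing any field that contains a tab."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter="\t")
        writer.writerow(["text_a", "label"])
        for text, label in rows:
            if "\t" in text or "\t" in label:
                raise ValueError("tab character inside a field")
            writer.writerow([text, label])

def split_rows(rows, train_ratio=0.8, seed=42):
    """Shuffle deterministically, then split into (train, test)."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * train_ratio)
    return rows[:cut], rows[cut:]
```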
## To Run
You may have to change a few things first, e.g., file paths.
* Step 1: run the training notebook. Afterwards, you will find the model in the `checkpoint` folder.
    * If your text is Chinese, run `1.1-train_Chinese.ipynb`.
    * If your text is English, run `1.2-train_English.ipynb`.
* Step 2 (optional): run `2-evaluate.ipynb` to get the classification reports.
* Step 3 (optional): run `3-predict.ipynb` to get predictions for a whole file.
* Step 4 (optional): run `4-predict_only_one.ipynb` to predict a single text.
* Step 5: run `5-to_static.ipynb` to export the inference model that can be deployed.
* Step 6 (optional): run `6-infer.ipynb` as an inference test.
## Deploy
After exporting the inference model, you can deploy it with FastAPI or any other API framework.
Copy your files into `2-deploy/models/English` or `2-deploy/models/Chinese`, so that each model folder contains:
* label_map.json
* model.pdiparams
* model.pdiparams.info
* model.pdmodel
* tokenizer_config.json
* vocab.txt
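As a sketch of one thing the deployed API has to do with `label_map.json` from the list above (the helper names and the assumed json layout, index → label string, are illustrative; the actual contents depend on your training run):

```python
import json

def load_label_map(path):
    """Load label_map.json, normalising string keys to int class indices."""
    with open(path, encoding="utf-8") as f:
        raw = json.load(f)
    return {int(k): v for k, v in raw.items()}

def logits_to_label(logits, label_map):
    """Map the highest-scoring class index back to its label string."""
    best = max(range(len(logits)), key=lambda i: logits[i])
    return label_map[best]
```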
Run: `python main.py`
Visit `localhost:1234/docs` to read the API docs.
## Others
Docs for PaddlePaddle, PaddleNLP, and FastAPI:
* [PaddlePaddle](https://www.paddlepaddle.org.cn/en)
* [PaddleNLP](https://paddlenlp.readthedocs.io/en/latest/)
* [FastAPI](https://fastapi.tiangolo.com/)