# Text-Classification-using-Paddle
![example](imgs/example.gif)
## Language
* [English](/README.md)
* [中文](/README-zh.md)
## Description
This project uses `Paddle` to do `text classification`.
The text can be Chinese or English, but each model is `monolingual`: you must know the language of the input text and choose the matching model for prediction.
The monolingual models used:
* for Chinese text: `hfl/roberta-wwm-ext-large`
* for English text: `ernie-2.0-large-en`
Of course, you can also train with a `multilingual` model, though I suspect its accuracy will not beat the monolingual ones.
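As a sketch of how these two checkpoints might be loaded (the helper names `checkpoint_for` and `load_model` are illustrative, not part of this repo; the `*ForSequenceClassification.from_pretrained` calls assume PaddleNLP's 2.x transformers API):

```python
# Pick the right monolingual checkpoint for the input language.
# The checkpoint names come from the list above.

MODEL_BY_LANG = {
    "zh": "hfl/roberta-wwm-ext-large",  # Chinese
    "en": "ernie-2.0-large-en",         # English
}

def checkpoint_for(lang: str) -> str:
    """Return the pretrained checkpoint name for a language code."""
    try:
        return MODEL_BY_LANG[lang]
    except KeyError:
        raise ValueError(f"unsupported language: {lang!r}") from None

def load_model(lang: str, num_classes: int):
    """Load the sequence-classification model for `lang` (zh or en)."""
    # Imported lazily so the name mapping works without PaddleNLP installed.
    from paddlenlp.transformers import (
        RobertaForSequenceClassification,
        ErnieForSequenceClassification,
    )
    cls = (RobertaForSequenceClassification if lang == "zh"
           else ErnieForSequenceClassification)
    return cls.from_pretrained(checkpoint_for(lang), num_classes=num_classes)
```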
## Environment
* The model was trained on `1 * NVIDIA Tesla V100 32G` (recommended). Ensure that CUDA is installed.
* Of course, you can also train the model on CPU.
## Requirements
* Python 3.9
* paddlepaddle 2.1.3
    * If you need a `CPU only` version, install that version.
    * If you need a `GPU` version, install the build that matches your GPU and CUDA version.
      For example: `paddlepaddle-gpu==2.1.3.post101`
* paddlenlp 2.1.0
If you want to deploy models, you also need to install:
* fastapi 0.79
* uvicorn 0.18.2
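A minimal install sketch for the versions above (the GPU wheel tag depends on your CUDA version; check the PaddlePaddle install page for the right one):

```shell
# CPU-only training
pip install paddlepaddle==2.1.3 paddlenlp==2.1.0

# or GPU training (the post-tag must match your CUDA, e.g. post101 for CUDA 10.1)
pip install paddlepaddle-gpu==2.1.3.post101 paddlenlp==2.1.0

# extra packages for deployment
pip install fastapi==0.79.0 uvicorn==0.18.2
```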
## Files
* There are two folders:
    * `1-train`: only for training a model that can classify text
    * `2-deploy`: only for deploying the trained model on a VPS and setting up an API
* `Jupyter Notebook` is used so you can get started easily. Of course, you can convert the `.ipynb` files to `.py`.
* The number at the start of each file name is the step order.
    * For example, first run the file starting with `1-xxx.ipynb`, then the one starting with `2-xxx.ipynb`.
* The files in the `1-train/checkpoint` and `2-deploy/models` folders are placeholders! You will get the real files after running the notebooks.
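The notebook-to-script conversion mentioned above can be done with Jupyter's built-in converter, for example:

```shell
# Produces 1.1-train_Chinese.py next to the notebook
jupyter nbconvert --to script 1.1-train_Chinese.ipynb
```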
## Data
* There are only a few example texts in the data folder.
* You need to convert your data into a `csv` file whose columns are separated by `\t`.
* Example data:
| text_a | label |
| ----------------------------------------------------- | ----- |
| Do you ever get a little bit tired of life | A |
| Like you're not really happy but you don't wanna die | B |
| ... | ... |
| Like you're hangin' by a thread but you gotta survive | B |
| 'Cause you gotta survive | C |
* Make sure there is no `\t` inside the texts or labels. `Important!!!`
* Split the data into train (80%) and test (20%); of course, you can set the ratio yourself.
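As a minimal sketch of the data preparation above (helper names are illustrative), using Python's standard `csv` module to read and write the tab-separated file, reject fields containing `\t`, and do the 80/20 split:

```python
import csv
import random

def load_rows(path):
    """Read a tab-separated file with columns text_a and label."""
    with open(path, newline="", encoding="utf-8") as f:
        return [(r["text_a"], r["label"])
                for r in csv.DictReader(f, delimiter="\t")]

def write_rows(path, rows):
    """Write (text, label) rows, refusing any field that contains a tab."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter="\t")
        writer.writerow(["text_a", "label"])
        for text, label in rows:
            if "\t" in text or "\t" in label:
                raise ValueError("tab character inside a field")
            writer.writerow([text, label])

def split_rows(rows, train_ratio=0.8, seed=42):
    """Shuffle deterministically, then split into (train, test)."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * train_ratio)
    return rows[:cut], rows[cut:]
```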
## To Run
You may have to change a few things first, e.g., file paths.
* Step 1: run the training notebook. Afterwards, you will find the model in the `checkpoint` folder.
    * If your text is Chinese, run `1.1-train_Chinese.ipynb`.
    * If your text is English, run `1.2-train_English.ipynb`.
* Step 2 (optional): run `2-evaluate.ipynb` to get the classification reports.
* Step 3 (optional): run `3-predict.ipynb` to get predictions for a whole file.
* Step 4 (optional): run `4-predict_only_one.ipynb` to predict a single text.
* Step 5: run `5-to_static.ipynb` to export the inference model that can be deployed.
* Step 6 (optional): run `6-infer.ipynb` as an inference test.
## Deploy
After exporting the inference model, you can deploy it with FastAPI or any other API framework.
Copy your files into `2-deploy/models/English` or `2-deploy/models/Chinese`, so that each model folder contains:
* label_map.json
* model.pdiparams
* model.pdiparams.info
* model.pdmodel
* tokenizer_config.json
* vocab.txt
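As a sketch of one thing the deployed API has to do with `label_map.json` from the list above (the helper names and the assumed json layout, index → label string, are illustrative; the actual contents depend on your training run):

```python
import json

def load_label_map(path):
    """Load label_map.json, normalising string keys to int class indices."""
    with open(path, encoding="utf-8") as f:
        raw = json.load(f)
    return {int(k): v for k, v in raw.items()}

def logits_to_label(logits, label_map):
    """Map the highest-scoring class index back to its label string."""
    best = max(range(len(logits)), key=lambda i: logits[i])
    return label_map[best]
```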
Run: `python main.py`
Visit `localhost:1234/docs` to read the API docs.
## Others
Docs for PaddlePaddle, PaddleNLP, and FastAPI:
* [PaddlePaddle](https://www.paddlepaddle.org.cn/en)
* [PaddleNLP](https://paddlenlp.readthedocs.io/en/latest/)
* [FastAPI](https://fastapi.tiangolo.com/)