Source code and usage instructions for replacing or masking keywords in videos with PaddleOCR text recognition (.zip) - CSDN文库

1502 files in total: 356 py, 242 txt, 216 md


Graduation project

Course project

python

Deep learning

Text recognition

110 views · uploaded 2023-09-18 09:19:23 · 125.48MB ZIP

【Resource description】Source code and usage instructions for replacing or masking keywords in videos with PaddleOCR text recognition (.zip)

```commandline
python3 tools/infer_keyword/infer_end_to_end.py \
    --keyword="aws" \
    --image_dir=/home/jackdance/Desktop/aws_video/some_frame \
    --det_model_dir="./pretrained_model/en_PP-OCRv3_det_infer/" \
    --rec_model_dir="./pretrained_model/en_PP-OCRv3_rec_infer/" \
    --rec_char_dict_path="ppocr/utils/en_dict.txt" \
    --use_mp=True \
    --total_process_num=8
```

Parameter descriptions:

- `keyword`: the keyword to replace or mask. Only English keywords work with the models above; for a Chinese keyword, download the Chinese text detection and recognition models and change the character set path used for text recognition.
- `image_dir`: input image folder
- `video`: input video
- `det_model_dir`: path to the text detection model
- `rec_model_dir`: path to the text recognition model
- `rec_char_dict_path`: path to the text recognition character set. `ppocr/utils/en_dict.txt` covers English only; character sets for other languages can be found in `ppocr/utils`.
- `use_mp`: whether to enable multiprocessing
- `total_process_num`: number of processes to use when multiprocessing is enabled

3.2 Perform end-to-end inference of video

The input is a single video and the output is a single processed video.

PS: [input video sample](https://pan.baidu.com/s/16AxRp0IVYF7AJ67L2GoZBA) Extraction code: f93p

```commandline
python3 tools/infer_keyword/infer_end_to_end.py \
    --keyword="aws" \
    --video=/home/jackdance/Desktop/aws_video/aws_first_2mins.mp4 \
    --det_model_dir="./pretrained_model/en_PP-OCRv3_det_infer/" \
    --rec_model_dir="./pretrained_model/en_PP-OCRv3_rec_infer/" \
    --rec_char_dict_path="ppocr/utils/en_dict.txt" \
    --use_mp=True \
    --total_process_num=8
```

【Notes】

1. All project code in this resource was tested and ran successfully before upload, so it can be downloaded and used with confidence.
2. The project is aimed at students, teachers, and company employees in computer-related fields (computer science, artificial intelligence, communication engineering, automation, electronic information, and so on), and also suits beginners who want to level up. It can serve as a graduation project, course project, assignment, or early-stage project demo.
3. With a reasonable foundation, you can modify this code to implement other features, or use it directly for a graduation project, course project, or assignment. Downloads, discussion, and mutual learning are welcome.
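The keyword step behind the commands above can be pictured roughly as follows. This is a minimal sketch, not the project's actual code: it assumes each OCR result is a `(four-point box, text, score)` tuple, that matching is a case-insensitive substring test, and that a frame is a plain row-major list of pixel rows. The names `find_keyword_boxes` and `mask_frame` and the `min_score` threshold are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]
# Hypothetical OCR result shape: (four corner points, recognized text, confidence)
OcrResult = Tuple[List[Point], str, float]
Rect = Tuple[int, int, int, int]


def find_keyword_boxes(results: List[OcrResult], keyword: str,
                       min_score: float = 0.5) -> List[Rect]:
    """Return axis-aligned (x1, y1, x2, y2) rectangles for texts containing keyword."""
    needle = keyword.lower()  # assumption: case-insensitive matching
    rects = []
    for box, text, score in results:
        if score < min_score or needle not in text.lower():
            continue
        xs = [p[0] for p in box]
        ys = [p[1] for p in box]
        # Bounding rectangle of the (possibly rotated) four-point box
        rects.append((int(min(xs)), int(min(ys)), int(max(xs)), int(max(ys))))
    return rects


def mask_frame(frame: List[List[int]], rects: List[Rect], fill: int = 0) -> List[List[int]]:
    """Overwrite every pixel inside each rectangle, in place, clipping to the frame."""
    height = len(frame)
    width = len(frame[0]) if height else 0
    for x1, y1, x2, y2 in rects:
        for y in range(max(0, y1), min(height, y2 + 1)):
            for x in range(max(0, x1), min(width, x2 + 1)):
                frame[y][x] = fill
    return frame
```

Note that this sketch masks the whole recognized text line that contains the keyword, not just the keyword's own characters; narrowing the mask to the keyword would need per-character positions, which are not assumed here.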


Source code and usage instructions for replacing or masking keywords in videos with PaddleOCR text recognition (.zip) (1502 files)

gradlew.bat 2KB

demo_bare_metal.c 2KB

ocr_db_crnn.cc 23KB

db_post_process.cc 11KB

custom_relu_op.cc 4KB

crnn_process.cc 4KB

cls_process.cc 1KB

setup.cfg 97B

arm-none-eabi-gcc.cmake 3KB

auto-log.cmake 392B

clipper.cpp 135KB

ocr_clipper.cpp 135KB

postprocess_op.cpp 14KB

general_detection_op.cpp 13KB

ocr_ppredictor.cpp 12KB

paddlestructure.cpp 10KB

ocr_db_post_process.cpp 10KB

utility.cpp 8KB

paddleocr.cpp 8KB

ocr_rec.cpp 8KB

ocr_det.cpp 7KB

structure_table.cpp 7KB

main.cpp 6KB

ocr_cls.cpp 5KB

preprocess_op.cpp 5KB

ocr_crnn_process.cpp 5KB

native.cpp 4KB

ppredictor.cpp 3KB

args.cpp 3KB

preprocess.cpp 3KB

ocr_cls_process.cpp 1KB

predictor_input.cpp 750B

predictor_output.cpp 617B

custom_relu_op.cu 3KB

Dockerfile 2KB

Dockerfile 245B

kie.gif 5.65MB

steps_en.gif 4.79MB

ppstructure.GIF 2.49MB

table.gif 1.86MB

multi-point.gif 818KB

paddlejs_demo.gif 554KB

.gitignore 90B

.gitignore 55B

.gitignore 7B

.gitkeep 0B

build.gradle 3KB

build.gradle 558B

settings.gradle 15B

gradlew 5KB

clipper.h 15KB

native.h 5KB

ocr_det.h 3KB

ocr_rec.h 3KB

postprocess_op.h 3KB

ocr_ppredictor.h 3KB

structure_table.h 3KB

utility.h 3KB

ocr_cls.h 3KB

paddlestructure.h 2KB

preprocess_op.h 2KB

paddleocr.h 2KB

db_post_process.h 2KB

args.h 2KB

tvm_runtime.h 2KB

ppredictor.h 1KB

crnn_process.h 1KB

common.h 1KB

crt_config.h 1002B

predictor_output.h 926B

cls_process.h 905B

ocr_cls_process.h 798B

predictor_input.h 589B

ocr_crnn_process.h 527B

ocr_db_post_process.h 403B

preprocess.h 371B

ocr_clipper.hpp 15KB

index.html 369B

app.icns 8B

MANIFEST.in 294B

gradle-wrapper.jar 53KB

MainActivity.java 20KB

Predictor.java 9KB

SettingsActivity.java 9KB

Utils.java 5KB

AppCompatPreferenceActivity.java 4KB

OCRPredictorNative.java 3KB

OcrResultModel.java 2KB

ExampleInstrumentedTest.java 740B

ExampleUnitTest.java 391B

TechnologyRoadmap.jpeg 4.06MB

1bbe854b8817dedb8585e0732089fd1f752d2cec.jpeg 181KB

2769.jpeg 175KB

architecture.jpeg 122KB

ArT.jpg 3.12MB

zh_val_42.jpg 1.78MB

zh_val_42_re.jpg 1.6MB

zh_val_42_re.jpg 1.57MB

zh_val_42_ser.jpg 1.55MB


English | [简体中文](README_ch.md)

# Layout analysis

- [1. Introduction](#1-Introduction)
- [2. Quick start](#2-Quick-start)
- [3. Install](#3-Install)
  - [3.1 Install PaddlePaddle](#31-Install-paddlepaddle)
  - [3.2 Install PaddleDetection](#32-Install-paddledetection)
- [4. Data preparation](#4-Data-preparation)
  - [4.1 English data set](#41-English-data-set)
  - [4.2 More datasets](#42-More-datasets)
- [5. Start training](#5-Start-training)
  - [5.1 Train](#51-Train)
  - [5.2 FGD Distillation training](#52-Fgd-distillation-training)
- [6. Model evaluation and prediction](#6-Model-evaluation-and-prediction)
  - [6.1 Indicator evaluation](#61-Indicator-evaluation)
  - [6.2 Test layout analysis results](#62-Test-layout-analysis-results)
- [7. Model export and inference](#7-Model-export-and-inference)
  - [7.1 Model export](#71-Model-export)
  - [7.2 Model inference](#72-Model-inference)

## 1. Introduction

Layout analysis divides a document image into regions and locates its key areas, such as text, titles, tables, and pictures. The layout analysis algorithm is based on the lightweight PP-PicoDet model from [PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection).

<div align="center">
    <img src="../docs/layout/layout.png" width="800">
</div>

## 2. Quick start

PP-Structure currently provides layout analysis models for Chinese, English, and table documents. For the model links, see [models_list](../docs/models_list_en.md). A whl package is also provided for quick use; see [quickstart](../docs/quickstart_en.md) for details.

## 3. Install

### 3.1. Install PaddlePaddle

- **(1) Install PaddlePaddle**

```bash
python3 -m pip install --upgrade pip

# GPU Install
python3 -m pip install "paddlepaddle-gpu>=2.3" -i https://mirror.baidu.com/pypi/simple

# CPU Install
python3 -m pip install "paddlepaddle>=2.3" -i https://mirror.baidu.com/pypi/simple
```

For more requirements, please refer to the instructions in the [Install file](https://www.paddlepaddle.org.cn/install/quick).

### 3.2. Install PaddleDetection

- **(1) Download the PaddleDetection source code**

```bash
git clone https://github.com/PaddlePaddle/PaddleDetection.git
```

- **(2) Install third-party libraries**

```bash
cd PaddleDetection
python3 -m pip install -r requirements.txt
```

## 4. Data preparation

If you want to try the prediction process directly, you can skip data preparation and download the pre-trained model.

### 4.1. English data set

Download the document analysis dataset [PubLayNet](https://developer.ibm.com/exchanges/data/all/publaynet/) (96 GB), which contains 5 classes: `{0: "Text", 1: "Title", 2: "List", 3:"Table", 4:"Figure"}`

```
# Download data
wget https://dax-cdn.cdn.appdomain.cloud/dax-publaynet/1.0.0/publaynet.tar.gz
# Decompress data
tar -xvf publaynet.tar.gz
```

**Directory structure** after decompression:

```
|-publaynet
  |- test
     |- PMC1277013_00004.jpg
     |- PMC1291385_00002.jpg
     | ...
  |- train.json
  |- train
     |- PMC1291385_00002.jpg
     |- PMC1277013_00004.jpg
     | ...
  |- val.json
  |- val
     |- PMC538274_00004.jpg
     |- PMC539300_00004.jpg
     | ...
```

**Data distribution:**

| File or Folder | Description | num |
| :------------- | :------------- | ------- |
| `train/` | Training set images | 335,703 |
| `val/` | Validation set images | 11,245 |
| `test/` | Test set images | 11,405 |
| `train.json` | Training set annotation file | - |
| `val.json` | Validation set annotation file | - |

**Data Annotation**

The JSON file contains the annotations of all images, stored as nested dictionaries. It contains the following keys:

- `info`: information about the annotation file.
- `licenses`: license information for the annotation file.
- `images`: the list of image records in the annotation file; each element describes one image. One such record looks like this:

  ```
  {
      'file_name': 'PMC4055390_00006.jpg',    # file_name
      'height': 601,                          # image height
      'width': 792,                           # image width
      'id': 341427                            # image id
  }
  ```

- `annotations`: the list of object annotations in the annotation file; each element annotates one target object. One such annotation looks like this:

  ```
  {
      'segmentation':                          # Segmentation annotation of the object
      'area': 60518.099043117836,              # Area of the object
      'iscrowd': 0,                            # iscrowd
      'image_id': 341427,                      # image id
      'bbox': [50.58, 490.86, 240.15, 252.16], # bbox [x1,y1,w,h]
      'category_id': 1,                        # category_id
      'id': 3322348                            # annotation id
  }
  ```

### 4.2. More datasets

Download links are provided below for CDLA (Chinese layout analysis), TableBank (table layout analysis), and other datasets. Once a dataset is converted to the JSON annotation format described above, training can proceed in the same way.

| dataset | Description |
| ------------------------------------------------------------ | ------------------------------------------------------------ |
| [cTDaR2019_cTDaR](https://cndplab-founder.github.io/cTDaR2019/) | For table detection (TRACKA) and table recognition (TRACKB). Image types include historical data sets (beginning with cTDaR_t0, such as CTDAR_T00872.jpg) and modern data sets (beginning with cTDaR_t1, such as CTDAR_T10482.jpg). |
| [IIIT-AR-13K](http://cvit.iiit.ac.in/usodi/iiitar13k.php) | Data set built by manually annotating figures or pages from publicly available annual reports, containing 5 categories: table, figure, natural image, logo, and signature. |
| [TableBank](https://github.com/doc-analysis/TableBank) | Large data set for table detection and recognition, covering Word and LaTeX document formats |
| [CDLA](https://github.com/buptlihang/CDLA) | Chinese document layout analysis data set for Chinese literature (paper) scenarios, containing 10 categories: Text, Title, Figure, Figure caption, Table, Table caption, Header, Footer, Reference, Equation |
| [DocBank](https://github.com/doc-analysis/DocBank) | Large-scale data set (500K document pages) built with weakly supervised methods for document layout analysis, containing 12 categories: Author, Caption, Date, Equation, Figure, Footer, List, Paragraph, Reference, Section, Table, Title |

## 5. Start training

Training, evaluation, and prediction scripts are provided, and this section uses the PubLayNet pre-trained model as an example.

If you do not want to train and would rather go straight to model evaluation, prediction, dynamic-to-static export, and inference, you can download the provided pre-trained model (PubLayNet dataset) and skip this part.

```
mkdir pretrained_model
cd pretrained_model
# Download the PubLayNet pre-trained model (to try evaluation, prediction, and dynamic-to-static export directly)
wget https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout.pdparams
# Download the PubLayNet inference model (to try inference directly)
wget https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout_infer.tar
```

If the test image is Chinese, the pre-trained model for the Chinese CDLA dataset can be downloaded to identify 10 types of document regions: Text, Title, Figure, Figure caption, Table, Table caption, Header, Footer, Reference, Equation. Download the training model and inference model of `picodet_lcnet_x1_0_fgd_layout_cdla` from the [layout analysis model](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/ppstructure/docs/models_list.md) list.
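As an illustration of the annotation format from section 4.1, the sketch below reads a COCO-style dictionary with the `images` and `annotations` keys and converts each `bbox` from `[x1, y1, w, h]` to corner form `[x1, y1, x2, y2]`. It is a minimal example under those assumptions, not part of PaddleDetection; the helper names `xywh_to_xyxy`, `group_boxes_by_image`, and `load_annotations` are hypothetical.

```python
import json
from collections import defaultdict
from typing import Dict, List, Tuple


def xywh_to_xyxy(bbox: List[float]) -> List[float]:
    """Convert [x1, y1, w, h] (the format shown in section 4.1) to [x1, y1, x2, y2]."""
    x, y, w, h = bbox
    return [x, y, x + w, y + h]


def group_boxes_by_image(coco: dict) -> Dict[str, List[Tuple[int, List[float]]]]:
    """Map each image file name to its (category_id, corner-form box) pairs."""
    file_names = {img["id"]: img["file_name"] for img in coco["images"]}
    grouped = defaultdict(list)
    for ann in coco["annotations"]:
        # 'image_id' in an annotation refers to 'id' in the images list
        name = file_names[ann["image_id"]]
        grouped[name].append((ann["category_id"], xywh_to_xyxy(ann["bbox"])))
    return dict(grouped)


def load_annotations(path: str) -> Dict[str, List[Tuple[int, List[float]]]]:
    """Read an annotation file such as val.json and group its boxes per image."""
    with open(path, "r", encoding="utf-8") as f:
        return group_boxes_by_image(json.load(f))
```

For example, `load_annotations("publaynet/val.json")` would return one entry per annotated validation image, assuming the archive was extracted into the current directory. The full `train.json` is large, so loading it this way holds every annotation in memory at once.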
