English | [简体中文](README_ch.md)
# Layout analysis
- [1. Introduction](#1-Introduction)
- [2. Quick start](#2-Quick-start)
- [3. Install](#3-Install)
  - [3.1 Install PaddlePaddle](#31-Install-paddlepaddle)
  - [3.2 Install PaddleDetection](#32-Install-paddledetection)
- [4. Data preparation](#4-Data-preparation)
  - [4.1 English data set](#41-English-data-set)
  - [4.2 More datasets](#42-More-datasets)
- [5. Start training](#5-Start-training)
  - [5.1 Train](#51-Train)
  - [5.2 FGD Distillation training](#52-Fgd-distillation-training)
- [6. Model evaluation and prediction](#6-Model-evaluation-and-prediction)
  - [6.1 Indicator evaluation](#61-Indicator-evaluation)
  - [6.2 Test layout analysis results](#62-Test-layout-analysis-results)
- [7. Model export and inference](#7-Model-export-and-inference)
  - [7.1 Model export](#71-Model-export)
  - [7.2 Model inference](#72-Model-inference)
## 1. Introduction
Layout analysis divides a document image into regions and locates key areas such as text, titles, tables, and figures. The layout analysis models are built on the lightweight PP-PicoDet model from [PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection) and include English, Chinese, and table layout analysis models. The English layout analysis model detects document elements such as text, titles, tables, figures, and lists. The Chinese layout analysis model detects text, figures, figure captions, tables, table captions, headers, footers, references, and equations. The table layout analysis model detects table regions.
<div align="center">
<img src="../docs/layout/layout.png" width="800">
</div>
## 2. Quick start
PP-Structure currently provides layout analysis models for Chinese documents, English documents, and tables. For model download links, see [models_list](../docs/models_list_en.md). A whl package is also provided for quick use; see [quickstart](../docs/quickstart_en.md) for details, and the minimal usage sketch below.
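As a quick illustration, the whl package can be called from Python roughly as follows. This is a minimal sketch: the sample image path is an assumption, and the parameters (`table=False, ocr=False` to run layout analysis only) follow the quickstart's conventions; the quickstart linked above remains the authoritative reference.

```python
import os
import cv2
from paddleocr import PPStructure, save_structure_res

# Layout analysis only: disable table recognition and OCR inside the detected regions
layout_engine = PPStructure(table=False, ocr=False, show_log=True)

img_path = 'docs/table/1.png'   # replace with your own document image
img = cv2.imread(img_path)
result = layout_engine(img)

# Save the layout result and print each detected region (type and bounding box)
save_structure_res(result, './output', os.path.basename(img_path).split('.')[0])
for region in result:
    region.pop('img', None)     # drop the cropped image array for readable output
    print(region['type'], region['bbox'])
```

The `result` list contains one dictionary per detected region, with keys such as `type` and `bbox`.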
## 3. Install
### 3.1. Install PaddlePaddle
- **(1) Install PaddlePaddle**
```bash
python3 -m pip install --upgrade pip
# GPU Install
python3 -m pip install "paddlepaddle-gpu>=2.3" -i https://mirror.baidu.com/pypi/simple
# CPU Install
python3 -m pip install "paddlepaddle>=2.3" -i https://mirror.baidu.com/pypi/simple
```
For more installation requirements, please refer to the instructions in the [installation documentation](https://www.paddlepaddle.org.cn/install/quick).
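After installation, you can optionally verify that PaddlePaddle was installed correctly (and, for the GPU build, that a device is visible) with Paddle's built-in check:

```python
import paddle

print(paddle.__version__)   # expect a version >= 2.3, matching the pip requirement above
paddle.utils.run_check()    # runs a small test program and reports whether PaddlePaddle works
```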
### 3.2. Install PaddleDetection
- **(1) Download the PaddleDetection source code**
```bash
git clone https://github.com/PaddlePaddle/PaddleDetection.git
```
- **(2) Install third-party libraries**
```bash
cd PaddleDetection
python3 -m pip install -r requirements.txt
```
## 4. Data preparation
If you want to experience the prediction process directly, you can skip data preparation and download the pre-trained model instead.
### 4.1. English data set
Download the document analysis dataset [PubLayNet](https://developer.ibm.com/exchanges/data/all/publaynet/) (about 96 GB). It contains 5 classes: `{0: "Text", 1: "Title", 2: "List", 3: "Table", 4: "Figure"}`
```bash
# Download data
wget https://dax-cdn.cdn.appdomain.cloud/dax-publaynet/1.0.0/publaynet.tar.gz
# Decompress data
tar -xvf publaynet.tar.gz
```
The decompressed **directory structure** is as follows:
```
|-publaynet
  |- test
     |- PMC1277013_00004.jpg
     |- PMC1291385_00002.jpg
     | ...
  |- train.json
  |- train
     |- PMC1291385_00002.jpg
     |- PMC1277013_00004.jpg
     | ...
  |- val.json
  |- val
     |- PMC538274_00004.jpg
     |- PMC539300_00004.jpg
     | ...
```
**Data distribution:**
| File or Folder | Description | Number |
| :------------- | :------------- | ------- |
| `train/` | Training set images | 335,703 |
| `val/` | Validation set images | 11,245 |
| `test/` | Test set images | 11,405 |
| `train.json` | Training set annotation file | - |
| `val.json` | Validation set annotation file | - |
**Data Annotation**
The JSON file contains the annotations of all images, stored as nested dictionaries with the following keys:
- `info`: general information about the annotation file.
- `licenses`: license information for the annotation file.
- `images`: a list of image entries; each element describes one image. An example image entry:
```
{
    'file_name': 'PMC4055390_00006.jpg',    # file name
    'height': 601,                          # image height
    'width': 792,                           # image width
    'id': 341427                            # image id
}
```
- `annotations`: a list of annotation entries for the target objects; each element describes one object. An example annotation entry is shown below, followed by a short sanity-check script:
```
{
    'segmentation':                             # segmentation annotation of the object
    'area': 60518.099043117836,                 # area of the object
    'iscrowd': 0,                               # iscrowd flag
    'image_id': 341427,                         # image id
    'bbox': [50.58, 490.86, 240.15, 252.16],    # bbox [x1, y1, w, h]
    'category_id': 1,                           # category id
    'id': 3322348                               # annotation id
}
```
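Because the annotations follow the standard COCO layout, a small script can sanity-check a label file. The sketch below assumes the `publaynet/val.json` path from the directory structure above and that the file also carries the usual COCO `categories` list; it counts instances per category and converts one `[x, y, w, h]` box to corner coordinates.

```python
import json
from collections import Counter

# Path assumed from the directory layout above; val.json is the smaller file
with open('publaynet/val.json', 'r') as f:
    coco = json.load(f)

# Standard COCO files carry a 'categories' list mapping ids to names
id_to_name = {c['id']: c['name'] for c in coco['categories']}
counts = Counter(ann['category_id'] for ann in coco['annotations'])
for cat_id, num in sorted(counts.items()):
    print(f"{id_to_name[cat_id]:>8}: {num} instances")

# COCO boxes are [x, y, w, h]; convert one to [x1, y1, x2, y2] corners
x, y, w, h = coco['annotations'][0]['bbox']
print('corners:', [x, y, x + w, y + h])
```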
### 4.2. More datasets
We also provide download links for more datasets, such as CDLA (Chinese layout analysis) and TableBank (table layout analysis). After converting them to the COCO-style JSON annotation format described above, they can be used for training in the same way (see the conversion sketch after the table below).
| Dataset | Description |
| ------------------------------------------------------------ | ------------------------------------------------------------ |
| [cTDaR2019_cTDaR](https://cndplab-founder.github.io/cTDaR2019/) | For table detection (TRACK A) and table recognition (TRACK B). Images include a historical subset (file names beginning with cTDaR_t0, e.g. cTDaR_t00872.jpg) and a modern subset (file names beginning with cTDaR_t1, e.g. cTDaR_t10482.jpg). |
| [IIIT-AR-13K](http://cvit.iiit.ac.in/usodi/iiitar13k.php) | Built by manually annotating figures and pages from publicly available annual reports; contains 5 categories: table, figure, natural image, logo, and signature. |
| [TableBank](https://github.com/doc-analysis/TableBank) | A large dataset for table detection and recognition, covering both Word and LaTeX document formats. |
| [CDLA](https://github.com/buptlihang/CDLA) | A Chinese document layout analysis dataset for Chinese literature (paper) scenarios, containing 10 categories: Text, Title, Figure, Figure caption, Table, Table caption, Header, Footer, Reference, Equation. |
| [DocBank](https://github.com/doc-analysis/DocBank) | A large-scale dataset (500K document pages) built with weak supervision for document layout analysis, containing 12 categories: Author, Caption, Date, Equation, Figure, Footer, List, Paragraph, Reference, Section, Table, Title. |
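To train on one of these datasets (or your own), the labels only need to end up in the COCO-style JSON format shown above. The skeleton below is a hypothetical conversion example; every file name, size, box, and category in it is a placeholder.

```python
import json

# Hypothetical example: assemble a COCO-style label file for a custom layout dataset.
coco = {
    "images": [
        {"file_name": "doc_0001.jpg", "height": 1000, "width": 750, "id": 1},
    ],
    "annotations": [
        {
            "id": 1,                              # annotation id, unique across the file
            "image_id": 1,                        # must match an entry in "images"
            "category_id": 1,                     # must match an entry in "categories"
            "bbox": [40.0, 60.0, 300.0, 120.0],   # [x, y, w, h]
            "area": 300.0 * 120.0,
            "iscrowd": 0,
            "segmentation": [],
        },
    ],
    "categories": [
        {"id": 1, "name": "text"},
        {"id": 2, "name": "table"},
    ],
}

with open("train.json", "w") as f:
    json.dump(coco, f)
```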
## 5. Start training
Training, evaluation, and prediction scripts are provided; this section uses the PubLayNet pre-trained model as an example.
If you do not want to train and would rather directly experience model evaluation, prediction, dynamic-to-static export, and inference, you can download the provided pre-trained model (trained on the PubLayNet dataset) and skip this part.
```bash
mkdir pretrained_model
cd pretrained_model
# Download the PubLayNet pre-trained model (for model evaluation, prediction, and dynamic-to-static export)
wget https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout.pdparams
# Download the PubLayNet inference model (for direct inference)
wget https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout_infer.tar
```
If the test image is Chinese, you can instead download the pre-trained model trained on the Chinese CDLA dataset, which detects 10 types of document regions (Text, Title, Figure, Figure caption, Table, Table caption, Header, Footer, Reference, Equation); if only table regions need to be detected, the table-dataset models are also available. Both the training and inference models can be found in [models_list](../docs/models_list_en.md).