English | [简体中文](README_ch.md)
# Layout analysis
- [1. Introduction](#1-Introduction)
- [2. Install](#2-Install)
  - [2.1 Install PaddlePaddle](#21-Install-paddlepaddle)
  - [2.2 Install PaddleDetection](#22-Install-paddledetection)
- [3. Data preparation](#3-Data-preparation)
  - [3.1 English dataset](#31-English-data-set)
  - [3.2 More datasets](#32-More-datasets)
- [4. Start training](#4-Start-training)
  - [4.1 Train](#41-Train)
  - [4.2 FGD distillation training](#42-FGD-Distillation-training)
- [5. Model evaluation and prediction](#5-Model-evaluation-and-prediction)
  - [5.1 Indicator evaluation](#51-Indicator-evaluation)
  - [5.2 Test layout analysis results](#52-Test-layout-analysis-results)
- [6. Model export and inference](#6-Model-export-and-inference)
  - [6.1 Model export](#61-Model-export)
  - [6.2 Model inference](#62-Model-inference)
## 1. Introduction
Layout analysis divides a document image into regions and locates the key areas, such as text, titles, tables, and figures. The layout analysis algorithm is based on the lightweight PP-PicoDet model from [PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection).
<div align="center">
<img src="../docs/layout/layout.png" width="800">
</div>
## 2. Install
### 2.1. Install PaddlePaddle
- **(1) Install PaddlePaddle**
```bash
python3 -m pip install --upgrade pip
# GPU Install
python3 -m pip install "paddlepaddle-gpu>=2.3" -i https://mirror.baidu.com/pypi/simple
# CPU Install
python3 -m pip install "paddlepaddle>=2.3" -i https://mirror.baidu.com/pypi/simple
```
For more requirements, please refer to the instructions in the [installation guide](https://www.paddlepaddle.org.cn/install/quick).
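After installation, a quick self-check can confirm that PaddlePaddle works; `paddle.utils.run_check()` is PaddlePaddle's built-in environment check:
```bash
# Print the installed version and run PaddlePaddle's environment self-check
python3 -c "import paddle; print(paddle.__version__); paddle.utils.run_check()"
```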
### 2.2. Install PaddleDetection
- **(1) Download the PaddleDetection source code**
```bash
git clone https://github.com/PaddlePaddle/PaddleDetection.git
```
- **(2) Install third-party libraries**
```bash
cd PaddleDetection
python3 -m pip install -r requirements.txt
```
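As a minimal sanity check (assuming the commands above completed without errors), you can verify that PaddleDetection's `ppdet` package imports cleanly from the repository root:
```bash
# Run from the PaddleDetection root; a clean import means the dependencies resolved
python3 -c "import ppdet"
```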
## 3. Data preparation
If you only want to experience the prediction process, you can skip data preparation and download the pre-trained model directly.
### 3.1. English dataset
Download the document layout analysis dataset [PubLayNet](https://developer.ibm.com/exchanges/data/all/publaynet/) (about 96 GB). It contains 5 classes: `{0: "Text", 1: "Title", 2: "List", 3: "Table", 4: "Figure"}`.
```bash
# Download data
wget https://dax-cdn.cdn.appdomain.cloud/dax-publaynet/1.0.0/publaynet.tar.gz
# Decompress data
tar -xvf publaynet.tar.gz
```
The decompressed **directory structure:**
```
|-publaynet
|- test
|- PMC1277013_00004.jpg
|- PMC1291385_00002.jpg
| ...
|- train.json
|- train
|- PMC1291385_00002.jpg
|- PMC1277013_00004.jpg
| ...
|- val.json
|- val
|- PMC538274_00004.jpg
|- PMC539300_00004.jpg
| ...
```
**Data distribution:**
| File or Folder | Description | Count |
| :------------- | :------------- | ------- |
| `train/` | Training set images | 335,703 |
| `val/` | Validation set images | 11,245 |
| `test/` | Test set images | 11,405 |
| `train.json` | Training set annotation file | - |
| `val.json` | Validation set annotation file | - |
**Data Annotation**
The JSON file contains the annotations of all images, stored as nested dictionaries with the following keys:
- `info`: information about the annotation file.
- `licenses`: license information for the annotation file.
- `images`: the list of image records in the annotation file; each element describes one image. An example record:
```
{
'file_name': 'PMC4055390_00006.jpg', # image file name
'height': 601, # image height
'width': 792, # image width
'id': 341427 # image id
}
```
- `annotations`: the list of object annotations in the annotation file; each element describes one target object. An example record:
```
{
'segmentation':                           # segmentation annotation of the object
'area': 60518.099043117836,               # area of the object
'iscrowd': 0,                             # whether the object is a crowd (COCO flag)
'image_id': 341427,                       # id of the image this object belongs to
'bbox': [50.58, 490.86, 240.15, 252.16],  # bounding box [x1, y1, w, h]
'category_id': 1,                         # category id
'id': 3322348                             # annotation id
}
```
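Since the annotation files follow the COCO format, they can be inspected with a short script. Below is a minimal sketch, assuming the `publaynet` directory from above; the `categories` list is part of the standard COCO schema, even though it is not shown in the examples:
```bash
python3 - <<'EOF'
import json

# val.json is much smaller than train.json, so it loads quickly
with open("publaynet/val.json") as f:
    data = json.load(f)

print(len(data["images"]), "images,", len(data["annotations"]), "annotations")
# COCO files also carry a category table mapping category_id -> name
print({c["id"]: c["name"] for c in data["categories"]})
EOF
```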
### 3.2. More datasets
Download links are provided for additional datasets such as CDLA (Chinese layout analysis) and TableBank (table layout analysis); once converted to the JSON annotation format described above, they can be used for training in the same way.
| Dataset | Description |
| ------- | ----------- |
| [cTDaR2019_cTDaR](https://cndplab-founder.github.io/cTDaR2019/) | For table detection (TRACK A) and table recognition (TRACK B). Images include a historical subset (file names beginning with cTDaR_t0, e.g. cTDaR_t00872.jpg) and a modern subset (file names beginning with cTDaR_t1, e.g. cTDaR_t10482.jpg). |
| [IIIT-AR-13K](http://cvit.iiit.ac.in/usodi/iiitar13k.php) | Constructed by manually annotating figures and pages from publicly available annual reports; contains 5 categories: table, figure, natural image, logo, and signature. |
| [TableBank](https://github.com/doc-analysis/TableBank) | A large dataset for table detection and recognition, covering Word and LaTeX document formats. |
| [CDLA](https://github.com/buptlihang/CDLA) | Chinese document layout analysis dataset for Chinese literature (paper) scenarios, containing 10 categories: Text, Title, Figure, Figure caption, Table, Table caption, Header, Footer, Reference, Equation. |
| [DocBank](https://github.com/doc-analysis/DocBank) | Large-scale dataset (500K document pages) constructed with weak supervision for document layout analysis, containing 12 categories: Author, Caption, Date, Equation, Figure, Footer, List, Paragraph, Reference, Section, Table, Title. |
## 4. Start training
Training, evaluation, and prediction scripts are provided; this section uses the PubLayNet dataset as an example.
If you want to skip training and directly try model evaluation, prediction, dynamic-to-static export, and inference, download the provided pre-trained model (trained on PubLayNet) and skip this part.
```bash
mkdir pretrained_model
cd pretrained_model
# Download the PubLayNet pre-trained model (for model evaluation, prediction, and dynamic-to-static export)
wget https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout.pdparams
# Download the PubLayNet inference model (for direct inference)
wget https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout_infer.tar
```
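The inference model is packaged as a tar archive and must be extracted before it can be used for inference:
```bash
# Unpack the inference model (produces a directory with the model files)
tar -xvf picodet_lcnet_x1_0_fgd_layout_infer.tar
```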
If the test images are Chinese, the pre-trained model for the Chinese CDLA dataset can be downloaded instead; it identifies 10 types of document regions: Text, Title, Figure, Figure caption, Table, Table caption, Header, Footer, Reference, Equation. Download the trained model and inference model of the `picodet_lcnet_x1_0_fgd_layout_cdla` model from the [layout analysis models](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/ppstructure/docs/models_list.md). If only the table regions in an image need to be detected, download the trained model and inference model of the `picodet_lcnet_x1_0_fgd_layout_table` model from the same [layout analysis models](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/ppstructure/docs/models_list.md) page.
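For example, the CDLA models can be fetched the same way as the PubLayNet models above. The URLs below are assumed to follow the same naming pattern; confirm the exact links in the model list:
```bash
# Assumed URLs following the PubLayNet naming pattern; verify against the model list
wget https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout_cdla.pdparams
wget https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout_cdla_infer.tar
```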