English | [简体中文](README_ch.md)
# Layout analysis
- [1. Introduction](#1-Introduction)
- [2. Quick start](#2-Quick-start)
- [3. Install](#3-Install)
- [3.1 Install PaddlePaddle](#31-Install-paddlepaddle)
- [3.2 Install PaddleDetection](#32-Install-paddledetection)
- [4. Data preparation](#4-Data-preparation)
- [4.1 English data set](#41-English-data-set)
- [4.2 More datasets](#42-More-datasets)
- [5. Start training](#5-Start-training)
- [5.1 Train](#51-Train)
- [5.2 FGD Distillation training](#52-Fgd-distillation-training)
- [6. Model evaluation and prediction](#6-Model-evaluation-and-prediction)
- [6.1 Indicator evaluation](#61-Indicator-evaluation)
- [6.2 Test layout analysis results](#62-Test-layout-analysis-results)
- [7. Model export and inference](#7-Model-export-and-inference)
- [7.1 Model export](#71-Model-export)
- [7.2 Model inference](#72-Model-inference)
## 1. Introduction
Layout analysis refers to the regional division of documents in the form of pictures and the positioning of key areas, such as text, title, table, picture, etc. The layout analysis algorithm is based on the lightweight model PP-picodet of [PaddleDetection]( https://github.com/PaddlePaddle/PaddleDetection ), including English layout analysis, Chinese layout analysis and table layout analysis models. English layout analysis models can detect document layout elements such as text, title, table, figure, list. Chinese layout analysis models can detect document layout elements such as text, figure, figure caption, table, table caption, header, footer, reference, and equation. Table layout analysis models can detect table regions.
<div align="center">
<img src="../docs/layout/layout.png" width="800">
</div>
## 2. Quick start
PP-Structure currently provides layout analysis models in Chinese, English and table documents. For the model link, see [models_list](../docs/models_list_en.md). The whl package is also provided for quick use, see [quickstart](../docs/quickstart_en.md) for details.
## 3. Install
### 3.1. Install PaddlePaddle
- **(1) Install PaddlePaddle**
```bash
python3 -m pip install --upgrade pip
# GPU Install
python3 -m pip install "paddlepaddle-gpu>=2.3" -i https://mirror.baidu.com/pypi/simple
# CPU Install
python3 -m pip install "paddlepaddle>=2.3" -i https://mirror.baidu.com/pypi/simple
```
For more requirements, please refer to the instructions in the [Install file](https://www.paddlepaddle.org.cn/install/quick)。
### 3.2. Install PaddleDetection
- **(1)Download PaddleDetection Source code**
```bash
git clone https://github.com/PaddlePaddle/PaddleDetection.git
```
- **(2)Install third-party libraries**
```bash
cd PaddleDetection
python3 -m pip install -r requirements.txt
```
## 4. Data preparation
If you want to experience the prediction process directly, you can skip data preparation and download the pre-training model.
### 4.1. English data set
Download document analysis data set [PubLayNet](https://developer.ibm.com/exchanges/data/all/publaynet/)(Dataset 96G),contains 5 classes:`{0: "Text", 1: "Title", 2: "List", 3:"Table", 4:"Figure"}`
```
# Download data
wget https://dax-cdn.cdn.appdomain.cloud/dax-publaynet/1.0.0/publaynet.tar.gz
# Decompress data
tar -xvf publaynet.tar.gz
```
Uncompressed **directory structure:**
```
|-publaynet
|- test
|- PMC1277013_00004.jpg
|- PMC1291385_00002.jpg
| ...
|- train.json
|- train
|- PMC1291385_00002.jpg
|- PMC1277013_00004.jpg
| ...
|- val.json
|- val
|- PMC538274_00004.jpg
|- PMC539300_00004.jpg
| ...
```
**data distribution:**
| File or Folder | Description | num |
| :------------- | :------------- | ------- |
| `train/` | Training set pictures | 335,703 |
| `val/` | Verification set pictures | 11,245 |
| `test/` | Test set pictures | 11,405 |
| `train.json` | Training set annotation files | - |
| `val.json` | Validation set dimension files | - |
**Data Annotation**
The JSON file contains the annotations of all images, and the data is stored in a dictionary nested manner.Contains the following keys:
- info,represents the dimension file info。
- licenses,represents the dimension file licenses。
- images,represents the list of image information in the annotation file,each element is the information of an image。The information of one of the images is as follows:
```
{
'file_name': 'PMC4055390_00006.jpg', # file_name
'height': 601, # image height
'width': 792, # image width
'id': 341427 # image id
}
```
- annotations, represents the list of annotation information of the target object in the annotation file,each element is the annotation information of a target object。The following is the annotation information of one of the target objects:
```
{
'segmentation': # Segmentation annotation of objects
'area': 60518.099043117836, # Area of object
'iscrowd': 0, # iscrowd
'image_id': 341427, # image id
'bbox': [50.58, 490.86, 240.15, 252.16], # bbox [x1,y1,w,h]
'category_id': 1, # category_id
'id': 3322348 # image id
}
```
### 4.2. More datasets
We provide CDLA(Chinese layout analysis), TableBank(Table layout analysis)etc. data set download links,process to the JSON format of the above annotation file,that is, the training can be conducted in the same way。
| dataset | 简介 |
| ------------------------------------------------------------ | ------------------------------------------------------------ |
| [cTDaR2019_cTDaR](https://cndplab-founder.github.io/cTDaR2019/) | For form detection (TRACKA) and form identification (TRACKB).Image types include historical data sets (beginning with cTDaR_t0, such as CTDAR_T00872.jpg) and modern data sets (beginning with cTDaR_t1, CTDAR_T10482.jpg). |
| [IIIT-AR-13K](http://cvit.iiit.ac.in/usodi/iiitar13k.php) | Data sets constructed by manually annotating figures or pages from publicly available annual reports, containing 5 categories:table, figure, natural image, logo, and signature. |
| [TableBank](https://github.com/doc-analysis/TableBank) | For table detection and recognition of large datasets, including Word and Latex document formats |
| [CDLA](https://github.com/buptlihang/CDLA) | Chinese document layout analysis data set, for Chinese literature (paper) scenarios, including 10 categories:Text, Title, Figure, Figure caption, Table, Table caption, Header, Footer, Reference, Equation |
| [DocBank](https://github.com/doc-analysis/DocBank) | Large-scale dataset (500K document pages) constructed using weakly supervised methods for document layout analysis, containing 12 categories:Author, Caption, Date, Equation, Figure, Footer, List, Paragraph, Reference, Section, Table, Title |
## 5. Start training
Training scripts, evaluation scripts, and prediction scripts are provided, and the PubLayNet pre-training model is used as an example in this section.
If you do not want training and directly experience the following process of model evaluation, prediction, motion to static, and inference, you can download the provided pre-trained model (PubLayNet dataset) and skip this part.
```
mkdir pretrained_model
cd pretrained_model
# Download PubLayNet pre-training model(Direct experience model evaluates, predicts, and turns static)
wget https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout.pdparams
# Download the PubLaynet inference model(Direct experience model reasoning)
wget https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout_infer.tar
```
If th
没有合适的资源?快使用搜索试试~ 我知道了~
温馨提示
在当今信息化时代,光学字符识别(OCR)技术在许多领域得到了广泛应用,尤其是在数字化文档和自动化数据处理方面。随着对快速、准确的文本识别需求的增加,越来越多的开发者和企业开始寻求轻量级且高效的OCR解决方案。在这一背景下,我们推出了一款仅8.6MB的超轻量级中文OCR工具库,旨在为开发者提供便捷、实用的文本识别能力。 该OCR工具库的突出特点在于其单模型的多功能性。它不仅支持中文,还能够识别英文和数字的组合文本。这意味着开发者可以在同一个模型中处理多语言文本,从而大大简化了应用开发的复杂性。此外,针对中文的竖排文本识别需求,工具库也提供了专门的支持,这在许多东亚文化中显得尤为重要,特别是在处理书法、古籍等传统文献时,竖排文本的处理能力能够显著提升文献的数字化进程。 在识别能力方面,该OCR工具库可处理长文本的识别任务,这对于需要分析和处理大量信息的场景尤为重要。例如,学术研究、法律文件及其他各类需要批量文档处理的行业都可以从中受益。长文本识别的能力确保了开发者能够准确提取出关键信息,提升工作效率。 为了满足不同场景下的需求,该工具库支持多种文本检测和识别的训练算法。这不仅意味着开
资源推荐
资源详情
资源评论






















收起资源包目录





































































































共 2000 条
- 1
- 2
- 3
- 4
- 5
- 6
- 20
资源评论


传奇开心果编程
- 粉丝: 1w+
- 资源: 454

下载权益

C知道特权

VIP文章

课程特权

开通VIP
上传资源 快速赚钱
我的内容管理 展开
我的资源 快来上传第一个资源
我的收益
登录查看自己的收益我的积分 登录查看自己的积分
我的C币 登录后查看C币余额
我的收藏
我的下载
下载帮助


最新资源
- TVP5150/TVP5151数字视频解码器硬件与软件设计方案及FAQ
- 西门子PLC与C#上位机高效通讯:WPF界面开发实践与S7netpuls库的自定义封装,西门子PLC与C#上位机高效通讯:WPF界面开发实践与S7netpuls库的自定义封装新方法WriteReadC
- 基于A*算法的机器人路径规划系统:无缝切换五种地图,详细代码注释辅助理解,基于A*算法的机器人路径规划系统:五种地图自由切换与详细代码注释指引,基于A*算法的机器人路径规划 五种地图随意切, 内涵详细
- 全差分运放的设计与应用:简化实现和性能优势
- 差动放大器性能优化方法及其应用场景的技术探讨
- (源码)基于Java的LeetCode题解项目.zip
- (源码)基于Python的微信智能机器人.zip
- 自动化所考博真题-数学-算法-英语2025.pdf
- navicat连接MySQL的神器
- (源码)基于Python的动态掩码生成工具.zip
- 永磁同步电机无传感器控制及滑膜观测模型Matlab实现,附反正切观测模型对比及参考文献,永磁同步电机无传感器控制及滑膜观测模型Matlab实现与反正切观测模型对比研究参考文献分享,永磁同步电机无传感器
- 电流反馈(CFB)与电压反馈(VFB)运算放大器的工作原理及应用场景对比
- bp神经网络python代码.py
- python爱心代码高级.py
- python爱心代码高级粒子.py
- python烟花代码.py
资源上传下载、课程学习等过程中有任何疑问或建议,欢迎提出宝贵意见哦~我们会及时处理!
点击此处反馈



安全验证
文档复制为VIP权益,开通VIP直接复制
