English | [中文](README_CN.md)
# DBNet and DBNet++
<!--- Guideline: use url linked to abstract in ArXiv instead of PDF for fast loading. -->
> DBNet: [Real-time Scene Text Detection with Differentiable Binarization](https://arxiv.org/abs/1911.08947)
> DBNet++: [Real-Time Scene Text Detection with Differentiable Binarization and Adaptive Scale Fusion](https://arxiv.org/abs/2202.10304)
## 1. Introduction
### DBNet
DBNet is a segmentation-based scene text detection method. Segmentation-based methods are gaining popularity for scene
text detection purposes as they can more accurately describe scene text of various shapes, such as curved text.
The drawback of current segmentation-based SOTA methods is binarization post-processing (converting probability
maps into text bounding boxes), which often requires a manually set threshold (reducing prediction accuracy)
and complex pixel-grouping algorithms (incurring a considerable time cost during inference).
To eliminate the problems described above, DBNet integrates an adaptive threshold called Differentiable Binarization (DB)
into the architecture. DB simplifies post-processing and enhances the performance of text detection. Moreover, it can be
removed in the inference stage without sacrificing performance.[[1](#references)]
<p align="center"><img alt="Figure 1. Overall DBNet architecture" src="https://user-images.githubusercontent.com/16683750/225589619-d50c506c-e903-4f59-a316-8b62586c73a9.png" width="800"/></p>
<p align="center"><em>Figure 1. Overall DBNet architecture</em></p>
The overall architecture of DBNet is presented in _Figure 1._ It consists of multiple stages:
1. Feature extraction from a backbone at different scales. ResNet-50 is used as a backbone, and features are extracted
from stages 2, 3, 4, and 5.
2. The extracted features are upscaled and summed up with the previous stage features in a cascade fashion.
3. The resulting features are upscaled once again to match the size of the largest feature map (from stage 2) and
concatenated along the channel axis.
4. Then, the final feature map (shown in dark blue) is used to predict both the probability and threshold maps by
applying a 3×3 convolutional operator and two de-convolutional operators with stride 2.
5. The probability and threshold maps are merged into one approximate binary map by the Differentiable binarization
module. The approximate binary map is used to generate text bounding boxes.
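The Differentiable Binarization step in the last stage is a simple element-wise function of the probability map P and threshold map T. A minimal NumPy sketch (the amplifying factor k = 50 is the value used in the paper; function and variable names are illustrative):

```python
import numpy as np

def differentiable_binarization(prob_map, thresh_map, k=50.0):
    """Approximate binary map: B = 1 / (1 + exp(-k * (P - T))).

    Unlike a hard threshold step, this sigmoid-like function is
    differentiable, so the threshold map T can be learned jointly
    with the probability map P during training.
    """
    return 1.0 / (1.0 + np.exp(-k * (prob_map - thresh_map)))

# Pixels well above the learned threshold map to ~1, well below to ~0.
prob = np.array([[0.9, 0.1]])
thresh = np.array([[0.5, 0.5]])
binary = differentiable_binarization(prob, thresh)
```

Because k is large, the function closely approximates a hard binarization at inference time, which is why the DB module can be dropped after training.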
### DBNet++
DBNet++ is an extension of DBNet and thus replicates its architecture. The only difference is that instead of
concatenating the extracted and scaled backbone features as DBNet does, DBNet++ fuses those features adaptively with
the Adaptive Scale Fusion (ASF) module (Figure 2). ASF improves the scale robustness of the network by
fusing features of different scales adaptively, which distinctly strengthens DBNet++'s ability to detect text
instances of diverse scales.[[2](#references)]
<p align="center"><img alt="Figure 2. Overall DBNet++ architecture" src="https://user-images.githubusercontent.com/16683750/236786997-13823b9c-ecaa-4bc5-8037-71299b3baffe.png" width="800"/></p>
<p align="center"><em>Figure 2. Overall DBNet++ architecture</em></p>
<p align="center"><img alt="Figure 3. Detailed architecture of the Adaptive Scale Fusion module" src="https://user-images.githubusercontent.com/16683750/236787093-c0c78d8f-e4f4-4c5e-8259-7120a14b0e31.png" width="700"/></p>
<p align="center"><em>Figure 3. Detailed architecture of the Adaptive Scale Fusion module</em></p>
ASF consists of two attention modules, stage-wise attention and spatial attention, where the latter is integrated into
the former as shown in Figure 3. The stage-wise attention module learns the weights of the feature maps of
different scales, while the spatial attention module learns the attention across the spatial dimensions. The
combination of these two modules leads to scale-robust feature fusion.
DBNet++ performs better in detecting text instances of diverse scales, especially for large-scale text instances where
DBNet may generate inaccurate or discrete bounding boxes.
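The core idea of stage-wise weighting can be reduced to a per-pixel softmax over the stages. A simplified NumPy sketch (the real ASF module uses learned convolutions and a sigmoid spatial-attention branch; the softmax fusion below is an illustrative reduction, and all names are assumptions):

```python
import numpy as np

def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def adaptive_scale_fusion(features, stage_logits):
    """Fuse N same-size stage feature maps with per-pixel stage weights.

    features:     (N, C, H, W) upscaled feature maps from N backbone stages.
    stage_logits: (N, H, W) attention logits, e.g. produced by a small
                  convolutional attention head (not shown here).
    Returns a (C, H, W) weighted fusion instead of plain concatenation.
    """
    weights = softmax(stage_logits, axis=0)            # per-pixel weights over stages
    return (features * weights[:, None, :, :]).sum(axis=0)

# Two 1-channel, 1x1 stage maps with equal logits fuse to their mean.
feats = np.array([[[[1.0]]], [[[3.0]]]])   # (2, 1, 1, 1)
logits = np.zeros((2, 1, 1))
fused = adaptive_scale_fusion(feats, logits)  # -> [[[2.0]]]
```

In contrast to channel-wise concatenation, this lets the network emphasize the stage whose receptive field best matches the local text scale at each spatial location.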
## 2. General purpose models
Here we present general purpose models that were trained on a wide variety of tasks (real-world photos, street views, documents, etc.) and challenges (straight texts, curved texts, long text lines, etc.) with two primary languages: Chinese and English. These models can be used off the shelf in your applications or to initialize your own models.
The models were trained on 12 public datasets (CTW, LSVT, RCTW-17, TextOCR, etc.) that contain a wide range of images. The training set has 153,511 images and the validation set has 9,786 images.<br/>
The test set consists of 598 images manually selected from the above-mentioned datasets.
<div align="center">
| **Model** | **Context** | **Backbone** | **Languages** | **F-score on Our Test Set** | **Throughput** | **Download** |
|-----------|----------------|--------------|-------------------|:---------------------------:|----------------|----------------------------------------------------------------------------------------------------------|
| DBNet | D910x8-MS2.0-G | ResNet-50 | Chinese + English | 83.41% | 256 img/s | [ckpt](https://download.mindspore.cn/toolkits/mindocr/dbnet/dbnet_resnet50_ch_en_general-a5dbb141.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/dbnet/dbnet_resnet50_ch_en_general-a5dbb141-912f0a90.mindir) |
| DBNet++ | D910x4-MS2.0-G | ResNet-50 | Chinese + English | 84.30% | 104 img/s | [ckpt](https://download.mindspore.cn/toolkits/mindocr/dbnet/dbnetpp_resnet50_ch_en_general-884ba5b9.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/dbnet/dbnetpp_resnet50_ch_en_general-884ba5b9-b3f52398.mindir) |
</div>
> The input shapes of the exported DBNet and DBNet++ MindIR files in the links above are `(1,3,736,1280)` and `(1,3,1152,2048)`, respectively.
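Since the exported MindIR expects a fixed input shape, images must be resized and normalized to NCHW float32 before inference. A minimal NumPy sketch (the ImageNet mean/std values and the nearest-neighbor resize are illustrative assumptions; the actual MindOCR preprocessing pipeline may differ):

```python
import numpy as np

def prepare_input(image, target_h=736, target_w=1280):
    """Convert an HWC uint8 image to the fixed (1, 3, H, W) float32 input.

    Uses index-mapping nearest-neighbor resize to stay dependency-free;
    a real pipeline would typically use a proper interpolating resize.
    """
    h, w = image.shape[:2]
    ys = (np.arange(target_h) * h // target_h).clip(0, h - 1)
    xs = (np.arange(target_w) * w // target_w).clip(0, w - 1)
    resized = image[ys][:, xs].astype(np.float32) / 255.0
    # ImageNet normalization (assumed values).
    mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
    std = np.array([0.229, 0.224, 0.225], dtype=np.float32)
    normed = (resized - mean) / std
    return normed.transpose(2, 0, 1)[None]  # HWC -> (1, 3, H, W)

batch = prepare_input(np.zeros((100, 200, 3), dtype=np.uint8))
# batch.shape == (1, 3, 736, 1280), matching the DBNet MindIR input shape
```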
## 3. Results
DBNet and DBNet++ were trained on the ICDAR2015, MSRA-TD500, SCUT-CTW1500, Total-Text, and MLT2017 datasets. In addition, we conducted pre-training on the SynthText dataset and provide a download link for the pretrained weights. All training results are as follows:
### ICDAR2015
<div align="center">
| **Model** | **Context** | **Backbone** | **Pretrained** | **Recall** | **Precision** | **F-score** | **Train T.** | **Throughput** | **Recipe** | **Download** |
|---------------------|----------------|---------------|----------------|------------|---------------|-------------|--------------|----------------|-------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| DBNet | D910x1-MS2.0-G | MobileNetV3 | ImageNet | 76.31% | 78.27% | 77.28% | 10 s/epoch | 100 img/s | [yaml](db_mobilenetv3_icdar15.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/dbnet/dbnet_mobilenetv3-62c44539.ckpt) \| [mindir](https://download.mindspore.cn/toolkits/mindocr/dbnet/dbnet_mobilenetv3-62c44539-f14c6a13.mindir) |
| DBNet               | D910x8-MS2.3-G | MobileNetV3   | ImageNet       | 76.22%     | 77.98%        | 77.09%      | 1.1 s/epoch  | 960 img/s      | [yaml](db_mobilenetv3_icdar15_8p.yaml) | Coming soon |