:boom: **Good news! Our new work [DocTr++: Deep Unrestricted Document Image Rectification](https://github.com/fh2019ustc/DocTr-Plus) is out. It can rectify various distorted document images in the wild.**
:boom: **Good news! Our new work achieves state-of-the-art performance on the [DocUNet Benchmark](https://www3.cs.stonybrook.edu/~cvl/docunet.html) dataset:
[DocScanner: Robust Document Image Rectification with Progressive Learning](https://drive.google.com/file/d/1mmCUj90rHyuO1SmpLt361youh-07Y0sD/view?usp=share_link)** with [Repo](https://github.com/fh2019ustc/DocScanner).
:boom: **Good news! A comprehensive list of [Awesome Document Image Rectification](https://github.com/fh2019ustc/Awesome-Document-Image-Rectification) methods is available.**
# DocTr
![1](https://user-images.githubusercontent.com/50725551/144743905-2b81e3ab-f2f7-4eee-aa87-f37b740f6998.png)
![2](https://user-images.githubusercontent.com/50725551/144743916-2c0762d0-727f-4d9c-afb2-3161dbaea47a.png)
![3](https://user-images.githubusercontent.com/50725551/144743919-1ff821f1-f2b1-441b-a442-f29e05d08326.png)
> [DocTr: Document Image Transformer for Geometric Unwarping and Illumination Correction](https://arxiv.org/pdf/2110.12942v2.pdf)
> ACM MM 2021 Oral
Questions and discussion are welcome!
## Training
DocTr consists of two main components: a geometric unwarping transformer (GeoTr) and an illumination correction transformer (IllTr).
- For geometric unwarping, we train the GeoTr network using the [Doc3D](https://github.com/fh2019ustc/doc3D-dataset) and [DTD](https://www.robots.ox.ac.uk/~vgg/data/dtd/) datasets.
- For illumination correction, we train the IllTr network based on the [DocProj](https://github.com/xiaoyu258/DocProj) dataset.
## Inference
1. Download the pretrained models from [Google Drive](https://drive.google.com/drive/folders/1eZRxnRVpf5iy3VJakJNTKWw5Zk9g-F_0?usp=sharing) or [Baidu Cloud](https://pan.baidu.com/s/1Cq9bfyAJ9MWwxj0CarqmKw?pwd=jmy1), and put them in `$ROOT/model_pretrained/`.
2. Put the distorted images in `$ROOT/distorted/`.
3. Geometric unwarping. The rectified images are saved in `$ROOT/geo_rec/` by default.
```
python inference.py
```
4. Geometric unwarping and illumination rectification. The rectified images are saved in `$ROOT/ill_rec/` by default.
```
python inference.py --ill_rec True
```
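The steps above can be checked and scripted with a small helper. The sketch below is not part of the repository: `build_inference_command` is a hypothetical name, and the set of accepted image suffixes is an assumption. Only the folder names and the `--ill_rec True` flag come from the steps above.

```python
import sys
from pathlib import Path

# Assumption: which file suffixes count as input images.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def build_inference_command(root, ill_rec=False):
    """Check the folder layout from steps 1-2 and return (command, images)."""
    root = Path(root)
    model_dir = root / "model_pretrained"
    input_dir = root / "distorted"

    # Both folders must exist before inference.py is started.
    for folder in (model_dir, input_dir):
        if not folder.is_dir():
            raise FileNotFoundError(f"missing folder: {folder}")

    # The pretrained weights have to be downloaded manually (step 1).
    if not any(model_dir.iterdir()):
        raise FileNotFoundError(f"no pretrained models in {model_dir}")

    images = sorted(
        p for p in input_dir.iterdir() if p.suffix.lower() in IMAGE_SUFFIXES
    )
    if not images:
        raise FileNotFoundError(f"no images found in {input_dir}")

    # Step 3 runs geometric unwarping only; step 4 adds --ill_rec True.
    command = [sys.executable, "inference.py"]
    if ill_rec:
        command += ["--ill_rec", "True"]
    return command, images
```

To run the result, pass the returned command to `subprocess.run(command, cwd=root, check=True)` from the standard library, where `root` is your `$ROOT` folder.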
## Evaluation
- ***Important.*** In the [DocUNet Benchmark](https://www3.cs.stonybrook.edu/~cvl/docunet.html), the distorted images '64_1.png' and '64_2.png' are rotated by 180 degrees and therefore do not match the GT documents. Most existing works ignore this. Please check these two images before running the evaluation.
- Note that the results in our MM paper were computed with the two ***mistaken*** samples in the [DocUNet Benchmark](https://www3.cs.stonybrook.edu/~cvl/docunet.html). To reproduce the following quantitative results on the ***corrected*** [DocUNet Benchmark](https://www3.cs.stonybrook.edu/~cvl/docunet.html), please use the geometrically rectified images available from [Google Drive](https://drive.google.com/drive/folders/1kJ34Nk18RVPwYK8mdfcQvU_67whD9tMe?usp=sharing) or [Baidu Cloud](https://pan.baidu.com/s/1Cq9bfyAJ9MWwxj0CarqmKw?pwd=jmy1). For the ***corrected*** performance of [other methods](https://github.com/fh2019ustc/Awesome-Document-Image-Rectification), please refer to our new work [DocScanner](https://drive.google.com/file/d/1mmCUj90rHyuO1SmpLt361youh-07Y0sD/view?usp=share_link).
- ***Image Metrics:*** We use the same evaluation code for MS-SSIM and LD as the [DocUNet Benchmark](https://www3.cs.stonybrook.edu/~cvl/docunet.html) dataset, run on Matlab 2019a. Please take your Matlab version into account when comparing scores. We provide our Matlab interface file at ```$ROOT/ssim_ld_eval.m```.
- ***OCR Metrics:*** The index of the 30 documents (60 images) of the [DocUNet Benchmark](https://www3.cs.stonybrook.edu/~cvl/docunet.html) used for our OCR evaluation is in ```$ROOT/ocr_img.txt``` (*Setting 1*). Please refer to [DewarpNet](https://github.com/cvlab-stonybrook/DewarpNet) for the index of the 25 documents (50 images) of the [DocUNet Benchmark](https://www3.cs.stonybrook.edu/~cvl/docunet.html) used for their OCR evaluation (*Setting 2*). We provide the OCR evaluation code at ```$ROOT/OCR_eval.py```. The version of pytesseract is 0.3.8, and the version of [Tesseract](https://digi.bib.uni-mannheim.de/tesseract/) on Windows is the recent 5.0.1.20220118.
Note that the computed performance differs slightly across operating systems.
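
The orientation check for '64_1.png' and '64_2.png' mentioned in the first item can be sketched as follows. This is an illustration rather than the repository's procedure: the function names are hypothetical, images are assumed to be already loaded as NumPy arrays of shape H x W or H x W x C, and whether to apply the rotation to the distorted inputs or to the rectified outputs is left to the reader.

```python
import numpy as np

# File names taken from the note above; everything else is illustrative.
ROTATED_SAMPLES = {"64_1.png", "64_2.png"}


def rotate_180(image):
    """Rotate an H x W (x C) image array by 180 degrees in the image plane."""
    return np.rot90(image, k=2, axes=(0, 1))


def fix_orientation(name, image):
    """Return the image rotated by 180 degrees if it is one of the two
    affected benchmark samples, otherwise return it unchanged."""
    if name in ROTATED_SAMPLES:
        return rotate_180(image)
    return image
```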
| Method | MS-SSIM | LD | ED (*Setting 1*) | CER (*Setting 1*) | ED (*Setting 2*) | CER (*Setting 2*) |
|:----------------:|:------------:|:--------------:| :-------:|:--------------:|:-------:|:--------------:|
| GeoTr | 0.5105 | 7.76 | 464.83 | 0.1746 | 724.84 | 0.1832 |
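
For readers unfamiliar with the two OCR columns, the sketch below shows the standard definitions of edit distance (ED) and character error rate (CER) for one pair of strings. It is a plain-Python illustration under the assumption that ED is the Levenshtein distance and CER is ED divided by the reference length; it is not the code in `$ROOT/OCR_eval.py`, and the function names are hypothetical.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance: minimum number of single-character insertions,
    deletions, and substitutions turning `reference` into `hypothesis`."""
    # previous[j] holds the distance between the processed prefix of
    # `reference` and the first j characters of `hypothesis`.
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_char != hyp_char)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def character_error_rate(reference, hypothesis):
    """Edit distance normalised by the number of reference characters."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

Here `reference` would be the text recognised from the ground-truth scan and `hypothesis` the text recognised from the rectified image, with per-image values aggregated over the chosen setting.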
## Citation
If you find this code useful for your research, please cite it using the following BibTeX entries.
```
@inproceedings{feng2021doctr,
title={DocTr: Document Image Transformer for Geometric Unwarping and Illumination Correction},
author={Feng, Hao and Wang, Yuechen and Zhou, Wengang and Deng, Jiajun and Li, Houqiang},
booktitle={Proceedings of the 29th ACM International Conference on Multimedia},
pages={273--281},
year={2021}
}
```
```
@article{feng2021docscanner,
title={DocScanner: Robust Document Image Rectification with Progressive Learning},
author={Feng, Hao and Zhou, Wengang and Deng, Jiajun and Tian, Qi and Li, Houqiang},
journal={arXiv preprint arXiv:2110.14968},
year={2021}
}
```
## Acknowledgement
The code is largely based on [DocUNet](https://www3.cs.stonybrook.edu/~cvl/docunet.html), [DewarpNet](https://github.com/cvlab-stonybrook/DewarpNet), and [DocProj](https://github.com/xiaoyu258/DocProj). We thank the authors for their wonderful work.
## Contact
For commercial usage, please contact Prof. Wengang Zhou ([zhwg@ustc.edu.cn](mailto:zhwg@ustc.edu.cn)) and Prof. Houqiang Li ([lihq@ustc.edu.cn](mailto:lihq@ustc.edu.cn)).