<div align="center">
# Donut 🍩 : Document Understanding Transformer
[![Paper](https://img.shields.io/badge/Paper-arxiv.2111.15664-red)](https://arxiv.org/abs/2111.15664)
[![Conference](https://img.shields.io/badge/ECCV-2022-blue)](#how-to-cite)
[![Demo](https://img.shields.io/badge/Demo-Gradio-brightgreen)](#demo)
[![Demo](https://img.shields.io/badge/Demo-Colab-orange)](#demo)
[![PyPI](https://img.shields.io/pypi/v/donut-python?color=green&label=pip%20install%20donut-python)](https://pypi.org/project/donut-python)
[![Downloads](https://static.pepy.tech/personalized-badge/donut-python?period=total&units=international_system&left_color=grey&right_color=brightgreen&left_text=Downloads)](https://pepy.tech/project/donut-python)
Official Implementation of Donut and SynthDoG | [Paper](https://arxiv.org/abs/2111.15664) | [Slide](https://docs.google.com/presentation/d/1gv3A7t4xpwwNdpxV_yeHzEOMy-exJCAz6AlAI9O5fS8/edit?usp=sharing) | [Poster](https://docs.google.com/presentation/d/1m1f8BbAm5vxPcqynn_MbFfmQAlHQIR5G72-hQUFS2sk/edit?usp=sharing)
</div>
## Introduction
**Donut** 🍩, **Do**cume**n**t **u**nderstanding **t**ransformer, is a new method of document understanding that uses an OCR-free, end-to-end Transformer model. Donut does not require off-the-shelf OCR engines or APIs, yet it achieves state-of-the-art performance on various visual document understanding tasks, such as visual document classification and information extraction (a.k.a. document parsing).
In addition, we present **SynthDoG** 🐶, **Synth**etic **Do**cument **G**enerator, which makes model pre-training flexible across languages and domains.
Our academic paper, which describes our method in detail and provides full experimental results and analyses, can be found here:<br>
> [**OCR-free Document Understanding Transformer**](https://arxiv.org/abs/2111.15664).<br>
> [Geewook Kim](https://geewook.kim), [Teakgyu Hong](https://dblp.org/pid/183/0952.html), [Moonbin Yim](https://github.com/moonbings), [JeongYeon Nam](https://github.com/long8v), [Jinyoung Park](https://github.com/jyp1111), [Jinyeong Yim](https://jinyeong.github.io), [Wonseok Hwang](https://scholar.google.com/citations?user=M13_WdcAAAAJ), [Sangdoo Yun](https://sangdooyun.github.io), [Dongyoon Han](https://dongyoonhan.github.io), [Seunghyun Park](https://scholar.google.com/citations?user=iowjmTwAAAAJ). In ECCV 2022.
<img width="946" alt="image" src="misc/overview.png">
## Pre-trained Models and Web Demos
Gradio web demos are available! [![Demo](https://img.shields.io/badge/Demo-Gradio-brightgreen)](#demo) [![Demo](https://img.shields.io/badge/Demo-Colab-orange)](#demo)
|:--:|
|![image](misc/screenshot_gradio_demos.png)|
- You can run the demo with the `./app.py` script.
- Sample images are available at `./misc` and more receipt images are available at [CORD dataset link](https://huggingface.co/datasets/naver-clova-ix/cord-v2).
- Web demos are available from the links in the following table.
- Note: We updated the Google Colab demo on June 15, 2023 to ensure it works properly.
|Task|Sec/Img|Score|Trained Model|<div id="demo">Demo</div>|
|---|---|---|---|---|
| [CORD](https://github.com/clovaai/cord) (Document Parsing) | 0.7 /<br> 0.7 /<br> 1.2 | 91.3 /<br> 91.1 /<br> 90.9 | [donut-base-finetuned-cord-v2](https://huggingface.co/naver-clova-ix/donut-base-finetuned-cord-v2/tree/official) (1280) /<br> [donut-base-finetuned-cord-v1](https://huggingface.co/naver-clova-ix/donut-base-finetuned-cord-v1/tree/official) (1280) /<br> [donut-base-finetuned-cord-v1-2560](https://huggingface.co/naver-clova-ix/donut-base-finetuned-cord-v1-2560/tree/official) | [gradio space web demo](https://huggingface.co/spaces/naver-clova-ix/donut-base-finetuned-cord-v2),<br>[google colab demo (updated at 23.06.15)](https://colab.research.google.com/drive/1NMSqoIZ_l39wyRD7yVjw2FIuU2aglzJi?usp=sharing) |
| [Train Ticket](https://github.com/beacandler/EATEN) (Document Parsing) | 0.6 | 98.7 | [donut-base-finetuned-zhtrainticket](https://huggingface.co/naver-clova-ix/donut-base-finetuned-zhtrainticket/tree/official) | [google colab demo (updated at 23.06.15)](https://colab.research.google.com/drive/1YJBjllahdqNktXaBlq5ugPh1BCm8OsxI?usp=sharing) |
| [RVL-CDIP](https://www.cs.cmu.edu/~aharley/rvl-cdip) (Document Classification) | 0.75 | 95.3 | [donut-base-finetuned-rvlcdip](https://huggingface.co/naver-clova-ix/donut-base-finetuned-rvlcdip/tree/official) | [gradio space web demo](https://huggingface.co/spaces/nielsr/donut-rvlcdip),<br>[google colab demo (updated at 23.06.15)](https://colab.research.google.com/drive/1iWOZHvao1W5xva53upcri5V6oaWT-P0O?usp=sharing) |
| [DocVQA Task1](https://rrc.cvc.uab.es/?ch=17) (Document VQA) | 0.78 | 67.5 | [donut-base-finetuned-docvqa](https://huggingface.co/naver-clova-ix/donut-base-finetuned-docvqa/tree/official) | [gradio space web demo](https://huggingface.co/spaces/nielsr/donut-docvqa),<br>[google colab demo (updated at 23.06.15)](https://colab.research.google.com/drive/1oKieslZCulFiquequ62eMGc-ZWgay4X3?usp=sharing) |
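
As a rough illustration of how the table above can be used, the sketch below copies the checkpoint names, latencies, and scores into a small list, builds the Hugging Face URL on the `official` branch, and converts seconds per image into throughput. The helper names (`repo_id`, `official_url`, `images_per_minute`, `best_checkpoint`) are hypothetical and are not part of `donut-python`. Scores are only compared within a task, since each task uses its own metric.

```python
# Rows copied from the table above: (task, checkpoint name, sec/img, score).
CHECKPOINTS = [
    ("CORD", "donut-base-finetuned-cord-v2", 0.7, 91.3),
    ("CORD", "donut-base-finetuned-cord-v1", 0.7, 91.1),
    ("CORD", "donut-base-finetuned-cord-v1-2560", 1.2, 90.9),
    ("Train Ticket", "donut-base-finetuned-zhtrainticket", 0.6, 98.7),
    ("RVL-CDIP", "donut-base-finetuned-rvlcdip", 0.75, 95.3),
    ("DocVQA Task1", "donut-base-finetuned-docvqa", 0.78, 67.5),
]


def repo_id(name, org="naver-clova-ix"):
    """Hugging Face repository id, e.g. 'naver-clova-ix/donut-base'."""
    return f"{org}/{name}"


def official_url(name):
    """URL of the `official` branch, matching the links in the table."""
    return f"https://huggingface.co/{repo_id(name)}/tree/official"


def images_per_minute(sec_per_img):
    """Convert the Sec/Img column into images processed per minute."""
    return 60.0 / sec_per_img


def best_checkpoint(task):
    """Highest-scoring row for one task (scores are not comparable across tasks)."""
    rows = [row for row in CHECKPOINTS if row[0] == task]
    if not rows:
        raise KeyError(f"unknown task: {task}")
    return max(rows, key=lambda row: row[3])


if __name__ == "__main__":
    for task, name, sec, score in CHECKPOINTS:
        print(f"{task:13s} {score:5.1f} {images_per_minute(sec):6.1f} img/min  {official_url(name)}")
```

The latency figures are taken as listed; the hardware they were measured on is not stated in this section, so the throughput numbers are only relative.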
The links to the pre-trained backbones are here:
- [`donut-base`](https://huggingface.co/naver-clova-ix/donut-base/tree/official): trained with 64 A100 GPUs (~2.5 days), number of layers (encoder: {2,2,14,2}, decoder: 4), input size 2560x1920, swin window size 10, IIT-CDIP (11M) and SynthDoG (English, Chinese, Japanese, Korean, 0.5M x 4).
- [`donut-proto`](https://huggingface.co/naver-clova-ix/donut-proto/tree/official): (preliminary model) trained with 8 V100 GPUs (~5 days), number of layers (encoder: {2,2,18,2}, decoder: 4), input size 2048x1536, swin window size 8, and SynthDoG (English, Japanese, Korean, 0.4M x 3).
Please see [our paper](#how-to-cite) for more details.
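
The backbone settings listed above can be sanity-checked with a little arithmetic. The sketch below records the two configurations and checks that every encoder stage's feature map divides evenly into attention windows. It assumes the standard Swin Transformer layout (4-pixel patches, 2x downsampling between stages), which this README does not state; `BackboneSpec`, `stage_grids`, and `windows_fit` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackboneSpec:
    name: str
    encoder_layers: tuple  # Swin blocks per stage, as listed above
    decoder_layers: int
    input_size: tuple      # the two input dimensions, in the order listed above
    window_size: int


DONUT_BASE = BackboneSpec("donut-base", (2, 2, 14, 2), 4, (2560, 1920), 10)
DONUT_PROTO = BackboneSpec("donut-proto", (2, 2, 18, 2), 4, (2048, 1536), 8)

# Assumption: standard Swin patch size; not taken from this README.
PATCH_SIZE = 4


def stage_grids(spec, patch_size=PATCH_SIZE):
    """Feature-map size at each encoder stage, assuming 2x downsampling per stage."""
    a, b = spec.input_size
    grids = []
    stride = patch_size
    for _ in spec.encoder_layers:
        grids.append((a // stride, b // stride))
        stride *= 2
    return grids


def windows_fit(spec):
    """True if every stage's grid is an exact multiple of the window size."""
    return all(
        rows % spec.window_size == 0 and cols % spec.window_size == 0
        for rows, cols in stage_grids(spec)
    )


# Pre-training data for donut-base: IIT-CDIP (11M) plus SynthDoG (0.5M x 4).
DONUT_BASE_PRETRAIN_SAMPLES = 11_000_000 + 4 * 500_000

if __name__ == "__main__":
    for spec in (DONUT_BASE, DONUT_PROTO):
        print(spec.name, stage_grids(spec), "windows fit:", windows_fit(spec))
    print("donut-base pre-training samples:", DONUT_BASE_PRETRAIN_SAMPLES)
```

Under these assumptions both configurations tile cleanly, which suggests why the window size changes together with the input size between `donut-proto` and `donut-base`.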
## SynthDoG datasets
![image](misc/sample_synthdog.png)
The links to the SynthDoG-generated datasets are here:
- [`synthdog-en`](https://huggingface.co/datasets/naver-clova-ix/synthdog-en): English, 0.5M.
- [`synthdog-zh`](https://huggingface.co/datasets/naver-clova-ix/synthdog-zh): Chinese, 0.5M.
- [`synthdog-ja`](https://huggingface.co/datasets/naver-clova-ix/synthdog-ja): Japanese, 0.5M.
- [`synthdog-ko`](https://huggingface.co/datasets/naver-clova-ix/synthdog-ko): Korean, 0.5M.
To generate synthetic datasets with our SynthDoG, please see `./synthdog/README.md` and [our paper](#how-to-cite) for details.
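
For readers who only want to consume the published datasets, the following is a minimal sketch of how the four repositories listed above could be addressed from Python. It assumes the Hugging Face `datasets` package is installed and that each repository exposes a `train` split; neither is stated in this section. The helper names `synthdog_repo_id` and `load_synthdog` are hypothetical.

```python
# Language codes and sizes copied from the list above.
SYNTHDOG_LANGS = {"en": "English", "zh": "Chinese", "ja": "Japanese", "ko": "Korean"}
SAMPLES_PER_LANGUAGE = 500_000  # "0.5M" per dataset
TOTAL_SAMPLES = SAMPLES_PER_LANGUAGE * len(SYNTHDOG_LANGS)


def synthdog_repo_id(lang):
    """Map a language code to its dataset repository id."""
    if lang not in SYNTHDOG_LANGS:
        raise ValueError(f"unsupported language {lang!r}; choose from {sorted(SYNTHDOG_LANGS)}")
    return f"naver-clova-ix/synthdog-{lang}"


def load_synthdog(lang, split="train"):
    """Stream one SynthDoG dataset without downloading it in full.

    Requires `pip install datasets` and network access. The split name
    is an assumption, not something this README specifies.
    """
    from datasets import load_dataset  # imported lazily so the helpers above work offline

    return load_dataset(synthdog_repo_id(lang), split=split, streaming=True)


if __name__ == "__main__":
    for code, language in SYNTHDOG_LANGS.items():
        print(f"{language:9s} {synthdog_repo_id(code)}")
    print("total samples:", TOTAL_SAMPLES)
```

Streaming is used here because each dataset holds 0.5M images, so a full download may be unnecessary for a quick inspection.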
## Updates
**_2023-06-15_** We have updated all Google Colab demos to ensure they work properly.<br>
**_2022-11-14_** New version 1.0.9 is released (`pip install donut-python --upgrade`). See [1.0.9 Release Notes](https://github.com/clovaai/donut/releases/tag/1.0.9).<br>
**_2022-08-12_** Donut 🍩 is also available at [huggingface/transformers 🤗](https://huggingface.co/docs/transformers/main/en/model_doc/donut) (contributed by [@NielsRogge](https://github.com/NielsRogge)). `donut-python` loads the pre-trained weights from the `official` branch of the model repositories. See [1.0.5 Release Notes](https://github.com/clovaai/donut/releases/tag/1.0.5).<br>
**_2022-08-05_** A well-executed hands-on tutorial on Donut 🍩 is published at [Towards Data Science](https://towardsdatascience.com/ocr-free-document-understanding-with-donut-1acfbdf099be) (written by [@estaudere](https://github.com/estaudere)).<br>
**_2022-07-20_** First commit. We release our code, model weights, synthetic data, and generator.
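
To see whether a local environment already has the 1.0.9 release mentioned above, a small check along these lines may help. It is a sketch that uses only the standard library; `installed_version`, `version_tuple`, and `needs_upgrade` are hypothetical helper names, and the version parsing is deliberately simple (leading digits of each dot-separated part).

```python
from importlib import metadata
from itertools import takewhile


def installed_version(package="donut-python"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def version_tuple(version):
    """Turn '1.0.9' into (1, 0, 9), keeping only the leading digits of each part."""
    parts = []
    for piece in version.split("."):
        digits = "".join(takewhile(str.isdigit, piece))
        if not digits:
            break  # stop at the first part with no leading number
        parts.append(int(digits))
    return tuple(parts)


def needs_upgrade(current, minimum="1.0.9"):
    """True if the package is missing or older than `minimum`."""
    return current is None or version_tuple(current) < version_tuple(minimum)


if __name__ == "__main__":
    current = installed_version()
    if needs_upgrade(current):
        print(f"found {current}; consider: pip install donut-python --upgrade")
    else:
        print(f"donut-python {current} is up to date")
```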
## Software installation
[![PyPI](https://img.shields.io/pypi/v/donut-python?color=green&label=pip%20install%20donut-python)](https://pypi.org/project/donut-python)
[![Downloads](https://static.pepy.tech/personalized-badge/donut-python?period=total&units=international_system&left_color=grey&right_color=brightgreen&left_text=Downloads)](https://pepy.tech/project/donut-python)
```bash
pip install donut-python
```
or clone this repository and install the dependencies:
```bash
git clone https://github.com/clovaai/donut.git
cd donut/
conda create -n donut_official p
```