Official Implementation of OCR-free Document Understanding Trans resource - CSDN Library

66 files in total

py: 19

jpg: 14

yaml: 8


transformer

131 views, uploaded 2024-08-13 09:49:23, 62.73MB ZIP


donut-master.zip (66 files)

donut-master

misc

overview.png 669KB

sample_image_donut_document.png 739KB

sample_synthdog.png 1.37MB

screenshot_gradio_demos.png 1.33MB

sample_image_cord_test_receipt_00004.png 1.57MB

setup.py 2KB

donut

__init__.py 313B

util.py 12KB

_version.py 87B

model.py 25KB

app.py 2KB

LICENSE 1KB

synthdog

resources

corpus

enwiki.txt 2.54MB

jawiki.txt 1.66MB

kowiki.txt 1.01MB

zhwiki.txt 1.46MB

font

NotoSansJP-Regular.otf 4.34MB

NotoSerifJP-Regular.otf 5.88MB

NotoSerif-Regular.ttf 366KB

NotoSans-Regular.ttf 390KB

NotoSansSC-Regular.otf 8.09MB

NotoSerifSC-Regular.otf 10.7MB

NotoSerifKR-Regular.otf 7.09MB

NotoSansKR-Regular.otf 4.52MB

paper

paper_6.jpg 1.62MB

paper_1.jpg 2.27MB

paper_4.jpg 1.83MB

paper_3.jpg 2.4MB

paper_2.jpg 1.8MB

paper_5.jpg 3.2MB

background

coffee_122.jpg 57KB

cream_124.jpg 2.14MB

crater_141.jpg 1.73MB

hiking_18.jpg 503KB

eagle_110.jpg 216KB

bedroom_83.jpg 70KB

bob+dylan_83.jpg 409KB

coffee_18.jpeg 1.7MB

farm_25.jpg 688KB

config_en.yaml 2KB

config_ja.yaml 2KB

template.py 5KB

config_zh.yaml 2KB

config_ko.yaml 2KB

layouts

__init__.py 169B

grid.py 2KB

grid_stack.py 2KB

README.md 2KB

elements

__init__.py 323B

content.py 3KB

document.py 2KB

background.py 608B

textbox.py 1KB

paper.py 391B

dataset

.gitkeep 1B

lightning_module.py 8KB

.gitignore 2KB

train.py 6KB

test.py 3KB

README.md 18KB

config

train_cord.yaml 940B

train_zhtrainticket.yaml 578B

train_docvqa.yaml 647B

train_rvlcdip.yaml 659B

result

.gitkeep 1B

NOTICE 9KB

<div align="center">

# Donut 🍩 : Document Understanding Transformer

[![Paper](https://img.shields.io/badge/Paper-arxiv.2111.15664-red)](https://arxiv.org/abs/2111.15664)
[![Conference](https://img.shields.io/badge/ECCV-2022-blue)](#how-to-cite)
[![Demo](https://img.shields.io/badge/Demo-Gradio-brightgreen)](#demo)
[![Demo](https://img.shields.io/badge/Demo-Colab-orange)](#demo)
[![PyPI](https://img.shields.io/pypi/v/donut-python?color=green&label=pip%20install%20donut-python)](https://pypi.org/project/donut-python)
[![Downloads](https://static.pepy.tech/personalized-badge/donut-python?period=total&units=international_system&left_color=grey&right_color=brightgreen&left_text=Downloads)](https://pepy.tech/project/donut-python)

Official Implementation of Donut and SynthDoG | [Paper](https://arxiv.org/abs/2111.15664) | [Slide](https://docs.google.com/presentation/d/1gv3A7t4xpwwNdpxV_yeHzEOMy-exJCAz6AlAI9O5fS8/edit?usp=sharing) | [Poster](https://docs.google.com/presentation/d/1m1f8BbAm5vxPcqynn_MbFfmQAlHQIR5G72-hQUFS2sk/edit?usp=sharing)

</div>

## Introduction

**Donut** 🍩, **Do**cume**n**t **u**nderstanding **t**ransformer, is a new method of document understanding that utilizes an OCR-free end-to-end Transformer model. Donut does not require off-the-shelf OCR engines/APIs, yet it shows state-of-the-art performance on various visual document understanding tasks, such as visual document classification or information extraction (a.k.a. document parsing).
In addition, we present **SynthDoG** 🐶, **Synth**etic **Do**cument **G**enerator, which helps make model pre-training flexible across various languages and domains.
Our academic paper, which describes our method in detail and provides full experimental results and analyses, can be found here:<br>
> [**OCR-free Document Understanding Transformer**](https://arxiv.org/abs/2111.15664).<br>
> [Geewook Kim](https://geewook.kim), [Teakgyu Hong](https://dblp.org/pid/183/0952.html), [Moonbin Yim](https://github.com/moonbings), [JeongYeon Nam](https://github.com/long8v), [Jinyoung Park](https://github.com/jyp1111), [Jinyeong Yim](https://jinyeong.github.io), [Wonseok Hwang](https://scholar.google.com/citations?user=M13_WdcAAAAJ), [Sangdoo Yun](https://sangdooyun.github.io), [Dongyoon Han](https://dongyoonhan.github.io), [Seunghyun Park](https://scholar.google.com/citations?user=iowjmTwAAAAJ). In ECCV 2022.

<img width="946" alt="image" src="misc/overview.png">

## Pre-trained Models and Web Demos

Gradio web demos are available! [![Demo](https://img.shields.io/badge/Demo-Gradio-brightgreen)](#demo) [![Demo](https://img.shields.io/badge/Demo-Colab-orange)](#demo)
|:--:|
|![image](misc/screenshot_gradio_demos.png)|
- You can run the demo with the `./app.py` file.
- Sample images are available at `./misc` and more receipt images are available at [CORD dataset link](https://huggingface.co/datasets/naver-clova-ix/cord-v2).
- Web demos are available from the links in the following table.
- Note: We have updated the Google Colab demo (as of June 15, 2023) to ensure it works properly.

|Task|Sec/Img|Score|Trained Model|<div id="demo">Demo</div>|
|---|---|---|---|---|
| [CORD](https://github.com/clovaai/cord) (Document Parsing) | 0.7 /<br> 0.7 /<br> 1.2 | 91.3 /<br> 91.1 /<br> 90.9 | [donut-base-finetuned-cord-v2](https://huggingface.co/naver-clova-ix/donut-base-finetuned-cord-v2/tree/official) (1280) /<br> [donut-base-finetuned-cord-v1](https://huggingface.co/naver-clova-ix/donut-base-finetuned-cord-v1/tree/official) (1280) /<br> [donut-base-finetuned-cord-v1-2560](https://huggingface.co/naver-clova-ix/donut-base-finetuned-cord-v1-2560/tree/official) | [gradio space web demo](https://huggingface.co/spaces/naver-clova-ix/donut-base-finetuned-cord-v2),<br>[google colab demo (updated at 23.06.15)](https://colab.research.google.com/drive/1NMSqoIZ_l39wyRD7yVjw2FIuU2aglzJi?usp=sharing) |
| [Train Ticket](https://github.com/beacandler/EATEN) (Document Parsing) | 0.6 | 98.7 | [donut-base-finetuned-zhtrainticket](https://huggingface.co/naver-clova-ix/donut-base-finetuned-zhtrainticket/tree/official) | [google colab demo (updated at 23.06.15)](https://colab.research.google.com/drive/1YJBjllahdqNktXaBlq5ugPh1BCm8OsxI?usp=sharing) |
| [RVL-CDIP](https://www.cs.cmu.edu/~aharley/rvl-cdip) (Document Classification) | 0.75 | 95.3 | [donut-base-finetuned-rvlcdip](https://huggingface.co/naver-clova-ix/donut-base-finetuned-rvlcdip/tree/official) | [gradio space web demo](https://huggingface.co/spaces/nielsr/donut-rvlcdip),<br>[google colab demo (updated at 23.06.15)](https://colab.research.google.com/drive/1iWOZHvao1W5xva53upcri5V6oaWT-P0O?usp=sharing) |
| [DocVQA Task1](https://rrc.cvc.uab.es/?ch=17) (Document VQA) | 0.78 | 67.5 | [donut-base-finetuned-docvqa](https://huggingface.co/naver-clova-ix/donut-base-finetuned-docvqa/tree/official) | [gradio space web demo](https://huggingface.co/spaces/nielsr/donut-docvqa),<br>[google colab demo (updated at 23.06.15)](https://colab.research.google.com/drive/1oKieslZCulFiquequ62eMGc-ZWgay4X3?usp=sharing) |

The links to the
pre-trained backbones are here:
- [`donut-base`](https://huggingface.co/naver-clova-ix/donut-base/tree/official): trained with 64 A100 GPUs (~2.5 days), number of layers (encoder: {2,2,14,2}, decoder: 4), input size 2560x1920, Swin window size 10, IIT-CDIP (11M) and SynthDoG (English, Chinese, Japanese, Korean, 0.5M x 4).
- [`donut-proto`](https://huggingface.co/naver-clova-ix/donut-proto/tree/official): (preliminary model) trained with 8 V100 GPUs (~5 days), number of layers (encoder: {2,2,18,2}, decoder: 4), input size 2048x1536, Swin window size 8, and SynthDoG (English, Japanese, Korean, 0.4M x 3).

Please see [our paper](#how-to-cite) for more details.

## SynthDoG datasets

![image](misc/sample_synthdog.png)

The links to the SynthDoG-generated datasets are here:

- [`synthdog-en`](https://huggingface.co/datasets/naver-clova-ix/synthdog-en): English, 0.5M.
- [`synthdog-zh`](https://huggingface.co/datasets/naver-clova-ix/synthdog-zh): Chinese, 0.5M.
- [`synthdog-ja`](https://huggingface.co/datasets/naver-clova-ix/synthdog-ja): Japanese, 0.5M.
- [`synthdog-ko`](https://huggingface.co/datasets/naver-clova-ix/synthdog-ko): Korean, 0.5M.

To generate synthetic datasets with our SynthDoG, please see `./synthdog/README.md` and [our paper](#how-to-cite) for details.

## Updates

**_2023-06-15_** We have updated all Google Colab demos to ensure they work properly.<br>
**_2022-11-14_** New version 1.0.9 is released (`pip install donut-python --upgrade`). See [1.0.9 Release Notes](https://github.com/clovaai/donut/releases/tag/1.0.9).<br>
**_2022-08-12_** Donut 🍩 is also available at [huggingface/transformers 🤗](https://huggingface.co/docs/transformers/main/en/model_doc/donut) (contributed by [@NielsRogge](https://github.com/NielsRogge)). `donut-python` loads the pre-trained weights from the `official` branch of the model repositories.
See [1.0.5 Release Notes](https://github.com/clovaai/donut/releases/tag/1.0.5).<br>
**_2022-08-05_** A well-executed hands-on tutorial on Donut 🍩 is published at [Towards Data Science](https://towardsdatascience.com/ocr-free-document-understanding-with-donut-1acfbdf099be) (written by [@estaudere](https://github.com/estaudere)).<br>
**_2022-07-20_** First commit. We release our code, model weights, synthetic data and generator.

## Software installation

[![PyPI](https://img.shields.io/pypi/v/donut-python?color=green&label=pip%20install%20donut-python)](https://pypi.org/project/donut-python)
[![Downloads](https://static.pepy.tech/personalized-badge/donut-python?period=total&units=international_system&left_color=grey&right_color=brightgreen&left_text=Downloads)](https://pepy.tech/project/donut-python)

```bash
pip install donut-python
```

or clone this repository and install the dependencies:

```bash
git clone https://github.com/clovaai/donut.git
cd donut/
conda create -n donut_official p
```
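After installing, it can be useful to confirm which `donut-python` release is present, since the Updates section mentions version 1.0.9 as the latest release. The following is a minimal sketch using only the Python standard library. The helper names (`installed_version`, `version_tuple`, `needs_upgrade`) are hypothetical and not part of the Donut package, and the version parser assumes plain dotted numeric versions such as `1.0.9`.

```python
from importlib import metadata
from typing import Optional, Tuple


def installed_version(package: str = "donut-python") -> Optional[str]:
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def version_tuple(version: str) -> Tuple[int, ...]:
    """Parse a plain dotted version such as '1.0.9' into (1, 0, 9).

    Assumption: every component is numeric; pre-release suffixes are not handled.
    """
    return tuple(int(part) for part in version.split("."))


def needs_upgrade(current: Optional[str], minimum: str = "1.0.9") -> bool:
    """Return True if the package is missing or older than `minimum`."""
    if current is None:
        return True
    return version_tuple(current) < version_tuple(minimum)


if __name__ == "__main__":
    current = installed_version()
    if needs_upgrade(current):
        # Same upgrade command as quoted in the Updates section.
        print(f"donut-python {current or 'not installed'}: "
              "run `pip install donut-python --upgrade`")
    else:
        print(f"donut-python {current} is recent enough")
```

Comparing integer tuples rather than raw strings avoids ordering `1.0.10` before `1.0.9`. For versions with pre-release tags, a dedicated version parser would be a safer choice than this sketch.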
