# DocTR: Document Text Recognition
[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](LICENSE) ![Build Status](https://github.com/mindee/doctr/workflows/python-package/badge.svg) [![codecov](https://codecov.io/gh/mindee/doctr/branch/main/graph/badge.svg?token=577MO567NM)](https://codecov.io/gh/mindee/doctr) [![CodeFactor](https://www.codefactor.io/repository/github/mindee/doctr/badge?s=bae07db86bb079ce9d6542315b8c6e70fa708a7e)](https://www.codefactor.io/repository/github/mindee/doctr) [![Codacy Badge](https://api.codacy.com/project/badge/Grade/340a76749b634586a498e1c0ab998f08)](https://app.codacy.com/gh/mindee/doctr?utm_source=github.com&utm_medium=referral&utm_content=mindee/doctr&utm_campaign=Badge_Grade) [![Doc Status](https://github.com/mindee/doctr/workflows/doc-status/badge.svg)](https://mindee.github.io/doctr) [![Pypi](https://img.shields.io/badge/pypi-v0.1.0-blue.svg)](https://pypi.org/project/python-doctr/)
Extract valuable information from your documents.
## Table of Contents
* [Getting Started](#getting-started)
  * [Prerequisites](#prerequisites)
  * [Installation](#installation)
* [Usage](#usage)
  * [Python package](#python-package)
  * [Docker container](#docker-container)
  * [Example script](#example-script)
  * [Demo app](#demo-app)
* [Documentation](#documentation)
* [Contributing](#contributing)
* [License](#license)
## Getting started
### Prerequisites
- Python 3.6 (or higher)
- [pip](https://pip.pypa.io/en/stable/)
### Installation
You can install the latest release of the package from [PyPI](https://pypi.org/project/python-doctr/) as follows:
```shell
pip install python-doctr
```
Or you can install it from source:
```shell
git clone https://github.com/mindee/doctr.git
pip install -e doctr/.
```
## Usage
### Python package
You can use the library like any other Python package to analyze your documents:
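```python
from doctr.documents import read_pdf, read_img
from doctr.models import ocr_db_crnn_vgg

# Load the end-to-end OCR model with pretrained weights
model = ocr_db_crnn_vgg(pretrained=True)

# PDF: read_pdf returns the pages of the document, so wrap it once (a batch of one document)
doc = read_pdf("path/to/your/doc.pdf")
result = model([doc])

# Image: a single page is wrapped twice (a batch of one document containing one page)
page = read_img("path/to/your/img.jpg")
result = model([[page]])

# Export the results of the first document to a JSON-serializable structure
json_output = result[0].export()
```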
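The calls above suggest that the model consumes a batch of documents, where each document is a list of pages: a PDF read with `read_pdf` is wrapped once, while a single image from `read_img` is wrapped twice. The sketch below makes that nesting explicit with a hypothetical `to_batch` helper (not part of DocTR), assuming each page is a NumPy array of shape H x W x C.

```python
from typing import List, Sequence, Union

import numpy as np

# Assumed types: a page is an H x W x C array, a document is a list of pages
Page = np.ndarray
Document = List[Page]
Batch = List[Document]


def to_batch(x: Union[Page, Sequence[Page], Sequence[Sequence[Page]]]) -> Batch:
    """Wrap a page, a document or a batch into the list-of-documents layout.

    Hypothetical helper, not part of DocTR.
    """
    if isinstance(x, np.ndarray):
        if x.ndim != 3:
            raise ValueError(f"expected an H x W x C page, got shape {x.shape}")
        return [[x]]  # single page -> one document with one page
    items = list(x)
    if not items:
        raise ValueError("empty input")
    if all(isinstance(p, np.ndarray) for p in items):
        return [items]  # single document (list of pages) -> batch of one document
    if any(isinstance(p, np.ndarray) for p in items):
        raise ValueError("cannot mix pages and documents at the same level")
    return [list(doc) for doc in items]  # already a batch of documents


page = np.zeros((32, 48, 3), dtype=np.uint8)
print([len(doc) for doc in to_batch(page)])                  # [1]
print([len(doc) for doc in to_batch([page, page])])          # [2]
print([len(doc) for doc in to_batch([[page], [page, page]])])  # [1, 2]
```

Under these assumptions, `model(to_batch(page))` would be equivalent to `model([[page]])` in the snippet above.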
For the full list of available pretrained models, please refer to the [documentation](https://mindee.github.io/doctr/models.html).
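To turn the output of `export()` into plain text, one option is to walk the exported dictionary. The sketch below is a hypothetical helper (`export_to_lines`, not part of DocTR) that assumes a nesting of pages, blocks, lines and words, where each word carries a `value` string and a `confidence` score. These key names are assumptions about the export format, so check them against the output of your installed version. The `sample` dictionary holds made-up values, not real model output.

```python
from typing import Any, Dict, List


def export_to_lines(export: Dict[str, Any], min_confidence: float = 0.0) -> List[str]:
    """Flatten an exported document into one string per text line.

    Hypothetical helper: the "pages"/"blocks"/"lines"/"words" keys and the
    per-word "value"/"confidence" fields are assumed, not confirmed.
    Words scoring below `min_confidence` are dropped; lines left empty are skipped.
    """
    lines_out: List[str] = []
    for page in export.get("pages", []):
        for block in page.get("blocks", []):
            for line in block.get("lines", []):
                words = [
                    word["value"]
                    for word in line.get("words", [])
                    if word.get("confidence", 1.0) >= min_confidence
                ]
                if words:
                    lines_out.append(" ".join(words))
    return lines_out


# Illustrative export (made-up values) following the assumed nesting
sample = {
    "pages": [
        {
            "blocks": [
                {
                    "lines": [
                        {"words": [{"value": "Hello", "confidence": 0.99},
                                   {"value": "world", "confidence": 0.97}]},
                        {"words": [{"value": "Invoice", "confidence": 0.95},
                                   {"value": "42", "confidence": 0.60}]},
                    ]
                }
            ]
        }
    ]
}

print(export_to_lines(sample))                      # ['Hello world', 'Invoice 42']
print(export_to_lines(sample, min_confidence=0.9))  # ['Hello world', 'Invoice']
```

Filtering on `min_confidence` is a simple way to drop uncertain words; the threshold would need tuning on your own documents.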
### Docker container
If you want to deploy in containerized environments, you can use the provided Dockerfile to build a Docker image:
```shell
docker build . -t <YOUR_IMAGE_TAG>
```
### Example script
An example script is provided for a simple document analysis of a PDF file:
```shell
python scripts/analyze.py path/to/your/doc.pdf
```
All script arguments can be listed using `python scripts/analyze.py --help`.
### Demo app
A minimal demo app is provided for you to play with the text detection model!
You will need an extra dependency ([Streamlit](https://streamlit.io/)) for the app to run:
```shell
pip install -r demo/requirements.txt
```
You can then run the app in your default browser with:
```shell
streamlit run demo/app.py
```
## Documentation
The full package documentation is available [here](https://mindee.github.io/doctr/) for detailed specifications. The documentation was built with [Sphinx](https://www.sphinx-doc.org/) using a [theme](https://github.com/readthedocs/sphinx_rtd_theme) provided by [Read the Docs](https://readthedocs.org/).
## Contributing
Please refer to `CONTRIBUTING` if you wish to contribute to this project.
## License
Distributed under the Apache 2.0 License. See `LICENSE` for more information.