# Tesseract OCR
[![Build status](https://ci.appveyor.com/api/projects/status/miah0ikfsf0j3819/branch/master?svg=true)](https://ci.appveyor.com/project/zdenop/tesseract/)
[![Build status](https://github.com/tesseract-ocr/tesseract/workflows/sw/badge.svg)](https://github.com/tesseract-ocr/tesseract/actions/workflows/sw.yml)\
[![Coverity Scan Build Status](https://scan.coverity.com/projects/tesseract-ocr/badge.svg)](https://scan.coverity.com/projects/tesseract-ocr)
[![CodeQL](https://github.com/tesseract-ocr/tesseract/workflows/CodeQL/badge.svg)](https://github.com/tesseract-ocr/tesseract/security/code-scanning)
[![OSS-Fuzz](https://img.shields.io/badge/oss--fuzz-fuzzing-brightgreen)](https://bugs.chromium.org/p/oss-fuzz/issues/list?sort=-opened&can=2&q=proj:tesseract-ocr)
\
[![GitHub license](https://img.shields.io/badge/license-Apache--2.0-blue.svg)](https://raw.githubusercontent.com/tesseract-ocr/tesseract/main/LICENSE)
[![Downloads](https://img.shields.io/badge/download-all%20releases-brightgreen.svg)](https://github.com/tesseract-ocr/tesseract/releases/)
## Table of Contents
* [Tesseract OCR](#tesseract-ocr)
* [About](#about)
* [Brief history](#brief-history)
* [Installing Tesseract](#installing-tesseract)
* [Running Tesseract](#running-tesseract)
* [For developers](#for-developers)
* [Support](#support)
* [License](#license)
* [Dependencies](#dependencies)
* [Latest Version of README](#latest-version-of-readme)
## About
This package contains an **OCR engine** - `libtesseract` and a **command line program** - `tesseract`.
Tesseract 4 adds a new neural net (LSTM) based [OCR engine](https://en.wikipedia.org/wiki/Optical_character_recognition), which is focused on line recognition, but it still supports the legacy OCR engine of Tesseract 3, which works by recognizing character patterns. Compatibility with Tesseract 3 is enabled by using the Legacy OCR Engine mode (`--oem 0`).
The legacy mode also needs [traineddata](https://tesseract-ocr.github.io/tessdoc/Data-Files.html) files which support the legacy engine, for example those from the [tessdata](https://github.com/tesseract-ocr/tessdata) repository.
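
A minimal Python sketch of choosing an engine mode. The mode numbers and descriptions follow the list printed by `tesseract --help-oem`; the helper names `oem_args` and `needs_legacy_data` are hypothetical, and the rule that modes 0 and 2 need legacy-capable traineddata is an assumption based on the paragraph above.

```python
from typing import List

# Engine modes as printed by `tesseract --help-oem`.
OEM_MODES = {
    0: "Legacy engine only.",
    1: "Neural nets LSTM engine only.",
    2: "Legacy + LSTM engines.",
    3: "Default, based on what is available.",
}


def oem_args(mode: int) -> List[str]:
    """Return the command-line arguments selecting an OCR engine mode."""
    if mode not in OEM_MODES:
        raise ValueError(f"unknown OCR engine mode {mode}; expected one of {sorted(OEM_MODES)}")
    return ["--oem", str(mode)]


def needs_legacy_data(mode: int) -> bool:
    """Assumption: modes that run the legacy engine need traineddata with legacy components."""
    return mode in (0, 2)
```

If you only have LSTM-only models installed, staying with mode 1 or the default mode 3 avoids errors from missing legacy components.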
Stefan Weil is the current lead developer. Ray Smith was the lead developer until 2018. The maintainer is Zdenko Podobny. For a list of contributors see [AUTHORS](https://github.com/tesseract-ocr/tesseract/blob/main/AUTHORS)
and GitHub's log of [contributors](https://github.com/tesseract-ocr/tesseract/graphs/contributors).
Tesseract has **Unicode (UTF-8) support** and can **recognize [more than 100 languages](https://tesseract-ocr.github.io/tessdoc/Data-Files-in-different-versions.html)** "out of the box".
Tesseract supports **[various image formats](https://tesseract-ocr.github.io/tessdoc/InputFormats)** including PNG, JPEG and TIFF.
Tesseract supports **various output formats**: plain text, hOCR (HTML), PDF, invisible-text-only PDF, TSV and ALTO (ALTO since version 4.1.0).
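
To illustrate the TSV format, here is a small Python parser. It assumes the layout Tesseract writes for `tesseract image outputbase tsv`: a header row with the 12 columns `level`, `page_num`, `block_num`, `par_num`, `line_num`, `word_num`, `left`, `top`, `width`, `height`, `conf` and `text`, where level 5 rows are words and non-word rows carry a confidence of -1. The function name `parse_tsv_words` is hypothetical.

```python
import csv
import io
from typing import Dict, List

WORD_LEVEL = 5  # 1=page, 2=block, 3=paragraph, 4=line, 5=word


def parse_tsv_words(tsv_text: str, min_conf: float = 0.0) -> List[Dict]:
    """Return recognized words with confidence and bounding box from Tesseract TSV output."""
    # QUOTE_NONE: recognized text may contain quote characters that must stay literal.
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    words = []
    for row in reader:
        if int(row["level"]) != WORD_LEVEL:
            continue
        text = (row.get("text") or "").strip()
        conf = float(row["conf"])  # float() accepts both integer and decimal confidences
        if not text or conf < min_conf:
            continue
        box = tuple(int(row[k]) for k in ("left", "top", "width", "height"))
        words.append({"text": text, "conf": conf, "box": box})
    return words
```

Filtering on `conf` is a common way to drop doubtful words, but a useful threshold depends on your images.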
In many cases, to get better OCR results, you'll need to **[improve the quality](https://tesseract-ocr.github.io/tessdoc/ImproveQuality.html) of the image** you give Tesseract.
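
Binarization (converting to pure black and white) is one of the preprocessing steps discussed on the image quality page. Tesseract already binarizes images internally, so the sketch below only illustrates the idea: Otsu's method in plain Python, assuming you already have 8-bit grayscale pixel values (for example from an imaging library). `otsu_threshold` and `binarize` are hypothetical helper names.

```python
from typing import List, Sequence


def otsu_threshold(pixels: Sequence[int]) -> int:
    """Return the gray level (0-255) that maximizes between-class variance (Otsu's method)."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    sum_bg = 0     # weighted sum of levels <= t (background class)
    weight_bg = 0  # number of pixels with level <= t
    best_t, best_var = 0, -1.0
    for t in range(256):
        weight_bg += hist[t]
        if weight_bg == 0:
            continue
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break  # every pixel is in the background class; no split left to test
        sum_bg += t * hist[t]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        # Unnormalized; dividing by total**2 would not change which t wins.
        between_var = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t


def binarize(pixels: Sequence[int], threshold: int) -> List[int]:
    """Map levels above the threshold to white (255) and the rest to black (0)."""
    return [255 if p > threshold else 0 for p in pixels]
```

A single global threshold works best on evenly lit scans; pages with shadows or uneven backgrounds usually need local (adaptive) methods instead.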
This project **does not include a GUI application**. If you need one, please see the [3rdParty](https://tesseract-ocr.github.io/tessdoc/User-Projects-%E2%80%93-3rdParty.html) documentation.
Tesseract **can be trained to recognize other languages**.
See [Tesseract Training](https://tesseract-ocr.github.io/tessdoc/Training-Tesseract.html) for more information.
## Brief history
Tesseract was originally developed at Hewlett-Packard Laboratories Bristol UK and at Hewlett-Packard Co, Greeley Colorado USA between 1985 and 1994, with some more changes made in 1996 to port to Windows, and some C++izing in 1998. In 2005 Tesseract was open sourced by HP. From 2006 until November 2018 it was developed by Google.
Major version 5 is the current stable version and started with release
[5.0.0](https://github.com/tesseract-ocr/tesseract/releases/tag/5.0.0) on November 30, 2021. Newer minor versions and bugfix versions are available from
[GitHub](https://github.com/tesseract-ocr/tesseract/releases/).
Latest source code is available from [main branch on GitHub](https://github.com/tesseract-ocr/tesseract/tree/main).
Open issues can be found in the [issue tracker](https://github.com/tesseract-ocr/tesseract/issues),
and plans in the [planning documentation](https://tesseract-ocr.github.io/tessdoc/Planning.html).
See **[Release Notes](https://tesseract-ocr.github.io/tessdoc/ReleaseNotes.html)**
and **[Change Log](https://github.com/tesseract-ocr/tesseract/blob/main/ChangeLog)** for more details of the releases.
## Installing Tesseract
You can either [install Tesseract via a pre-built binary package](https://tesseract-ocr.github.io/tessdoc/Installation.html)
or [build it from source](https://tesseract-ocr.github.io/tessdoc/Compiling.html).
A C++ compiler with good C++17 support is required for building Tesseract from source.
## Running Tesseract
Basic **[command line usage](https://tesseract-ocr.github.io/tessdoc/Command-Line-Usage.html)**:

    tesseract imagename outputbase [-l lang] [--oem ocrenginemode] [--psm pagesegmode] [configfiles...]

For more information about the various command line options use `tesseract --help` or `man tesseract`.
Examples can be found in the [documentation](https://tesseract-ocr.github.io/tessdoc/Command-Line-Usage.html#simplest-invocation-to-ocr-an-image).
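
A minimal Python sketch that assembles the argument list from the usage line above and runs it with `subprocess`. It assumes the `tesseract` executable is on `PATH`; passing `stdout` as outputbase makes Tesseract print the recognized text instead of writing a file. The helper names are hypothetical.

```python
import shutil
import subprocess
from typing import List, Optional, Sequence


def build_tesseract_command(image: str, outputbase: str, lang: Optional[str] = None,
                            oem: Optional[int] = None, psm: Optional[int] = None,
                            configfiles: Sequence[str] = ()) -> List[str]:
    """Mirror: tesseract imagename outputbase [-l lang] [--oem ...] [--psm ...] [configfiles...]."""
    cmd = ["tesseract", image, outputbase]
    if lang:
        cmd += ["-l", lang]  # several languages are joined with "+", e.g. "eng+deu"
    if oem is not None:
        cmd += ["--oem", str(oem)]
    if psm is not None:
        cmd += ["--psm", str(psm)]
    cmd += list(configfiles)  # e.g. "tsv" or "hocr" to select an output format
    return cmd


def ocr_to_text(image: str, lang: str = "eng") -> str:
    """Run tesseract with outputbase "stdout" and return the recognized text."""
    if shutil.which("tesseract") is None:
        raise FileNotFoundError("tesseract executable not found on PATH")
    cmd = build_tesseract_command(image, "stdout", lang=lang)
    # Tesseract writes UTF-8; check=True raises CalledProcessError on a non-zero exit.
    result = subprocess.run(cmd, capture_output=True, encoding="utf-8", check=True)
    return result.stdout
```

For example, `ocr_to_text("scan.png", lang="eng+deu")` returns what `tesseract scan.png stdout -l eng+deu` would print. Passing arguments as a list avoids shell quoting problems with file names that contain spaces.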
## For developers
Developers can use the `libtesseract` [C](https://github.com/tesseract-ocr/tesseract/blob/main/include/tesseract/capi.h) or
[C++](https://github.com/tesseract-ocr/tesseract/blob/main/include/tesseract/baseapi.h) API to build their own applications. If you need bindings to `libtesseract` for other programming languages, please see the
[wrapper](https://tesseract-ocr.github.io/tessdoc/AddOns.html#tesseract-wrappers) section in the AddOns documentation.
Doxygen documentation generated from the Tesseract source code can be found on [tesseract-ocr.github.io](https://tesseract-ocr.github.io/).
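
To see what the C API looks like from another language, the sketch below loads `libtesseract` with Python's `ctypes` and calls functions declared in `capi.h` (`TessVersion`, `TessBaseAPICreate`, `TessBaseAPIInit3`, `TessBaseAPISetImage`, `TessBaseAPIGetUTF8Text`, `TessDeleteText`, `TessBaseAPIEnd`, `TessBaseAPIDelete`). It is only a sketch: it assumes a library that `ctypes.util.find_library("tesseract")` can locate and installed traineddata for the chosen language, and the helper names are hypothetical. For real applications the wrappers listed in the AddOns documentation are likely a better choice.

```python
import ctypes
import ctypes.util
from typing import Optional


def load_libtesseract() -> Optional[ctypes.CDLL]:
    """Load libtesseract and declare the C signatures used below; None if not found."""
    # Assumption: the library is discoverable as "tesseract"; versioned Windows DLLs
    # (e.g. libtesseract-5.dll) may need an explicit path instead.
    name = ctypes.util.find_library("tesseract")
    if name is None:
        return None
    try:
        lib = ctypes.CDLL(name)
    except OSError:
        return None
    p = ctypes.c_void_p
    lib.TessVersion.restype = ctypes.c_char_p
    lib.TessBaseAPICreate.restype = p
    lib.TessBaseAPIInit3.argtypes = [p, ctypes.c_char_p, ctypes.c_char_p]
    lib.TessBaseAPIInit3.restype = ctypes.c_int
    lib.TessBaseAPISetImage.argtypes = [p, ctypes.c_char_p] + [ctypes.c_int] * 4
    lib.TessBaseAPISetImage.restype = None
    lib.TessBaseAPIGetUTF8Text.argtypes = [p]
    lib.TessBaseAPIGetUTF8Text.restype = p  # raw pointer, so it can be freed afterwards
    lib.TessDeleteText.argtypes = [p]
    lib.TessDeleteText.restype = None
    lib.TessBaseAPIEnd.argtypes = [p]
    lib.TessBaseAPIDelete.argtypes = [p]
    return lib


def recognize_gray(lib: ctypes.CDLL, pixels: bytes, width: int, height: int,
                   lang: str = "eng") -> str:
    """OCR an 8-bit grayscale buffer (1 byte per pixel, rows packed without padding)."""
    if len(pixels) != width * height:
        raise ValueError("pixel buffer size does not match width * height")
    api = lib.TessBaseAPICreate()
    try:
        # A NULL datapath lets Tesseract use its default tessdata location.
        if lib.TessBaseAPIInit3(api, None, lang.encode()) != 0:
            raise RuntimeError(f"could not initialize language {lang!r}")
        lib.TessBaseAPISetImage(api, pixels, width, height, 1, width)
        text_ptr = lib.TessBaseAPIGetUTF8Text(api)
        if not text_ptr:
            return ""
        try:
            return ctypes.string_at(text_ptr).decode("utf-8")
        finally:
            lib.TessDeleteText(text_ptr)  # the C API allocates the text; free it there
    finally:
        lib.TessBaseAPIEnd(api)
        lib.TessBaseAPIDelete(api)
```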
## Support
Before you submit an issue, please review **[the guidelines for this repository](https://github.com/tesseract-ocr/tesseract/blob/main/CONTRIBUTING.md)**.
For support, first read the [documentation](https://tesseract-ocr.github.io/tessdoc/),
particularly the [FAQ](https://tesseract-ocr.github.io/tessdoc/FAQ.html) to see if your problem is addressed there.
If not, search the [Tesseract user forum](https://groups.google.com/g/tesseract-ocr), the [Tesseract developer forum](https://groups.google.com/g/tesseract-dev) and [past issues](https://github.com/tesseract-ocr/tesseract/issues), and if you still can't find what you need, ask for support on the mailing lists.
Mailing lists:
* [tesseract-ocr](https://groups.google.com/g/tesseract-ocr) - For tesseract users.
* [tesseract-dev](https://groups.google.com/g/tesseract-dev) - For tesseract developers.
Please report an issue only for a **bug**, not for asking questions.
## License
The code in this repository is licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
**NOTE**: This software depends on other packages that may be licensed under different open source licenses.
Tesseract uses [Leptonica library](http://leptonica.com/) which essentially
uses a [BSD 2-clause license](http://leptonica.com/about-the-license.html).
## Dependencies
Tesseract uses [Leptonica library](https://github.com/DanBloomberg/leptonica)
for opening input images.
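## Offline OCR tool built with Python and tesseract-ocr

Package: 138.54 MB ZIP with 1743 files (qml: 695, png: 222, dll: 211), uploaded 2023-08-03 14:33:46.

Existing small OCR tools fall into two categories. The first relies on an API provided by an internet company, such as Baidu's text recognition service; its advantage is high recognition accuracy, and its drawback is that it cannot be used without a network connection or authorization. The second runs locally. The publicly available OCR tool tesseract-ocr provides a Chinese recognition package, and combining it with the screenshot tool from QQ Mail yields an offline OCR tool. It is simple to operate, but because it uses only the most basic trained data, its accuracy is fairly low: it currently recognizes standard text in PDFs accurately, while text mixed with icons may be misrecognized. To improve results, train your own Chinese recognition data and replace the contents of the tesseract-ocr folder with it.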
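
Replacing the trained data works because Tesseract loads `<lang>.traineddata` from its tessdata directory, which can be chosen with `--tessdata-dir`. A hedged Python sketch, assuming the tool ships a tessdata folder at a path you pass in and uses the simplified Chinese model `chi_sim` (the folder path, the language choice and the helper names are all assumptions):

```python
from pathlib import Path
from typing import List


def missing_traineddata(tessdata_dir: str, lang: str) -> List[str]:
    """Return the codes in `lang` (e.g. "chi_sim+eng") that lack a .traineddata file."""
    tessdata = Path(tessdata_dir)
    return [code for code in lang.split("+")
            if not (tessdata / f"{code}.traineddata").is_file()]


def build_offline_command(image: str, tessdata_dir: str, lang: str = "chi_sim",
                          check: bool = True) -> List[str]:
    """Build an argv that prints OCR text for `image` using models from `tessdata_dir`."""
    if check:
        missing = missing_traineddata(tessdata_dir, lang)
        if missing:
            raise FileNotFoundError(f"no traineddata for {missing} in {tessdata_dir}")
    return ["tesseract", image, "stdout", "--tessdata-dir", tessdata_dir, "-l", lang]
```

With this layout, trying a self-trained model means dropping a new `.traineddata` file into that folder and passing its name with `-l`.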