OpenCV | Document Scanning & Optical Character Recognition

Uploaded 2020-12-21 12:13:43 (PDF, 580 KB)

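Document scanning and optical character recognition (OCR) are two important computer vision applications that automatically detect and extract text from images, and OpenCV supplies the image-processing steps for both. The points covered by the title and description are explained below.

**1. The OpenCV library**

OpenCV (Open Source Computer Vision Library) is a widely used cross-platform computer vision library that contains many image-processing and computer-vision functions. It supports several programming languages, including Python and C++, and provides powerful tools for processing images and video.

**2. Image preprocessing**

Before OCR, the image is usually preprocessed to improve recognition. The preprocessing steps used here are:

- **Read the image**: `cv2.imread()` reads an image from the given path.
- **Resize the image**: `cv2.resize()` changes the image size; here the image is scaled to (1500, 1125) pixels.
- **Grayscale conversion**: `cv2.cvtColor()` converts the color image to grayscale; the `cv2.COLOR_BGR2GRAY` flag means a conversion from the BGR color space to grayscale.
- **Gaussian blur**: `cv2.GaussianBlur()` applies a Gaussian filter to reduce noise.
- **Edge detection**: `cv2.Canny()` detects edges, which include the outline of the document in the image.

**3. Contour detection**

- **Find contours**: `cv2.findContours()` finds the contours in the image; `cv2.RETR_LIST` retrieves all contours, and `cv2.CHAIN_APPROX_NONE` keeps every contour point.
- **Sort contours**: `sorted()` orders the contours by area in descending order so that the largest one, which is usually the outline of the document, is examined first.
- **Approximate contours**: `cv2.arcLength()` computes the contour perimeter, and `cv2.approxPolyDP()` approximates the curve with fewer points, which reduces the computational cost.

**4. Matching the rectangular outline of the target**

The code looks for a contour with four vertices, which usually corresponds to the boundary of the document. The test `len(approx) == 4` selects a quadrilateral.

**5. Image rectification**

- `rectify()`: defined in `resize.py`. It puts the four detected corners into a consistent order so that the following perspective transformation can map the document to a front-on view that is better suited to OCR.

**6. Optical character recognition (OCR)**

- OCR is the process of recognizing the text in an image. OpenCV itself does not include a complete OCR engine, but it can be combined with other libraries such as Tesseract. Tesseract is an open-source OCR engine that recognizes text in many languages and can be paired with OpenCV for more complex image processing and text extraction. In practice, once the steps above are complete, Tesseract is installed and its API is called to recognize the text in the processed image. The recognized text can then be stored, searched, or analyzed.

In document scanning and OCR, OpenCV handles image preprocessing and basic shape detection, while the recognition itself usually relies on a dedicated OCR engine. Careful image processing combined with an effective OCR engine makes efficient and accurate automatic text extraction possible.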


Step 1. Import the required packages and a local module named resize (resize.py).

import cv2

import numpy as np

import resize

Step 2. Import and preliminary processing of the image.

Read in the image to be scanned. If its resolution is good enough, an image from the laptop camera can also be used.

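# Read the input image. cv2.imread() returns None if the file cannot be read.
image = cv2.imread('test.jpg')
# Resize to 1500 x 1125. cv2.resize() takes the target size as (width, height).
image = cv2.resize(image, (1500, 1125))
# Keep a copy of the resized image for the perspective transformation later.
orig = image.copy()
# Convert the image to grayscale, then apply a 5 x 5 Gaussian blur to reduce noise.
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
blurred = cv2.GaussianBlur(gray, (5, 5), 0)
# Detect edges with the Canny algorithm (hysteresis thresholds 0 and 50).
edged = cv2.Canny(blurred, 0, 50)
# Keep a copy of the edge map produced by the Canny algorithm.
orig_edged = edged.copy()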
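The fixed size (1500, 1125) matches a 4:3 photo; an input with a different aspect ratio would be stretched. A minimal sketch of computing an aspect-preserving target size follows. The helper name `fit_to_width` is hypothetical: it is not part of the original article or of OpenCV.

```python
def fit_to_width(width, height, target_width=1500):
    """Return (new_width, new_height) scaled to target_width, keeping the aspect ratio.

    Hypothetical helper for illustration; it is not part of OpenCV.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    scale = target_width / width
    # Round to the nearest pixel and never return a zero height.
    return target_width, max(1, round(height * scale))


# A 4:3 photo maps to the size used in the article.
print(fit_to_width(4000, 3000))   # (1500, 1125)
# A 16:9 photo keeps its proportions instead of being stretched.
print(fit_to_width(1920, 1080))   # (1500, 844)
```

Because `image.shape` is ordered (height, width, channels) while `cv2.resize()` expects (width, height), the result would be used as `h, w = image.shape[:2]` followed by `cv2.resize(image, fit_to_width(w, h))`.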

Step 3. Get approximate contours of the image.

Find the contours in the edge image, sort them by area, and keep the largest one that can be approximated by four points as the outline of the document.

# findContours() finds contours in a binary image. OpenCV 4.x returns
# (contours, hierarchy); OpenCV 3.x returns (image, contours, hierarchy).
contours, hierarchy = cv2.findContours(edged, cv2.RETR_LIST, cv2.CHAIN_APPROX_NONE)
# Sort the contours by area, largest first.
contours = sorted(contours, key=cv2.contourArea, reverse=True)
# Get approximate contours:
target = None  # stays None if no four-point contour is found
for c in contours:
    # Perimeter of the contour; True means the contour is treated as closed.
    p = cv2.arcLength(c, True)
    # Approximate the contour by a polygon with a tolerance of 0.02 * p.
    # The last argument is True because the approximated curve is closed.
    approx = cv2.approxPolyDP(c, 0.02 * p, True)
    # The largest contour with four vertices is the quadrilateral we are looking for.
    if len(approx) == 4:
        target = approx
        break

Step 4. Create a function that puts the four corners of the target into a consistent order, ready for rectifying and resizing the image.

Note: the function rectify is stored in resize.py.

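def rectify(h):
    # approxPolyDP returns an array of shape (4, 1, 2); flatten it to four (x, y) points.
    h = h.reshape((4, 2))
    hnew = np.zeros((4, 2), dtype=np.float32)
    # x + y is smallest at the top-left corner and largest at the bottom-right corner.
    add = h.sum(1)
    hnew[0] = h[np.argmin(add)]
    hnew[2] = h[np.argmax(add)]
    # np.diff computes the discrete difference along the given axis; along axis 1 it gives y - x,
    # which is smallest at the top-right corner and largest at the bottom-left corner.
    diff = np.diff(h, axis=1)
    hnew[1] = h[np.argmin(diff)]
    hnew[3] = h[np.argmax(diff)]
    # hnew holds the four vertices of the detected document in the order
    # top-left, top-right, bottom-right, bottom-left.
    return hnew

# In the main script, order the corners of the detected quadrilateral.
approx = resize.rectify(target)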
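A short worked example makes the corner ordering concrete. The sketch below repeats the function so that it runs on its own and applies it to four made-up corner coordinates listed in a scrambled order. It assumes the document is roughly upright in the photo; for a strongly rotated quadrilateral two corners can tie on x + y or y - x, and the heuristic may then pick the same point twice.

```python
import numpy as np


def rectify(h):
    """Order four corner points as top-left, top-right, bottom-right, bottom-left."""
    h = h.reshape((4, 2))
    hnew = np.zeros((4, 2), dtype=np.float32)
    add = h.sum(1)                  # x + y for each point
    hnew[0] = h[np.argmin(add)]     # top-left: smallest x + y
    hnew[2] = h[np.argmax(add)]     # bottom-right: largest x + y
    diff = np.diff(h, axis=1)       # y - x for each point
    hnew[1] = h[np.argmin(diff)]    # top-right: smallest y - x
    hnew[3] = h[np.argmax(diff)]    # bottom-left: largest y - x
    return hnew


# Made-up corners in the (4, 1, 2) layout that approxPolyDP returns,
# deliberately listed in a scrambled order.
corners = np.array([[[400, 50]], [[30, 40]], [[20, 600]], [[420, 580]]], dtype=np.int32)
ordered = rectify(corners)
# Rows are now (30, 40), (400, 50), (420, 580), (20, 600).
print(ordered)
```

The x + y values here are 450, 70, 620, and 1000, and the y - x values are -350, 10, 580, and 160, which is how (30, 40) ends up first and (20, 600) last.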

Step 5. Apply a perspective transformation that maps the target quadrilateral to a 400 x 600 rectangle.

pts2 = np.float32([[0, 0], [400, 0], [400, 600], [0, 600]])
# Use the getPerspectiveTransform function to obtain the 3 x 3 perspective transformation matrix.
# approx holds the four vertices of the quadrilateral in the source image;
# pts2 holds the four corresponding vertices in the target image.
M = cv2.getPerspectiveTransform(approx, pts2)
# Use the warpPerspective function to apply the perspective transformation to the source image.
# The output image dst is 400 x 600 (width x height).
dst = cv2.warpPerspective(orig, M, (400, 600))
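To see what the transformation matrix encodes, the sketch below solves for a 3 x 3 matrix from four point correspondences with plain NumPy and checks that it sends each source corner to its target corner. It is an illustration under the assumption that the bottom-right matrix entry is fixed to 1, not a copy of OpenCV's implementation, and the function names and source coordinates are made up.

```python
import numpy as np


def perspective_matrix(src, dst):
    """Solve for the 3 x 3 matrix H (with H[2, 2] = 1) that maps each src point to its dst point."""
    src = np.asarray(src, dtype=np.float64)
    dst = np.asarray(dst, dtype=np.float64)
    a = []
    b = []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h11*x + h12*y + h13) / (h31*x + h32*y + 1), rearranged to be linear in h.
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        # v = (h21*x + h22*y + h23) / (h31*x + h32*y + 1), rearranged the same way.
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.extend([u, v])
    h = np.linalg.solve(np.array(a), np.array(b))
    return np.append(h, 1.0).reshape(3, 3)


def apply_perspective(matrix, points):
    """Apply H to (x, y) points using homogeneous coordinates."""
    points = np.asarray(points, dtype=np.float64)
    pts = np.hstack([points, np.ones((len(points), 1))])
    mapped = pts @ matrix.T
    # Divide by the third coordinate to return to ordinary (x, y) values.
    return mapped[:, :2] / mapped[:, 2:3]


src = np.float32([[30, 40], [400, 50], [420, 580], [20, 600]])   # made-up ordered corners
dst = np.float32([[0, 0], [400, 0], [400, 600], [0, 600]])       # pts2 from the article
H = perspective_matrix(src, dst)
print(np.round(apply_perspective(H, src), 3))
```

The printed points should reproduce the target rectangle up to floating-point error. The point order matters: the first source corner is mapped to (0, 0), the second to (400, 0), and so on, which is why the corners are ordered before the matrix is computed.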

Step 6. Post-process the perspective-transformed image in several different ways to obtain the final result.

The different processing methods can be compared to choose the most suitable one for the final result. The processed images are not shown in this article; if you are interested, try the code yourself.

# Convert the perspective-transformed image to grayscale.
dst = cv2.cvtColor(dst, cv2.COLOR_BGR2GRAY)
# Draw the detected outline on the resized image: -1 draws every contour in the list,
# (0, 255, 0) is green in BGR order, and the line thickness is 2.
cv2.drawContours(image, [target], -1, (0, 255, 0), 2)
