OpenCV | Document Scanning & Optical Character Recognition (OCR)
Step 1. Import the required packages, along with a helper file named resize.py used throughout the project.
import cv2
import numpy as np
import resize
Step 2. Read in and preprocess the image.
Read in the picture to be detected. If the resolution is good enough, the laptop camera can also be used as the source.
image = cv2.imread('test.jpg')
image = cv2.resize(image, (1500, 1125))
orig = image.copy()
# Create a copy of the original image.
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
blurred = cv2.GaussianBlur(gray, (5, 5), 0)
# Convert the image to grayscale, then apply a Gaussian blur to reduce noise.
edged = cv2.Canny(blurred, 0, 50)
# Use the Canny algorithm for edge detection.
orig_edged = edged.copy()
# Keep a copy of the Canny edge map.
Step 3. Get the approximate contour of the document.
Find the contours in the edge image, keep the largest ones, and look for the document outline.
contours, hierarchy = cv2.findContours(edged, cv2.RETR_LIST, cv2.CHAIN_APPROX_NONE)
# findContours() finds contours in a binary image.
contours = sorted(contours, key=cv2.contourArea, reverse=True)
# Sort the contours by area, largest first.
# Get approximate contours:
for c in contours:
    p = cv2.arcLength(c, True)
    # Calculate the perimeter of the closed contour (or the length of the curve).
    approx = cv2.approxPolyDP(c, 0.02 * p, True)
    # Approximate the contour as a polygon with (0.02 * p) as the precision. Because the contour is a closed curve, the parameter closed is True.
    if len(approx) == 4:
        target = approx
        break
# The first contour that is approximated by four points is the rectangular outline we are looking for.
Step 4. Create a function that puts the four corner points of the target into a consistent order.
ps: The function rectify is stored in resize.py.
def rectify(h):
    h = h.reshape((4, 2))
    hnew = np.zeros((4, 2), dtype=np.float32)
    add = h.sum(1)
    hnew[0] = h[np.argmin(add)]   # top-left: the point with the smallest coordinate sum
    hnew[2] = h[np.argmax(add)]   # bottom-right: the point with the largest coordinate sum
    diff = np.diff(h, axis=1)
    # Calculate the discrete difference (y - x) along axis 1.
    hnew[1] = h[np.argmin(diff)]  # top-right
    hnew[3] = h[np.argmax(diff)]  # bottom-left
    # The four vertices of the detected document are now in a fixed order.
    return hnew
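The corner ordering can be checked on a small hand-made point set (the coordinate values here are made up for illustration, mimicking corners returned in arbitrary order by approxPolyDP):

```python
import numpy as np

def rectify(h):
    # Same logic as resize.rectify: order the corners as
    # [top-left, top-right, bottom-right, bottom-left].
    h = h.reshape((4, 2))
    hnew = np.zeros((4, 2), dtype=np.float32)
    add = h.sum(1)
    hnew[0] = h[np.argmin(add)]   # top-left: smallest x + y
    hnew[2] = h[np.argmax(add)]   # bottom-right: largest x + y
    diff = np.diff(h, axis=1)     # y - x for each point
    hnew[1] = h[np.argmin(diff)]  # top-right: smallest y - x
    hnew[3] = h[np.argmax(diff)]  # bottom-left: largest y - x
    return hnew

pts = np.array([[390, 580], [10, 20], [15, 590], [400, 25]])
print(rectify(pts))
# [[ 10.  20.]   top-left
#  [400.  25.]   top-right
#  [390. 580.]   bottom-right
#  [ 15. 590.]]  bottom-left
```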
approx = resize.rectify(target)
Step 5. Map the target onto a 400 * 600 rectangle with a perspective transformation.
pts2 = np.float32([[0, 0], [400, 0], [400, 600], [0, 600]])
M = cv2.getPerspectiveTransform(approx, pts2)
# Use the getPerspectiveTransform function to obtain the perspective transformation matrix.
# (approx holds the four vertex positions of the quadrilateral in the source image; pts2 holds the four vertex positions in the target image.)
dst = cv2.warpPerspective(orig, M, (400, 600))
# Use the warpPerspective function to apply the perspective transformation to the source image; the output image dst is 400 * 600.
Step 6. Use several different ways to post-process the warped image and obtain the final result.
You can compare the different processing methods below and choose the most suitable one as the final result. The processed images are not shown in this article; if you are interested, try them yourself.
dst = cv2.cvtColor(dst, cv2.COLOR_BGR2GRAY)
# Convert the warped image to grayscale.
cv2.drawContours(image, [target], -1, (0, 255, 0), 2)
# Draw the detected outline on the original image; contourIdx = -1 means draw all contours in the list, the color is green, and the thickness is 2.