International Journal of Computer Applications (0975 – 8887)
Volume 60– No.1, December 2012
Identifying the Character by Applying PCA Method using MATLAB
P. Subbuthai, Azha Periasamy, S. Muruganand
Department of Electronics and Instrumentation
Bharathiar University, Coimbatore, India
ABSTRACT
Optical character recognition is becoming increasingly useful
in daily life for various purposes. The aim of this paper is to
recognize numerals and English letters in the Times New Roman,
Arial, and Arial block typefaces at sizes 72 and 48. Much
research has been done on many types of characters using
different approaches. In this work, the recognition system is
implemented using the principal component analysis (PCA)
algorithm, which is based on eigenvalues and Euclidean distance.
PCA is a practical and standard statistical tool in modern data
analysis that has found application in different areas such as
face recognition, image compression, and neuroscience.
General Terms
Binary, edge, filling image, and PCA
Keywords
PCA, eigenvalue, Euclidean distance
1. INTRODUCTION
Character recognition systems have received considerable
attention in recent years due to the tremendous need for
digitization of printed documents. Manual transcription of text
from images is time-consuming and costly, so automated
extraction of text from images is one of the challenging areas
in image processing. In this paper, numerals and English
letters are recognized. English letters come in two cases,
uppercase and lowercase; this paper focuses on uppercase
characters in the Times New Roman, Arial, and Arial block
styles at sizes 72 and 48. English is used all over the world
for communication. In India it is also used in many offices,
such as railways, passport, income tax, sales tax, and defense,
and in public sector undertakings such as banks, insurance,
courts, economic centers, and educational institutions. The
recognition approach in this paper is based on principal
component analysis. PCA is a linear transformation that rotates
the axes of image space along the lines of maximum variance.
The rotation is based on the orthogonal eigenvectors of the
covariance matrix generated from a sample of image data from
the input channels. The output of this transformation is a new
set of image channels, also referred to as eigenchannels. Its
main use is to reduce the dimensionality of a data set while
retaining as much information as possible. The character
recognition process can be divided into the following stages:
preprocessing, feature extraction, and recognition.
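The rotation onto axes of maximum variance can be illustrated with a minimal two-dimensional sketch. It is written in Python rather than the MATLAB used in this paper, and it is a toy example, not the recognition system itself. The function names pca_2d and project are hypothetical, and the closed-form eigen-decomposition shown applies only to a 2x2 covariance matrix.

```python
import math

def pca_2d(points):
    """PCA for 2-D points (illustrative helper, not from the paper).

    Returns (mean, eigenvalues, eigenvectors), ordered so that the
    axis of largest variance comes first.
    """
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    # Sample covariance matrix [[a, b], [b, c]] of the centered data.
    a = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    c = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    b = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    # Closed-form eigenvalues of a symmetric 2x2 matrix.
    half_trace = (a + c) / 2
    radius = math.hypot((a - c) / 2, b)
    eigenvalues = (half_trace + radius, half_trace - radius)
    # Rotation angle of the principal axis: tan(2*theta) = 2b / (a - c).
    theta = 0.5 * math.atan2(2 * b, a - c)
    first = (math.cos(theta), math.sin(theta))
    second = (-math.sin(theta), math.cos(theta))  # orthogonal to first
    return (mx, my), eigenvalues, (first, second)

def project(points, mean, axes):
    """Express each centered point in the rotated (eigenvector) axes."""
    return [
        tuple((x - mean[0]) * ux + (y - mean[1]) * uy for (ux, uy) in axes)
        for (x, y) in points
    ]

# Points lying on the line y = x: all variance falls on the first axis.
samples = [(0, 0), (1, 1), (2, 2), (3, 3)]
mean, eigenvalues, axes = pca_2d(samples)
rotated = project(samples, mean, axes)
```

For these collinear samples the second eigenvalue is zero, so keeping only the first coordinate of each rotated point loses no information. This is the dimensionality reduction described above, in its simplest form.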
2. PREPROCESSING STEPS
Fig 1: Block Diagram for Preprocessing
Preprocessing operations generally fall into three categories:
image acquisition, image conversion, and morphological
operations. Their respective blocks are shown in Figure 1.
Digital image acquisition is the creation of digital images,
typically from a physical scene. The term is often assumed to
imply or include the preprocessing, compression, storage,
printing, and display of such images. Image conversion
consists of three steps: RGB image to gray image, gray image to
binary image, and finally binary image to edge image. The first
step of the image processing is binarization. The color image
from the acquisition unit, represented by three coefficients
(red, green, and blue), must be converted to an image with 256
gray levels [1]. An appropriate threshold is then selected for
binarization [2], and the grayscale image is converted into a
binary image consisting only of 0s and 1s [3]. The image is
then converted into an edge image. Dilation and filling take
place in the morphological operation stage. Edges are detected
using an appropriate threshold and then dilated using an
appropriate structuring element [4]. The dilated binary image
is then converted into a filled image. Filling reduces the
number of connected components, and the MATLAB command bwlabel
is used to label the connected components [3]. The next step is
to obtain the bounding box of the character. The bounding box
is the minimum rectangle that encloses the whole character [5].
A single character is then extracted using its bounding box.
For template matching, the image is resized to 74*50 using the
bilinear method.
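Part of this pipeline (gray conversion, thresholding, connected-component labeling, and bounding-box extraction) can be sketched as follows. The sketch is in plain Python, not the MATLAB code used in this paper. All function names, the threshold value of 128, and the convention that dark ink maps to 1 are assumptions made for illustration. Edge detection, dilation, filling, and bilinear resizing are omitted.

```python
from collections import deque

def rgb_to_gray(rgb):
    """Convert rows of (r, g, b) pixels to 0-255 gray levels.

    Uses the common luma weights 0.299, 0.587, 0.114.
    """
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in rgb]

def binarize(gray, threshold=128):
    """Assumed convention: dark pixels (ink) become 1, light pixels 0."""
    return [[1 if v < threshold else 0 for v in row] for row in gray]

def label_components(binary):
    """Label 8-connected foreground regions with 1, 2, ...

    Similar in spirit to MATLAB's bwlabel; the numbering order of the
    labels may differ. Returns (labels, count).
    """
    h, w = len(binary), len(binary[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if binary[y][x] != 1 or labels[y][x] != 0:
                continue
            count += 1
            labels[y][x] = count
            queue = deque([(y, x)])
            while queue:  # breadth-first flood fill of one region
                cy, cx = queue.popleft()
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and binary[ny][nx] == 1
                                and labels[ny][nx] == 0):
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count

def bounding_box(labels, label):
    """Smallest inclusive (top, left, bottom, right) box around a label."""
    ys = [y for y, row in enumerate(labels) for v in row if v == label]
    xs = [x for row in labels for x, v in enumerate(row) if v == label]
    return min(ys), min(xs), max(ys), max(xs)

def crop(image, box):
    """Cut the bounding box out of a 2-D image (list of rows)."""
    top, left, bottom, right = box
    return [row[left:right + 1] for row in image[top:bottom + 1]]

# Tiny synthetic image: white background (W) with two dark blobs (K).
W, K = (255, 255, 255), (0, 0, 0)
image = [
    [W, W, W, W, W, W],
    [W, K, K, W, W, W],
    [W, K, W, W, K, W],
    [W, W, W, W, K, W],
    [W, W, W, W, W, W],
]
binary = binarize(rgb_to_gray(image))
labels, count = label_components(binary)
boxes = [bounding_box(labels, k) for k in range(1, count + 1)]
first_character = crop(binary, boxes[0])
```

Each cropped region would then be resized to the common template size before matching. That step is left out here because the interpolation details depend on the implementation.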
3. PRINCIPAL COMPONENT ANALYSIS (PCA)
PCA is mathematically defined as an orthogonal linear
transformation that transforms the data to a new coordinate
system such that the greatest variance by any projection of the
data comes to lie on the first coordinate, the second greatest