FPGA-Based Parallel Hardware Architecture
for Real-Time Image Classification
Murad Qasaimeh, Assim Sagahyroon, and Tamer Shanableh
Abstract—This paper proposes a parallel hardware architecture
for real-time image classification based on scale-invariant fea-
ture transform (SIFT), bag of features (BoFs), and support vector
machine (SVM) algorithms. The proposed architecture exploits
different forms of parallelism in these algorithms in order to accel-
erate their execution to achieve real-time performance. Different
techniques have been used to parallelize the execution and reduce
the hardware resource utilization of the computationally intensive
steps in these algorithms. The architecture takes a 640 × 480 pixel
image as an input and classifies it based on its content within
33 ms. A prototype of the proposed architecture is implemented on
an FPGA platform and evaluated using two benchmark datasets:
1) Caltech-256 and 2) the Belgium Traffic Sign datasets. The archi-
tecture is able to detect up to 1270 SIFT features per frame with
an increment of 380 extra features from t he best recent implemen-
tation. We were able to speedup the feature extraction algorithm
when compared to an equivalent software implementation by 54×
and for classification algorithm by 6×, while maintaining the
difference in classification accuracy within 3%. The hardware
resources utilized by our architecture were also less than those
used by other existing solutions.
Index Terms—Field-programmable gate array (FPGA), hard-
ware implementation, image classification, scale-invariant feature
transform (SIFT).
I. INTRODUCTION
Object detection and classification is the process of find-
ing objects of a certain class, such as faces, cars, and
buildings, in an image or a video frame. This task involves clas-
sifying the input image according to its visual content into a
general class of similar objects. Feature-based object classifi-
cation is a common image classification method in computer
vision. It typically uses a feature extraction algorithm to capture the image's salient content and then a classification algorithm to assign the image to one of the trained categories. Image classification has many potential applications including autonomous robots, intelligent traffic systems, human-computer interaction, quality control in production lines,
and biomedical image analysis.
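For concreteness, the following is a minimal software sketch of such a feature-based pipeline (SIFT descriptors, a bag-of-features histogram, and an SVM), assuming OpenCV and scikit-learn as the underlying libraries. The codebook size and SVM parameters are illustrative placeholders rather than the values used in this work; the architecture proposed in this paper implements the equivalent SIFT, BoF, and SVM stages directly in hardware.

```python
# Illustrative software sketch of a feature-based classification pipeline
# (SIFT descriptors -> bag-of-features histogram -> SVM). The codebook size
# and SVM parameters below are placeholders, not the values used by the
# hardware architecture described in this paper.
import cv2
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

VOCAB_SIZE = 200  # illustrative visual-word codebook size


def extract_sift(gray_image):
    """Detect SIFT keypoints and return their 128-D descriptors."""
    sift = cv2.SIFT_create()
    _, descriptors = sift.detectAndCompute(gray_image, None)
    if descriptors is None:
        return np.empty((0, 128), dtype=np.float32)
    return descriptors


def build_vocabulary(descriptor_list):
    """Cluster all training descriptors into a visual-word vocabulary."""
    return KMeans(n_clusters=VOCAB_SIZE, n_init=4).fit(np.vstack(descriptor_list))


def bof_histogram(descriptors, vocabulary):
    """Quantize descriptors to visual words and build a normalized histogram."""
    words = vocabulary.predict(descriptors)
    hist, _ = np.histogram(words, bins=VOCAB_SIZE, range=(0, VOCAB_SIZE))
    return hist / max(hist.sum(), 1)


def train_classifier(histograms, labels):
    """Train an RBF-kernel SVM on bag-of-features histograms."""
    return SVC(kernel="rbf", C=10.0, gamma="scale").fit(histograms, labels)
```

At inference time, a query image would pass through extract_sift and bof_histogram, and the resulting histogram would be fed to the trained SVM's predict method; this is the software flow whose feature extraction and classification stages are parallelized in the proposed architecture.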
Many algorithms have been proposed for feature extraction
and classification in the last two decades. These algorithms
involve a tradeoff between the quality of the extracted features,
the classification accuracy, and the computational complexity
of these algorithms. As an example, the unsophisticated FAST
corner detector [1] executes in 13 ms for a 1024 × 768 image
on a desktop machine, but the extracted features are not robust
to changes in illumination, scale, or rotation. In contrast, a software implementation of a computationally expensive algorithm like SIFT [2] or HOG [3] processes the same image within 1920 ms; however, the extracted features are invariant to changes in illumination, scale, viewpoint, and rotation. The
argument also applies to classification algorithms. A simple
classification algorithm like Naive Bayes takes noticeably less time to execute than a sophisticated algorithm
like RBF-SVM. However, the classification accuracy achieved
by SVM is much higher than that of Naive Bayes. Thus, implementing a robust image classification system using complex algorithms like SIFT [2] and SVM [4] is computationally intensive, and it is very hard to reach real-time performance (33 ms per frame).
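As a rough illustration of this speed gap (not the measurement setup used in this paper), the following sketch times OpenCV's FAST detector against SIFT on a single 1024 × 768 frame; the random test image is a stand-in, and absolute timings depend on the machine, so they will differ from the figures quoted above.

```python
# Rough timing of a lightweight corner detector (FAST) versus SIFT on a
# single 1024 x 768 frame. Illustrative only: the random stand-in image and
# the resulting timings are not the benchmark reported in this paper.
import time

import cv2
import numpy as np

frame = np.random.randint(0, 256, (768, 1024), dtype=np.uint8)  # stand-in image

fast = cv2.FastFeatureDetector_create()
t0 = time.perf_counter()
fast_keypoints = fast.detect(frame, None)
fast_ms = (time.perf_counter() - t0) * 1000.0

sift = cv2.SIFT_create()
t0 = time.perf_counter()
sift_keypoints, _ = sift.detectAndCompute(frame, None)
sift_ms = (time.perf_counter() - t0) * 1000.0

print(f"FAST: {len(fast_keypoints)} keypoints in {fast_ms:.1f} ms")
print(f"SIFT: {len(sift_keypoints)} keypoints in {sift_ms:.1f} ms")
```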
To address the complexity problem, a number of simplified
versions of the SIFT algorithm have been proposed, such as
SURF [5] and the fast SIFT [6] algorithms. SURF reduces the complexity of the SIFT algorithm by approximating the Laplacian of Gaussian (LoG) with filters in the orientation assignment and feature description steps. This reduces the computation time from 1036 ms for SIFT to 354 ms for SURF on a standard Linux PC (Pentium IV, 3 GHz) [5]. Many SURF hardware implementations have been proposed to accelerate the algorithm [7]–[9]. Although SURF reduces the computational time for extracting local features, it produces fewer reliable features than the SIFT algorithm. Moreover, the hardware implementations require a considerable amount of internal memory, almost four times the
memory requirement of SIFT [10], [11].
Other researchers tried to accelerate the SIFT algorithm
using a GPU platform [12]–[14]. The implementation in [12]
achieved a speedup of 4–7× over an optimized CPU version when tested on a dataset of 320 × 280 pixel images. The
power consumption of this implementation on the NVIDIA
Tegra 250 development board was 3383 mW. In [14], another
GPU-based implementation was proposed. It was able to detect
SIFT features in images (640 × 480 pixels) within 58 ms,
which is around 20 frames/s. The GPU-based implementa-
tions can accelerate the SIFT algorithm to reach near real-time
performance, but they require an excessive amount of hard-
ware resources and consume considerably more power than other hardware platforms, which makes GPU implementations unsuitable for portable embedded systems with limited power budgets. Other solutions tried to accelerate the SIFT algorithm