
Pattern Recognition 79 (2018) 290–302
A deeply supervised residual network for HEp-2 cell classification via cross-modal transfer learning

Haijun Lei a, Tao Han a, Feng Zhou b, Zhen Yu c, Jing Qin c,d, Ahmed Elazab c,e, Baiying Lei c,∗
a Guangdong Province Key Laboratory of Popular High-performance Computers, School of Computer and Software Engineering, Shenzhen University, Shenzhen 518060, China
b Industrial and Manufacturing Systems Engineering, University of Michigan at Dearborn, 4901 Evergreen Road, Dearborn, MI 48128, United States
c School of Biomedical Engineering, Health Science Center, Shenzhen University, National-Regional Key Technology Engineering Laboratory for Medical Ultrasound, Guangdong Key Laboratory for Biomedical Measurements and Ultrasound Imaging, Nanhai Ave 3688, Shenzhen, Guangdong 518060, China
d Centre for Smart Health, School of Nursing, The Hong Kong Polytechnic University, Hong Kong
e Computer Science Department, Misr Higher Institute for Commerce and Computers, Mansoura 35516, Egypt
Article info
Article history:
Received 1 October 2017
Revised 4 January 2018
Accepted 10 February 2018
Available online 15 February 2018
Keywords:
HEp-2 cell classification
Residual network
Deeply supervised ResNet
Cross-modal transfer learning
Abstract
Accurate Human Epithelial-2 (HEp-2) cell image classification plays an important role in the diagnosis of many autoimmune diseases and their subsequent treatment. One of the key challenges is the large intra-class variation caused by inhomogeneous illumination. To address it, we propose a framework based on a very deep supervised residual network (DSRN) to classify HEp-2 cell images. Specifically, we adopt a residual network of 50 layers (ResNet-50), which is substantially deep, to extract rich and discriminative features. Deep supervision is imposed on the ResNet-based framework to further boost classification performance by directly guiding the training of both the lower and upper levels of the network. The proposed method is evaluated on two publicly available datasets (i.e., the International Conference on Pattern Recognition (ICPR) 2012 and ICPR2016-Task1 cell classification contest datasets). Unlike previous deep learning models trained from scratch, our method employs a cross-modal transfer learning strategy: since the two datasets are similar, we pretrain the DSRN model on the ICPR2012 dataset and then fine-tune it on the ICPR2016 dataset. Extensive experiments show that the proposed method delivers state-of-the-art performance and outperforms traditional methods based on deep convolutional neural networks (DCNN).
©2018 Elsevier Ltd. All rights reserved.
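The deep supervision described above amounts to a combined training objective: the cross-entropy of the main (final) classifier plus weighted auxiliary cross-entropies computed on side outputs attached to intermediate layers. The following NumPy sketch is illustrative only; the auxiliary weight `aux_weight` and the function names are assumptions for exposition, not details taken from the paper.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(logits, labels):
    # Mean negative log-likelihood of the true class.
    p = softmax(logits)
    return -np.mean(np.log(p[np.arange(len(labels)), labels] + 1e-12))

def deeply_supervised_loss(main_logits, side_logits_list, labels, aux_weight=0.3):
    """Total loss = main cross-entropy + weighted auxiliary losses on side outputs."""
    loss = cross_entropy(main_logits, labels)
    for side_logits in side_logits_list:
        loss += aux_weight * cross_entropy(side_logits, labels)
    return loss
```

During training, gradients from the auxiliary terms flow directly into the intermediate layers that produced the side outputs, which is how deep supervision guides both the lower and upper levels of the network.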
1. Introduction
Indirect immunofluorescence detection technology is mainly used for the analysis of anti-nuclear antibodies (ANA) [1–4] and for the diagnosis and treatment of autoimmune diseases. For ANA analysis, the HEp-2 cell is able to divide and produce a large number of antigens. Human experts usually employ a fluorescence microscope to discriminate the nuclear antibody via visual inspection. As shown in Fig. 1, inhomogeneous illumination leads to large intra-class variations, which make HEp-2 cell classification challenging. To address this challenge, traditional methods mainly involve three steps: feature extraction, feature encoding,
and classification [5–26]. However, these methods mainly rely on hand-crafted features, which suffer from limited classification performance. To further improve feature representation for better classification, it is important to effectively extract more discriminative and informative features.

∗ Corresponding author.
E-mail addresses: lhj@szu.edu.cn (H. Lei), 2151230219@email.szu.edu.cn (T. Han), fezhou@umich.edu (F. Zhou), yishon@email.unc.edu (Z. Yu), harry.qin@polyu.edu.hk (J. Qin), leiby@szu.edu.cn (B. Lei).
In this respect, deep convolutional neural networks (DCNN) have attracted considerable interest. A DCNN has strong feature representation capabilities thanks to end-to-end learning through its many network layers. Due to the availability of large-scale annotated datasets (e.g., ImageNet) [27] and their powerful representation capabilities, DCNNs are able to improve classification performance significantly [28,29]. For example, Phan et al. [30] proposed a classification system that extracts features using a pre-trained DCNN model based on the AlexNet architecture; a multi-class support vector machine (SVM) is then used as the classifier, evaluated on the International Conference on Pattern Recognition (ICPR) 2012 dataset. Gao et al. [31] proposed a DCNN model to recognize HEp-2 cells using the classical LeNet-5 [32]. Bayramoglu et al. [17] proposed a pretraining strategy to fine-tune the CNN network. However, features learned automatically by traditional DCNN algorithms are still limited [33]. Besides, some features in the early layers demonstrate
https://doi.org/10.1016/j.patcog.2018.02.006