manual segmentation approaches, there is a significant demand
for computer algorithms that can perform segmentation quickly and
accurately without human interaction. However, medical image
segmentation has some limitations, including data
scarcity and class imbalance. In most cases, the large number
of labeled samples (often in the thousands) required for training is
not available, for several reasons [11]. Labeling a dataset requires
domain experts, which is expensive and demands considerable time
and effort. Sometimes, data transformation or augmentation
techniques (data whitening, rotation, translation, and scaling)
are applied to increase the number of available labeled samples
[12, 13, 14]. In addition, patch-based approaches
are used to mitigate class imbalance problems. In this work, we
have evaluated the proposed models with both patch-based
and entire-image-based approaches. However, to switch from
the patch-based approach to the pixel-based approach that
operates on the entire image, we must be aware of the class
imbalance problem. In semantic segmentation, the image
background is assigned one label and the foreground regions are
assigned the target classes, which largely resolves the class
imbalance problem. Two advanced techniques, cross-entropy loss
and Dice similarity, have been introduced for efficient training of
classification and segmentation tasks [13, 14].
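As a concrete illustration of the Dice similarity mentioned above, the following NumPy sketch computes the Dice coefficient and the corresponding loss for binary masks (the function names and the smoothing constant are our own choices for this illustration, not taken from [13, 14]):

```python
import numpy as np

def dice_coefficient(pred, target, smooth=1e-6):
    """Dice similarity between a binary prediction and a ground-truth mask.

    The smoothing term keeps the ratio defined when both masks are empty.
    """
    pred = pred.astype(np.float64).ravel()
    target = target.astype(np.float64).ravel()
    intersection = (pred * target).sum()
    return (2.0 * intersection + smooth) / (pred.sum() + target.sum() + smooth)

def dice_loss(pred, target):
    """Loss to minimize: 1 - Dice, so perfect overlap gives 0."""
    return 1.0 - dice_coefficient(pred, target)

# Example: two 4x4 masks, each with 3 foreground pixels, overlapping on 2.
pred = np.zeros((4, 4)); pred[0, 0] = pred[0, 1] = pred[1, 1] = 1
target = np.zeros((4, 4)); target[0, 1] = target[1, 1] = target[2, 2] = 1
print(round(dice_coefficient(pred, target), 3))  # 2*2/(3+3) ≈ 0.667
```

Because the Dice coefficient is computed over the foreground overlap only, it is far less sensitive to a large background class than plain pixel-wise accuracy, which is why it is favored for imbalanced segmentation tasks.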
Furthermore, in medical image processing, global
localization and context modulation are very often applied in
localization tasks. In identification tasks, each pixel is assigned a
class label together with a desired boundary that corresponds to
the contour of the target lesion. To define these target lesion
boundaries, the relevant pixels must be emphasized. Landmark
detection in medical imaging [15, 16] is one example of this.
There were several traditional machine learning and image
processing techniques available for medical image
segmentation tasks before the DL revolution, including
amplitude segmentation based on histogram features [17], the
region-based segmentation method [18], and the graph-cut
approach [19]. However, semantic segmentation approaches
that utilize DL have become very popular in recent years in the
field of medical image segmentation, lesion detection, and
localization [20]. In addition, DL-based approaches are known
as universal learning approaches, in which a single model can be
utilized efficiently across different medical imaging modalities
such as MRI, CT, and X-ray.
According to a recent survey, DL approaches have been applied to
almost all modalities of medical imaging [20, 21].
Furthermore, the largest number of papers has been published
on segmentation tasks in different modalities of medical
imaging [20, 21]. A DCNN-based brain tumor segmentation and
detection method was proposed in [22].
From an architectural point of view, the CNN model for
classification tasks requires an encoding unit and provides class
probability as an output. In classification tasks, convolution
operations with activation functions are performed, followed by
sub-sampling layers that reduce the dimensionality of the feature
maps. As the input samples traverse the layers of the network,
the number of feature maps increases while their spatial
dimensionality decreases. This is shown in the first part of the
model (in green) in Fig. 2. Since the number of feature maps
increases in the deeper layers, the number of network parameters
increases accordingly. Finally, a Softmax operation is applied at
the end of the network to compute the probabilities of the target
classes.
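The encoder behavior described above can be illustrated with a small sketch (the stage count, input size, and channel widths below are hypothetical, chosen only to show the halving/doubling pattern and the final Softmax; they are not the configuration of Fig. 2):

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical encoder: each stage halves the spatial size (2x2
# sub-sampling) and doubles the number of feature maps, as described
# in the text.
size, channels = 256, 16
for stage in range(4):
    size //= 2
    channels *= 2
    print(f"stage {stage + 1}: {size}x{size} maps, {channels} channels")

# At the end of the network, a dense layer would produce class logits;
# softmax turns them into the probabilities of the target classes.
probs = softmax(np.array([2.0, 1.0, 0.1]))
print(probs)  # non-negative values summing to 1
```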
As opposed to classification tasks, the architecture for
segmentation tasks requires both convolutional encoding and
decoding units. The encoding unit encodes the input
images into a larger number of feature maps with lower
dimensionality. The decoding unit performs up-convolution
(deconvolution) operations to produce segmentation maps with the
same dimensionality as the original input image. Therefore, the
architecture for segmentation tasks generally requires almost
double the number of network parameters compared to
the architecture for classification tasks. Thus, it is important
to design efficient DCNN architectures for segmentation tasks
that can ensure better performance with fewer
network parameters.
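A back-of-the-envelope count illustrates why mirroring the encoder with a decoder roughly doubles the parameters (the 3x3 kernels and the channel progression below are a toy example, not the architecture of Fig. 2):

```python
def conv_params(in_ch, out_ch, k=3):
    """Parameters of one k x k convolution layer (weights + biases)."""
    return k * k * in_ch * out_ch + out_ch

# Hypothetical 4-stage encoder channel progression for a grayscale input.
channels = [1, 32, 64, 128, 256]
encoder = sum(conv_params(i, o) for i, o in zip(channels, channels[1:]))

# A mirrored decoder uses the same channel widths in reverse order.
reversed_channels = channels[::-1]
decoder = sum(conv_params(i, o)
              for i, o in zip(reversed_channels, reversed_channels[1:]))

total = encoder + decoder
print(encoder, decoder, round(total / encoder, 2))  # ratio is close to 2
```

The decoder's convolutions pair the same channel widths as the encoder's (just in reverse), so its parameter count nearly matches the encoder's, giving the "almost double" total noted above.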
This research demonstrates two modified and improved
segmentation models, one using recurrent convolutional
networks, and another using recurrent residual convolutional
networks. To accomplish our goals, the proposed models are