An article on image registration and deep learning


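### Combined application of image registration and deep learning

#### 1. Introduction

In modern remote sensing, image registration is a crucial technique: it geometrically aligns images of the same scene acquired at different times, by different sensors, or from different viewpoints. As deep learning has developed, it has been applied more and more widely to image registration and has substantially improved registration accuracy and efficiency.

#### 2. Deep neural networks in remote sensing image registration

##### 2.1 Deep neural network framework

The study proposes an effective deep neural network framework for the remote sensing image registration problem. Its main feature is end-to-end learning: the network directly learns the mapping between image patch-pairs and their matching labels, whereas conventional methods perform feature extraction and feature matching separately. This design allows the whole process (that is, learning the mapping function) to be optimized through information feedback during network training, which conventional methods lack.

##### 2.2 Self-learning

To alleviate the shortage of remote sensing training data, the study introduces a self-learning mechanism. Specifically, the mapping function is learned from the original images and their transformed copies, which improves generalization on a limited data set. The approach makes effective training possible with limited data and also mitigates over-fitting to some extent.

##### 2.3 Transfer learning

To reduce the huge computation cost of training a deep neural network, the authors also apply transfer learning. Fine-tuning a pre-trained model greatly speeds up training and brings additional performance gains.

#### 3. Experimental results

Comprehensive experiments on seven sets of remote sensing images from several sources, including Radarsat, SPOT, and Landsat, show that the framework improves registration accuracy by 2.4–53.7%. These results indicate that deep learning has considerable potential for remote sensing image registration.

#### 4. Conclusion

Deep neural networks show strong potential for remote sensing image registration. End-to-end learning, self-learning, and transfer learning together improve registration accuracy and efficiency and allow the model to generalize well even when data are limited. Future work will explore how to better combine deep learning techniques with the characteristics of remote sensing images to develop more efficient and more accurate registration methods.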


A deep learning framework for remote sensing image registration

Shuang Wang, Dou Quan, Xuefeng Liang ⇑, Mengdan Ning, Yanhe Guo, Licheng Jiao

Key Laboratory of Intelligent Perception and Image Understanding of Ministry of Education, International Research Center for Intelligent Perception and Computation, Xidian University, Xi’an, Shaanxi Province 710071, China

IST, Graduate School of Informatics, Kyoto University, Kyoto, Japan

Article info

Article history: Received 30 May 2017; received in revised form 1 December 2017; accepted 26 December 2017; available online xxxx

Keywords: Deep neural network; Image registration; Remote sensing image; Self-learning; Transfer learning

Abstract

We propose an effective deep neural network for the remote sensing image registration problem. Unlike conventional methods, which perform feature extraction and feature matching separately, we pair patches from the sensed and reference images and then directly learn the mapping between these patch-pairs and their matching labels for later registration. This end-to-end architecture allows us to optimize the whole process (learning the mapping function) through information feedback when training the network, which is lacking in conventional methods. In addition, to alleviate the small-data issue of remote sensing images for training, our proposal introduces self-learning, which learns the mapping function from images and their transformed copies. Moreover, we apply transfer learning to reduce the huge computation cost of the training stage. It not only speeds up our framework but also yields extra performance gains. Comprehensive experiments conducted on seven sets of remote sensing images, acquired by Radarsat, SPOT and Landsat, show that our proposal improves the registration accuracy by 2.4–53.7%.

© 2017 International Society for Photogrammetry and Remote Sensing, Inc. (ISPRS). Published by Elsevier

1. Introduction

Image registration is the process of geometrically aligning a reference image and a sensed image of the same scene that were acquired at different times, possibly by different sensors or from different viewpoints (Zitova and Flusser, 2003; Jacqueline, 2017). It is a significant problem in remote sensing image processing because it directly influences the performance of follow-up tasks such as image fusion, change detection, and environmental monitoring.

In the past decade, the remote sensing image registration problem has been addressed by two types of methods: area-based methods and feature-based methods (Zitova and Flusser, 2003). Area-based methods search for the optimal geometric transform parameters by optimizing an image similarity measure, for which mutual information (MI), Kullback-Leibler divergence, and normalized cross-correlation (NCC) are widely accepted (Kern and Pattichis, 2007; Suri and Reinartz, 2009; Parmehr et al., 2012; Liang et al., 2013; Xu et al., 2016b). Although area-based methods are easy to implement, they are sensitive to intensity change, illumination change, and noise. In contrast, feature-based methods overcome these defects and establish the geometric relation more effectively by matching salient features such as points, lines, and regions. In practice, SIFT (Lowe, 2004), SURF (Bay et al., 2008), HOG (Dalal and Triggs, 2005), MSER (Matas et al., 2004), and Affine-SIFT (Morel and Yu, 2009) are commonly applied. Moreover, other studies combine area-based and feature-based methods for coarse-to-fine image registration (Yong et al., 2009; Ma et al., 2010; Goncalves et al., 2011; Gong et al., 2014).
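To make the area-based idea concrete, the following is a minimal sketch (not code from the paper) that scores candidate alignments with NCC and keeps the best one. For brevity it searches only integer shifts of a 1-D intensity profile; the function names `ncc` and `best_shift` and the sample data are assumptions made for illustration.

```python
import math

def ncc(a, b):
    """Normalized cross-correlation of two equally sized patches (flat lists)."""
    if len(a) != len(b) or not a:
        raise ValueError("patches must be non-empty and of equal size")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0:
        return 0.0  # a constant patch carries no structure to correlate
    return sum(x * y for x, y in zip(da, db)) / denom

def best_shift(reference, sensed, max_shift):
    """Exhaustively search the integer shift s that maximizes NCC between
    reference[i] and sensed[i + s] on their overlapping samples."""
    n = len(reference)
    best = (0, -2.0)  # NCC is never below -1, so any real score replaces this
    for s in range(-max_shift, max_shift + 1):
        lo, hi = max(0, -s), min(n, n - s)
        score = ncc(reference[lo:hi], [sensed[i + s] for i in range(lo, hi)])
        if score > best[1]:
            best = (s, score)
    return best

# Hypothetical data: the sensed profile is the reference moved right by two
# samples, with a different gain and offset (sensed = 2 * reference + 5).
reference = [0, 0, 1, 3, 7, 3, 1, 0, 0, 0]
sensed = [5, 5, 5, 5, 7, 11, 19, 11, 7, 5]
shift, score = best_shift(reference, sensed, max_shift=3)
```

NCC tolerates a linear gain and offset, so the search recovers the shift here. An inverted intensity relation, which can occur between different sensors, flips the score to -1, which is one way the intensity sensitivity mentioned above shows up.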

The representative feature-based method is the scale-invariant feature transform (SIFT), because its feature descriptor is invariant under translation, rotation, and scale change on normal images. However, remote sensing images are generated by a complicated imaging mechanism, and their appearance is determined by the radiation and geometric characteristics of objects and by the transmitting or receiving configuration of the sensors. In registration tasks, the reference image and sensed image may even come from different sensors and differ in resolution, spectral characteristics, and so on (Jacqueline, 2017). Because of these special properties, the invariances that SIFT was designed to provide on normal images may not hold on remote sensing images. Our experiments reveal that the computed principal direction of SIFT keypoints becomes unreliable, because the statistics of the gradients around a keypoint vary severely. Fig. 1(a) and (b) illustrate this problem. We select some strong SIFT points that should be matched between the two images and pair them by giving them the same name. The yellow arrows indicate their principal directions in each image. Clearly, the principal directions of corresponding points are inconsistent, and some even point in opposite directions. This unreliable direction results in the false matching shown in Fig. 1(c). In this example, 91% of the SIFT points lose rotation invariance, which leads to a severe mis-registration because too few points are correctly matched.

https://doi.org/10.1016/j.isprsjprs.2017.12.012

Special issue on Deep Learning for Remotely Sensed Data.

⇑ Corresponding author.

E-mail addresses: shwang.xd@gmail.com (S. Wang), xliang@i.kyoto-u.ac.jp (X. Liang).

ISPRS Journal of Photogrammetry and Remote Sensing xxx (2018) xxx–xxx

Contents lists available at ScienceDirect

journal homepage: www.elsevier.com/locate/isprsjprs

Please cite this article in press as: Wang, S., et al. A deep learning framework for remote sensing image registration. ISPRS J. Photogram. Remote Sensing (2018), https://doi.org/10.1016/j.isprsjprs.2017.12.012

Moreover, the procedure of feature-based methods is questionable for remote sensing image registration. It can be summarized as follows: first, detect keypoints in the reference image and the sensed image; second, extract the features of these keypoints from their neighborhood pixels; third, match the features by feature distance; finally, estimate the geometric transform matrix from the matched keypoints to complete the registration. To ensure good performance, a well-designed feature and feature extractor are required, which demands massive engineering work. These handcrafted features (e.g. edges, texture, corners, and gradient statistics) lack high-level semantic information because there is no information feedback between feature extraction and matching. Therefore, these methods have limited applicability and perform well only on specific images with a suitable feature representation.
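The last two steps of this pipeline (matching by feature distance, then estimating the transform) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the procedure of any cited method: descriptors are toy 3-D vectors, matching uses a nearest-neighbour ratio test, and the transform is restricted to a similarity (rotation, uniform scale, translation) fitted by least squares with points written as complex numbers. The names `match_by_distance` and `fit_similarity` and all data are hypothetical.

```python
import math

def match_by_distance(desc_ref, desc_sen, ratio=0.8):
    """Match each reference descriptor to its nearest sensed descriptor,
    keeping the match only if it is clearly closer than the second nearest."""
    matches = []
    for i, d in enumerate(desc_ref):
        dists = sorted((math.dist(d, e), j) for j, e in enumerate(desc_sen))
        if len(dists) == 1 or dists[0][0] < ratio * dists[1][0]:
            matches.append((i, dists[0][1]))
    return matches

def fit_similarity(src, dst):
    """Least-squares fit of dst ~ a * src + b, with 2-D points as complex
    numbers; abs(a) is the scale and the phase of a is the rotation."""
    p = [complex(x, y) for x, y in src]
    q = [complex(x, y) for x, y in dst]
    p_mean = sum(p) / len(p)
    q_mean = sum(q) / len(q)
    num = sum((qi - q_mean) * (pi - p_mean).conjugate() for pi, qi in zip(p, q))
    den = sum(abs(pi - p_mean) ** 2 for pi in p)
    a = num / den
    return a, q_mean - a * p_mean

# Hypothetical keypoints and descriptors. The sensed keypoints are the
# reference keypoints rotated by 90 degrees, scaled by 2, and translated by
# (10, -1), listed in a different order with slightly perturbed descriptors.
kp_ref = [(0, 0), (4, 0), (0, 3), (5, 5)]
desc_ref = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 0)]
kp_sen = [(4, -1), (10, -1), (0, 9), (10, 7)]
desc_sen = [(0, 0.1, 0.9), (0.9, 0, 0.1), (1, 0.9, 0), (0.1, 1, 0)]

matches = match_by_distance(desc_ref, desc_sen)
a, b = fit_similarity([kp_ref[i] for i, _ in matches],
                      [kp_sen[j] for _, j in matches])
scale, rotation_deg = abs(a), math.degrees(math.atan2(a.imag, a.real))
```

Note that nothing in the matching step can influence how the descriptors were computed, which is the missing feedback the paragraph above criticizes.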

In recent years, deep learning has attracted increasing attention and achieved great success. The major reason is that it is a fully data-driven scheme that can automatically learn features from images. Specifically, deep learning stacks multiple levels of non-linear operations, which seek to exploit the distribution structure of the input data or to learn an abstract representation (Schmidhuber, 2014; Lecun et al., 2015). Its end-to-end architecture makes it possible to optimize the entire network through information feedback. Inspired by these advantages of deep neural networks (DNNs), we propose a deep learning framework for remote sensing image registration. By taking image patch-pairs as input and matching labels as output, our proposal directly learns an end-to-end mapping function. Its hidden layers correspond to the feature extractor, and the output layer corresponds to feature matching. This architecture unifies feature extraction and matching in a closed-loop learning framework that involves not only information feedforward but also feedback. Unlike the conventional feature-based methods, this significant difference permits the results of feature matching to guide the process of feature extraction, which makes the learned features more appropriate for the data. However, we encounter two problems when applying a DNN: a small training data set, and a huge computation cost in the training stage.
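The closed loop described above can be illustrated with a deliberately tiny network. This is a sketch under assumptions, not the architecture of the paper: a patch-pair is flattened into one input vector, a single tanh hidden layer stands in for the feature extractor, a sigmoid output unit stands in for the matcher, and the layer sizes, learning rate, and toy patches are all made up for illustration.

```python
import math
import random

def init_params(n_in, n_hidden, seed=0):
    """Small random weights; a fixed seed keeps the sketch reproducible."""
    rng = random.Random(seed)
    return {
        "W1": [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)],
        "b1": [0.0] * n_hidden,
        "w2": [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)],
        "b2": 0.0,
    }

def forward(params, x):
    """Feedforward pass: hidden features, then the matching probability."""
    hidden = [math.tanh(sum(w * v for w, v in zip(row, x)) + b)
              for row, b in zip(params["W1"], params["b1"])]
    z = sum(w * h for w, h in zip(params["w2"], hidden)) + params["b2"]
    return hidden, 1.0 / (1.0 + math.exp(-z))

def loss(params, x, y):
    """Binary cross-entropy between the prediction and the matching label."""
    _, p = forward(params, x)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def sgd_step(params, x, y, lr=0.2):
    """Feedback pass: the matching error updates the output layer and is
    propagated back into the hidden (feature) layer."""
    hidden, p = forward(params, x)
    dz = p - y  # gradient of the loss with respect to the output pre-activation
    for j, h in enumerate(hidden):
        dpre = dz * params["w2"][j] * (1.0 - h * h)  # uses w2 before its update
        params["w2"][j] -= lr * dz * h
        for k, v in enumerate(x):
            params["W1"][j][k] -= lr * dpre * v
        params["b1"][j] -= lr * dpre
    params["b2"] -= lr * dz

# Hypothetical 4-pixel patches: one matching pair and one non-matching pair.
patch_a = [0.9, 0.1, 0.8, 0.2]
patch_b = [0.1, 0.9, 0.2, 0.8]
samples = [(patch_a + patch_a, 1), (patch_a + patch_b, 0)]

params = init_params(n_in=8, n_hidden=3)
loss_before = sum(loss(params, x, y) for x, y in samples)
for _ in range(300):
    for x, y in samples:
        sgd_step(params, x, y)
loss_after = sum(loss(params, x, y) for x, y in samples)
p_match = forward(params, samples[0][0])[1]
p_nonmatch = forward(params, samples[1][0])[1]
```

The `dpre` term is where the matching error reaches the feature layer; a handcrafted descriptor has no counterpart to this update.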

Fig. 1. (a) and (b) show the corresponding keypoints between two remote sensing images and their principal directions computed by the SIFT algorithm. (c) and (d) are the matching results of SIFT and our method, respectively. The yellow lines are correct matches, whereas the green lines represent false matches.

Deep learning is double-edged: it can approximate very complex functions, but it must optimize thousands (or even millions) of parameters for a problem. To prevent over-fitting, networks are commonly trained on a large number of samples. Unfortunately, it is hard to obtain enough remote sensing images for such large-scale learning. Moreover, manual annotation requires professional knowledge, and the process is extremely expensive. In some exceptional cases, new sensed images coming from a different sensor are unlikely to be well registered by a network trained on data from other sensors. To address this problem, we propose a self-learning scheme whose idea is to learn the mapping function from images and their variously transformed copies. This idea can be regarded as a new way of data augmentation, which creates the labeled training data from scratch, unlike conventional strategies that extend the dataset from existing labeled data. This method has four advantages. First, the number of training samples is greatly increased. Second, the matching labels of the image patch-pairs used for training are known. Third, the features learned by the DNN are invariant to rotation, scale, and translation, because they are learned from differently transformed images. Fourth, to register new images, the mapping function is learned from the image itself and its transformed copies, without requiring assistance from other images.
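The reason the labels come for free can be sketched in a few lines. Because the transform applied to the copy is chosen by us, the location of every point in the copy is known, so matching and non-matching pairs can be labeled automatically. The sketch below only generates labeled point pairs; warping the image and cropping a patch around each point are omitted, and the function names, transform parameters, and keypoints are assumptions for illustration.

```python
import math
import random

def transform_point(x, y, angle_deg, scale, tx, ty):
    """Apply a known similarity transform (rotate, scale, translate) to a point."""
    t = math.radians(angle_deg)
    return (scale * (x * math.cos(t) - y * math.sin(t)) + tx,
            scale * (x * math.sin(t) + y * math.cos(t)) + ty)

def make_training_pairs(keypoints, angle_deg, scale, tx, ty, seed=0):
    """Return (point_in_image, point_in_transformed_copy, label) triples.

    Label 1 pairs a point with its own transformed location; label 0 pairs it
    with the transformed location of a different, randomly chosen point.
    Requires at least two keypoints."""
    rng = random.Random(seed)
    warped = [transform_point(x, y, angle_deg, scale, tx, ty) for x, y in keypoints]
    pairs = []
    for i, p in enumerate(keypoints):
        pairs.append((p, warped[i], 1))
        j = rng.choice([k for k in range(len(keypoints)) if k != i])
        pairs.append((p, warped[j], 0))
    return pairs

# Hypothetical keypoints and one hypothetical transformed copy of the image.
keypoints = [(10, 20), (30, 5), (7, 7)]
pairs = make_training_pairs(keypoints, angle_deg=30, scale=1.2, tx=3, ty=-4)
```

Repeating this with several rotations, scales, and translations multiplies the number of labeled samples, which corresponds to the first and third advantages listed above.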

Training the network from scratch for each image registration is time consuming. To reduce the computation cost, we apply transfer learning: we take the network trained on other images as the initial network and then fine-tune it on our target images. Experiments show that this not only greatly reduces the training time but also gives our framework better registration performance.
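The training-time benefit of starting from a trained network can be seen even in a toy setting. The sketch below is an assumption-laden stand-in, not the network of the paper: the "model" is a single weight fitted by gradient descent, the source and target tasks are two similar linear relations, and all numbers are invented. It only illustrates why an initialization close to the target solution needs fewer updates; it says nothing about the accuracy gain reported above.

```python
def train(w, data, lr=0.05, tol=1e-6, max_steps=10_000):
    """Gradient descent on the mean squared error of y ~ w * x.

    Returns the fitted weight and the number of updates that were needed
    before the gradient magnitude fell below tol."""
    for step in range(max_steps):
        grad = sum(2 * x * (w * x - y) for x, y in data) / len(data)
        if abs(grad) < tol:
            return w, step
        w -= lr * grad
    return w, max_steps

# Hypothetical source and target tasks with similar underlying relations.
source = [(x, 2.0 * x) for x in (1.0, 2.0, 3.0)]
target = [(x, 2.2 * x) for x in (1.0, 2.0, 3.0)]

w_source, _ = train(0.0, source)                # network trained on other images
w_scratch, steps_scratch = train(0.0, target)   # target task from scratch
w_tuned, steps_tuned = train(w_source, target)  # fine-tuning the trained network
```

Both runs reach the same solution on the target data, but the fine-tuned run starts closer to it and therefore stops after fewer updates.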

In summary, the main contributions of this work are threefold.

(1) We propose a deep learning framework for remote sensing image registration, which directly learns the end-to-end mapping between image patch-pairs and their matching labels.

(2) We propose a self-learning scheme to solve the small-data and data-labeling problems in remote sensing image registration. The mapping function is learned from the image itself and its transformed copies.

(3) We apply transfer learning to reduce the training cost by taking the network trained on other images as the initial network and then fine-tuning it on the target images.

The rest of this paper is organized as follows. Section 2 introduces related work on image registration and deep learning. Section 3 details our deep learning framework for remote sensing image registration. Section 4 presents extensive experiments and analysis of our proposal. We conclude in Section 5.

2. Related work

2.1. Image registration

In this section, we briefly introduce existing image registration methods in the two aforementioned categories: area-based methods and feature-based methods.

The first widely studied methods for image registration were area-based (Maes et al., 1997; Kern and Pattichis, 2007; Suri and Reinartz, 2009; Liang et al., 2013). They transform the registration problem into an optimization problem that maximizes the similarity between the reference image and the sensed image. Liang et al. (2013) proposed spatial and mutual information (SMI) as the similarity metric for searching similar local regions by ant colony optimization. To speed up the computation of MI, Patel and Thakar (2015) estimated MI based on maximum likelihood. However, area-based methods rely heavily on pixel intensities and are therefore sensitive to illumination change and noise. Another idea was to find the optimal parameters in other domains. Reddy and Chatterji (1996) proposed a fast Fourier transform-based (FFT-based) method to find the optimal match in the frequency domain, but it did not meet the required accuracy.

With the development of feature point extraction, a new trend was to build the geometric transform by matching feature points. Because of the particular imaging mechanism, a large variety of SIFT variants have been proposed for remote sensing image registration (Schwind et al., 2010; Sedaghat et al., 2011; Wang et al., 2012; Fan et al., 2013; Ye and Shan, 2014; Fan et al., 2015; Ma et al., 2017). Schwind et al. (2010) proposed SIFT-OCT, which skips the first octave of the scale space to reduce the influence of noise. Wang et al. (2012) applied the bilateral filter (BF) to construct an anisotropic scale space, which preserves more details by combining two Gaussian filters, one on the spatial domain and one on the intensity. Meanwhile, progress has also been made on feature matching. Wu et al. (2015) proposed a fast sample consensus (FSC) algorithm that uses matching points with a high correct rate to calculate the transform parameters and then selects the matching points that have only subpixel error. Kupfer et al. (2015) proposed a mode-seeking SIFT (MS-SIFT) method that defines a mode scale, a mode rotation difference, and mode translations for the feature points. It refines the result by eliminating outliers whose horizontal or vertical shift is far from the mode translation. Ma et al. (2017) proposed a modified SIFT feature and a robust feature matching method to overcome the intensity difference between image pairs, where the feature matching combines the feature distance, FSC, and MS-SIFT.
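For reference, the mutual information that several of the area-based methods above maximize can be estimated from a joint intensity histogram. The following is a minimal sketch of that standard histogram estimate, not the SMI metric or the maximum-likelihood estimator cited above; the function name, the bin count, and the sample intensities are assumptions for illustration.

```python
import math
from collections import Counter

def mutual_information(a, b, bins=4, max_value=256):
    """Mutual information (in bits) between two equally sized images given as
    flat lists of intensities in [0, max_value), using a joint histogram."""
    if len(a) != len(b) or not a:
        raise ValueError("images must be non-empty and of equal size")
    width = max_value / bins
    qa = [min(int(v / width), bins - 1) for v in a]  # quantize to bin indices
    qb = [min(int(v / width), bins - 1) for v in b]
    n = len(a)
    joint = Counter(zip(qa, qb))
    count_a = Counter(qa)
    count_b = Counter(qb)
    # Sum of p(i, j) * log2(p(i, j) / (p(i) * p(j))) over occupied bins.
    return sum((c / n) * math.log2(c * n / (count_a[i] * count_b[j]))
               for (i, j), c in joint.items())

# Hypothetical 8-pixel images: one image, its intensity-inverted copy, and a
# constant image that shares no information with it.
image = [10, 10, 200, 200, 90, 90, 150, 150]
inverted = [255 - v for v in image]
constant = [128] * 8

mi_self = mutual_information(image, image)
mi_inverted = mutual_information(image, inverted)
mi_constant = mutual_information(image, constant)
```

In this toy case the inverted copy is as informative as the image itself, which is why MI is often preferred when the intensity relation between two sensors is not linear. The estimate still depends directly on pixel intensities, in line with the sensitivity to illumination change and noise noted above.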

Additionally, some papers focus on the geometric structure or shape features of images. Ye et al. proposed to represent the structural properties of images by a new feature descriptor named the histogram of orientated phase congruency (HOPC) (Ye et al., 2017a; Ye and Shen, 2016), and then used NCC as the similarity metric for template matching. Ye et al. (2017b) proposed a novel shape descriptor for image matching based on dense local self-similarity (DLSS) and NCC. Yang et al. (2017) proposed to combine the shape context feature and the SIFT feature for remote sensing image registration.

In the meantime, other studies have tried to integrate the advantages of area-based and feature-based methods (Yong et al., 2009; Ma et al., 2010; Goncalves et al., 2011; Gong et al., 2014). Ma et al. (2010) applied NCC to acquire control points with a better spatial distribution after a preliminary SIFT registration. Xu et al. (2016a) proposed an iterative multi-level strategy that adjusts parameters and re-extracts and re-matches features to improve the registration. Gong et al. (2014) proposed a coarse-to-fine method that obtains a coarse result by SIFT and then achieves precise registration based on mutual information. Goncalves et al. (2011) combined image segmentation and SIFT: objects are extracted from the images using Otsu's thresholding method, and SIFT is then applied to obtain matching points from the objects.

We can see that conventional feature-based methods require careful engineering and domain knowledge to design the feature extractor. This makes the handcrafted features rather specific and poorly generalizable.

2.2. Deep learning

Deep learning has achieved great success in computer vision (Farabet et al., 2013; Simoserra et al., 2015), speech processing (Hinton et al., 2012; Graves et al., 2013), and image processing (Krizhevsky et al., 2012; Simonyan and Zisserman, 2014; Russakovsky et al., 2015; Ren et al., 2016). In deep learning, the popular models are the deep belief network (DBN), the auto-encoder (AE), and the convolutional neural network (CNN). They share a similar structure of multiple stacked layers (Bengio, 2009), in which each layer abstracts the features of the previous layer into higher-level features through a non-linear mapping. Some deep models exploit the distribution characteristics of data by minimizing the reconstruction error (Hinton et al., 2006; Vincent et al., 2010; Lecun et al., 2015), while other models acquire semantic features by stochastic gradient descent with the back-propagation (BP) algorithm (Rumelhart et al., 1986).

Deep learning has been introduced to the field of remote sensing imagery and has shown superiority and robustness (Han et al., 2015a; Cheng et al., 2016a,b; Scott et al., 2017; Cheng et al., 2015; Romero et al., 2016; Yao et al., 2016; Zhou et al., 2016; Zhao and Du, 2016b,a; Gong et al., 2016; Zhang et al., 2016;
