
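Deep learning, remote sensing imagery, and semantic segmentation are active research directions in computer vision, and together they play a key role in recognizing different land cover types in remote sensing images. Semantic segmentation, a fundamental aspect of computer vision research, aims to assign a category label to every pixel in an image. In recent years, deep learning methods based on Fully Convolutional Networks (FCNs) have proved effective for the semantic segmentation of remote sensing images. However, these images usually carry rich information and complex content, which makes segmentation networks hard to train. In addition, datasets for real application scenarios are often limited: they are small and unevenly distributed, which makes it difficult to train networks and to improve segmentation performance.

To address these challenges, the authors propose a Convolutional Neural Network (CNN) model called the Dual Path Attention Network (DPA-Net). The model has a simple modular structure and can be added to any segmentation model to enhance its ability to learn features. DPA-Net appends two types of attention module to the segmentation model, one focusing on spatial information and the other on channel information. The outputs of the two modules are then fused, which strengthens the network's feature extraction and contributes to more precise segmentation results. The network was tested on the Gaofen Image Dataset (GID). The results show that it outperformed U-Net, PSP-Net, and DeepLab V3+ in mean intersection over union (mean IoU) by 0.84%, 2.54%, and 1.32%, respectively, indicating that DPA-Net improves the semantic segmentation of remote sensing images.

The paper also describes data pre-processing and augmentation strategies, which are used to compensate for the small size and uneven distribution of the dataset. Appropriate pre-processing and augmentation of the input data improve the network's ability to generalize and further improve segmentation performance. The keywords are "remote sensing image", "semantic segmentation", "fully convolutional network", "convolutional neural network", and "self-attention mechanism". The self-attention mechanism plays a key role in the model: it lets the network focus on the important regions of an image, which improves the efficiency and accuracy of feature extraction.

Understanding the paper requires a few core concepts. A Fully Convolutional Network (FCN) is a convolutional neural network designed for image segmentation. It replaces fully connected layers with convolutional layers, so it can process input images of arbitrary size. Its output has the same spatial size as its input, so every pixel can be assigned a category label, which suits semantic segmentation.

Semantic segmentation is a pixel-level recognition task: the algorithm must not only recognize the objects in an image but also determine what each pixel represents. In remote sensing image analysis, it supports accurate classification of ground objects and is widely applicable to land cover classification, disaster monitoring, and urban planning.

A Convolutional Neural Network (CNN) is a deep learning model that is well suited to visual tasks such as image recognition, classification, and segmentation. Its convolutional layers automatically extract a spatial hierarchy of features from an image. A network usually consists of several convolutional layers, pooling layers, and fully connected layers, and progresses from simple edge and texture features to complex patterns and shapes.

The self-attention mechanism used by DPA-Net has been a popular research topic in deep learning in recent years. Self-attention lets a network that processes sequence data (such as word sequences in natural language processing) or image data focus on the parts relevant to the current task and ignore irrelevant information. Applied to image processing, it helps the network capture the global context of an image and improves recognition accuracy.

In practice, remote sensing datasets are often limited in size and unevenly distributed, which makes it hard to train robust models. Data augmentation is therefore an important way to improve generalization. It usually includes operations such as rotation, scaling, cropping, flipping, and color transformation, which increase the diversity of the data and help the model generalize to new, unseen data.

The paper deepens the understanding of deep learning for the semantic segmentation of remote sensing images and demonstrates empirically that the dual path attention network improves segmentation accuracy. These results are of reference value to academia and support technical progress in related industries.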
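Flipping and rotation, two of the augmentation operations named in the summary, can be sketched as follows. This is a minimal illustration and not the paper's pipeline: the function names, the toy 2 x 2 arrays, and the 50% flip probability are all assumptions. The point it shows is that, for segmentation, the image and its label mask must receive the same geometric transform so that every pixel keeps its label.

```python
import random

def hflip(grid):
    """Mirror a 2-D grid left to right."""
    return [row[::-1] for row in grid]

def rot90(grid):
    """Rotate a 2-D grid 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]

def augment(image, mask, rng):
    """Apply the same random flip and rotation to an image and its label mask."""
    if rng.random() < 0.5:  # assumed flip probability
        image, mask = hflip(image), hflip(mask)
    for _ in range(rng.randrange(4)):  # 0 to 3 quarter turns
        image, mask = rot90(image), rot90(mask)
    return image, mask

image = [[10, 20], [30, 40]]  # toy single-band image
mask = [[0, 1], [2, 3]]       # toy per-pixel class labels
aug_image, aug_mask = augment(image, mask, random.Random(0))
```

Because both arrays go through identical operations, the pairing between a pixel value and its label survives any combination of flips and quarter turns.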


International Journal of Geo-Information

Article

Dual Path Attention Net for Remote Sensing Semantic Image Segmentation

Jinglun Li *, Jiapeng Xiu, Zhengqiu Yang and Chen Liu

School of Computer Science, Beijing University of Posts and Telecommunications, Beijing 100096, China;

xiujiapeng@bupt.edu.cn (J.X.); zqyang@bupt.edu.cn (Z.Y.); lchen@bupt.edu.cn (C.L.)

* Correspondence: jingli960423@bupt.edu.cn; Tel.: +86-188-1163-0369

Received: 13 August 2020; Accepted: 27 September 2020; Published: 29 September 2020



 

Abstract: Semantic segmentation plays an important role in understanding the content of remote sensing images. In recent years, deep learning methods based on Fully Convolutional Networks (FCNs) have proved to be effective for the semantic segmentation of remote sensing images. However, the rich information and complex content make the training of networks for segmentation challenging, and the datasets are necessarily constrained. In this paper, we propose a Convolutional Neural Network (CNN) model called Dual Path Attention Network (DPA-Net) that has a simple modular structure and can be added to any segmentation model to enhance its ability to learn features. Two types of attention module are appended to the segmentation model, one focusing on spatial information and the other on the channel. Then, the outputs of these two attention modules are fused to further improve the network's ability to extract features, thus contributing to more precise segmentation results. Finally, data pre-processing and augmentation strategies are used to compensate for the small size and uneven distribution of the datasets. The proposed network was tested on the Gaofen Image Dataset (GID). The results show that the network outperformed U-Net, PSP-Net, and DeepLab V3+ in terms of the mean IoU by 0.84%, 2.54%, and 1.32%, respectively.

Keywords: remote sensing image; semantic segmentation; fully convolutional network; convolutional neural network; self-attention mechanism

1. Introduction

Semantic segmentation is a fundamental aspect of computer vision research. Its goal is to assign a category label to each pixel in an image. Together with other kinds of deep learning research, it plays an important role in the recognition of different types of land cover in remote sensing images [–]. Recognizing the information an image contains is a key part of remote sensing image interpretation. Semantic segmentation is widely used in land cover mapping and monitoring, urban classification analysis, tree species identification in forest management, etc. [–]. To accomplish it, land cover types need to be distinguished in terms of "same object, different spectrum", or "same spectrum, different object". For instance, "lake" and "river" are two different types of land cover, but in remote sensing, they can have a similar appearance. Conversely, places with a high density of buildings and places with a low density of buildings may both be classified as urban residential areas. In addition, the boundaries between different types of land cover are intricate and irregular, which makes the remote sensing segmentation task even more difficult. Thus, discrimination between features at a pixel level is essential.
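The pixel-level labeling goal described above can be illustrated with a small sketch. It assumes a model has already produced one score per class for every pixel; the class names, the score values, and the `label_pixels` helper are hypothetical and are not taken from the paper.

```python
def label_pixels(scores):
    """Turn per-class score maps into a label map.

    scores[c][y][x] is the score of class c at pixel (y, x); each pixel
    receives the index of its highest-scoring class.
    """
    n_classes = len(scores)
    height, width = len(scores[0]), len(scores[0][0])
    return [[max(range(n_classes), key=lambda c: scores[c][y][x])
             for x in range(width)]
            for y in range(height)]

CLASSES = ["water", "built-up", "farmland"]  # hypothetical land cover classes
scores = [
    [[0.7, 0.1], [0.2, 0.3]],  # water
    [[0.2, 0.8], [0.1, 0.3]],  # built-up
    [[0.1, 0.1], [0.7, 0.4]],  # farmland
]
label_map = label_pixels(scores)  # [[0, 1], [2, 2]]
```

Each entry of `label_map` indexes into `CLASSES`, so the 2 x 2 toy image is labeled water and built-up on the top row and farmland on the bottom row.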

In recent years, the state of the art in semantic segmentation networks has progressed enormously [–]. One way to solve the above issues is by using a recurrent neural network to capture long-range contextual information. This kind of network can achieve remarkable results. For instance, a directed acyclic graph recurrent neural network [ ] can capture the rich contextual information present in local features. However, although this method is very effective, it is largely dependent on longer-term learning results. Obtaining the large number of remote sensing image segmentation labels that this requires is very difficult, so it is of limited practical utility for the segmentation of remote sensing images.

ISPRS Int. J. Geo-Inf. 2020, 9, 571; doi:10.3390/ijgi9100571 www.mdpi.com/journal/ijgi

Another effective way of tackling the issues described above is to use self-attention mechanisms. These are popular and simple to adapt to semantic segmentation tasks because of their varied and flexible structure [–]. Self-attention mechanisms focus on local features by generating weight feature maps and fusing downstream feature maps. This may involve having one or more modules built upon a basic backbone, with each module focusing on things such as the channel or spatial information. However, downstream feature maps can lose a lot of spatial information, and the capture of the original spatial information directly is currently not feasible. Yet, having very precise spatial information is crucial for the effective segmentation of remote sensing images.

To address the above issues, we propose here a novel self-attention mechanism model, called a Dual Path Attention Network (DPA-Net), which is designed for remote sensing semantic segmentation. It uses two attention modules: a total spatial attention module (TSAM) to capture spatial information and a channel attention module to capture the channel information separately. The two modules can easily be appended to other segmentation models such as PSP-Net [ ]. At present, there are many methods for the efficient extraction of different kinds of feature information. However, the input of almost all spatial attention methods is the feature map after downsampling. As mentioned above, compared with the original image, the downsampled feature map contains a lot less spatial information. This kind of spatial attention is therefore inevitably inefficient, as it is unable to fully utilize the spatial information in the data. Instead of the downsampled feature map, we changed the input of the spatial attention method to the original image. In the TSAM, spatial information is captured from the original image according to the self-attention mechanism mentioned above. The output of the TSAM is a single-channel weight matrix. Each pixel of the output can be updated again by fusing according to the corresponding weight, with the weight itself being generated by the module. After being fused with the final feature map of DPA-Net, the TSAM will provide a weight for each pixel. During training, the network pays more attention to the areas with larger weights. This means that each pixel has its own focus in the network. For the channel attention module, the self-attention mechanism captures the channel information according to the channel maps. As with the TSAM, it generates a weight factor. The feature maps are updated by integrating this weight factor. Once the two modules have completed their operations, two feature maps are obtained that contain spatial information and channel information, respectively. Then, these two feature maps are aggregated to generate the final output.
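The data flow of the two paths can be sketched in plain Python. This is only a structural illustration under loud assumptions: the text above does not give the internals of either module, so `spatial_weights` (a fixed sigmoid of mean-centred intensity) and `channel_weights` (a softmax over global channel averages) are hypothetical stand-ins for learned modules, and element-wise summation is an assumed choice for the unspecified aggregation step.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def spatial_weights(image):
    """Stand-in for the spatial path: one weight in (0, 1) per pixel of the
    ORIGINAL image, returned as a single-channel weight matrix."""
    flat = [p for row in image for p in row]
    mean = sum(flat) / len(flat)
    return [[sigmoid(p - mean) for p in row] for row in image]

def channel_weights(features):
    """Stand-in for the channel path: softmax over each channel's global average."""
    means = [sum(p for row in ch for p in row) / (len(ch) * len(ch[0]))
             for ch in features]
    peak = max(means)
    exps = [math.exp(m - peak) for m in means]
    total = sum(exps)
    return [e / total for e in exps]

def dual_path(image, features):
    """Reweight the final feature maps along both paths, then sum the two results."""
    sw = spatial_weights(image)     # H x W, shared by all channels
    cw = channel_weights(features)  # one scalar per channel
    return [[[ch[y][x] * sw[y][x] + ch[y][x] * cw[c]
              for x in range(len(ch[0]))]
             for y in range(len(ch))]
            for c, ch in enumerate(features)]

image = [[0.0, 2.0], [4.0, 2.0]]                          # toy original image
features = [[[1.0, 1.0], [1.0, 1.0]], [[3.0, 3.0], [3.0, 3.0]]]  # toy 2-channel features
fused = dual_path(image, features)
```

The sketch keeps the two properties the text states: the spatial weights come from the original image rather than a downsampled feature map, and both sets of weights act on the final feature map before the results are combined.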

It is worth emphasizing that although the proposed method is more effective than the original self-attention method, it does not significantly change the memory footprint. Overall, it solves the conventional problems associated with self-attention mechanisms in a straightforward way. First of all, the TSAM makes its calculations on the basis of the original image. When compared to downstream feature maps, original remote sensing images contain more spatial information. Secondly, the output of the two modules acts on the last feature map in the model. Thus, the two modules can control the back propagation of the entire model. In addition, the simplicity of the module structure makes it easy to use with any segmentation model. To verify the effectiveness of our method, we conducted experiments with U-Net, PSP-Net, and DeepLab V3+ [ ] on the Gaofen Image Dataset (GID) [ ]. It improved the mean IoU for each model by 0.84%, 2.54% and 1.32%, respectively.
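For readers unfamiliar with the metric, mean IoU can be computed from a predicted label map and a ground-truth label map as sketched below. The function name and the toy label maps are illustrative, and the convention of skipping classes that appear in neither map is an assumption; the paper's exact averaging convention is not stated in this section.

```python
def mean_iou(pred, truth, n_classes):
    """Mean intersection-over-union across classes for two label maps."""
    ious = []
    for c in range(n_classes):
        inter = union = 0
        for p_row, t_row in zip(pred, truth):
            for p, t in zip(p_row, t_row):
                inter += (p == c and t == c)  # pixel is class c in both maps
                union += (p == c or t == c)   # pixel is class c in at least one map
        if union:  # assumed convention: ignore classes absent from both maps
            ious.append(inter / union)
    return sum(ious) / len(ious)

truth = [[0, 0, 1], [1, 2, 2]]
pred = [[0, 1, 1], [1, 2, 0]]
score = mean_iou(pred, truth, 3)  # per-class IoU: 1/3, 2/3, 1/2
```

On this toy pair the per-class IoU values are 1/3, 2/3, and 1/2, so the mean is 0.5; a perfect prediction gives 1.0.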

The main contributions of the paper can be summarized as follows:

• We propose a Dual Path Attention Network (DPA-Net) that uses a self-attention mechanism to enhance a network's ability to capture key local features in the semantic segmentation of remote sensing images.

• A total spatial attention module is used to extract pixel-level spatial information, and a channel attention module is proposed to focus on different features. After the dual path feature extraction has taken place, the performance of the semantic segmentation is significantly improved.

• As the number of images in the test dataset, GID, was rather small, processing strategies were developed to improve the quality of our tests. By extension, these strategies can be used more generally to improve the segmentation of small datasets.

2. Related Work

Remote Sensing. High-resolution remote sensing images form the basic data for spatial information technology in geographic information systems. They are also an important national and international strategic information resource [–]. The images collected by remote sensors installed on aircraft or satellites underpin remote recognition techniques that aim to recognize land cover, such as buildings, farmland, vegetation, bare soil, rivers, etc. After the land cover has been recognized, thematic maps are often produced to visually represent its distribution. When combined with computer vision algorithms, remote recognition techniques have significant advantages regarding real-time capture and cost when compared to traditional field surveys. Therefore, they are increasingly used in the fields of land-use planning, forestry, and soil-loss monitoring [31–34].

Semantic Segmentation. Semantic segmentation aims to segment and parse a scene image into different regions associated with semantic categories. In recent years, various methods based on FCNs [ ] have led to important breakthroughs in semantic segmentation. One way to improve the performance of a segmentation model is to enhance its contextual aggregation. Several models such as U-Net use an encoder–decoder structure [ ] to integrate midstream features and downstream features. The encoder module gradually reduces the size of the feature maps and captures higher-level semantic information. The spatial information is recovered by the decoder module. Models such as DeepLab V3+ apply atrous spatial pyramid pooling to fuse features at several different scales and across various sub-regions [–]. In addition, parallel dilated convolutions with different dilation rates can enlarge the receptive field. Another effective approach is to capture rich context dependencies. For instance, Peng [ ] developed the "large kernel matters" concept for learning contextual dependencies using a global convolutional network (GCN). Mnih et al. [ ] added an attention mechanism to a recurrent neural network (RNN) to reduce its complexity. Wang et al. [ ] were the first to propose a recurrent attention structure for remote sensing images. Here, a mask matrix is used for the attention weights, which then multiply the feature map to obtain an attention-based representation of high-level features.
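How a dilation rate enlarges the receptive field can be shown with a one-dimensional sketch. A kernel with k taps and dilation rate d spans k + (k - 1)(d - 1) input samples while keeping the same k weights. The helper names and the toy signal are illustrative; real segmentation models apply the same idea in two dimensions with learned kernels.

```python
def effective_kernel(k, d):
    """Number of input samples spanned by a k-tap kernel with dilation rate d."""
    return k + (k - 1) * (d - 1)

def dilated_conv1d(signal, kernel, dilation):
    """'Valid' 1-D dilated convolution (cross-correlation, as in CNN layers)."""
    span = effective_kernel(len(kernel), dilation)
    return [sum(w * signal[i + j * dilation] for j, w in enumerate(kernel))
            for i in range(len(signal) - span + 1)]

signal = [1, 2, 3, 4, 5, 6, 7]
kernel = [1, 0, -1]
out_d1 = dilated_conv1d(signal, kernel, 1)  # spans 3 samples: [-2, -2, -2, -2, -2]
out_d2 = dilated_conv1d(signal, kernel, 2)  # spans 5 samples: [-4, -4, -4]
```

With dilation 2 the same three weights compare samples four positions apart instead of two, which is why running several dilation rates in parallel gathers context at several scales without adding parameters per branch.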

Self-Attention Mechanisms. Self-attention mechanisms provide an effective way of enhancing the ability of a neural network to capture critical local features. The approach [ ] was first proposed for machine translation, but it is now widely used in image classification [ ], image segmentation [ ], and other fields [–]. Many studies have shown that attention mechanisms can enhance the identification of neurons with key characteristics and improve a network's performance. For example, Convolutional Block Attention Modules (CBAM) [ ] draw on top-level information to obtain weights for channel-wise or spatial activations by concatenating channel and spatial attention modules. In a different approach, DA-Net [ ] runs a channel attention module and a spatial attention module in parallel using a non-local autocorrelation matrix, which has delivered good results.
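The core computation behind self-attention can be sketched as follows. The citation for the original machine translation approach is missing from this copy, so this assumes it refers to scaled dot-product attention, in which each output is a softmax-weighted average of all inputs. For brevity the queries, keys, and values are the input vectors themselves; a real layer would apply learned projections first, and the function names here are illustrative.

```python
import math

def softmax(xs):
    peak = max(xs)
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x):
    """Scaled dot-product self-attention over a list of n feature vectors.

    Returns the attended vectors and the n x n attention weight matrix.
    """
    d = len(x[0])
    out, weights = [], []
    for q in x:
        # similarity of this query to every key, scaled by sqrt(d)
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x]
        w = softmax(scores)
        weights.append(w)
        # weighted average of the value vectors
        out.append([sum(w[j] * x[j][i] for j in range(len(x))) for i in range(d)])
    return out, weights

x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three toy feature vectors
attended, attn = self_attention(x)
```

Each row of `attn` sums to 1, so every position receives a convex combination of all positions. In an image setting the "positions" would be pixels or channels, which is where spatial and channel attention differ.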

3. Methods

In this section, we first present the overall framework of our network; then, we introduce the two attention modules, which capture spatial and channel-related contextual information. The section concludes with a description of how the output from the two modules is aggregated to give the final output.

3.1. Overview

For regular semantic segmentation, the scene for segmentation will include a variety of objects of diverse scales with different lighting that are visible from different viewpoints. However, because of the same shooting angle and distance of the samples in different remote sensing images, the boundary
