International Journal of Robotics and Automation
An Adversarial and Deep Hashing-Based Hierarchical Supervised Cross-modal Image
and Text Retrieval Algorithm
--Manuscript Draft--
Manuscript Number: 206-0919
Full Title: An Adversarial and Deep Hashing-Based Hierarchical Supervised Cross-modal Image
and Text Retrieval Algorithm
Article Type: Full Article
Keywords: Cross-modal image and text retrieval; deep hash algorithm; hierarchical supervision;
adversarial network
Manuscript received DD Month YYYY
AN ADVERSARIAL AND DEEP HASHING-BASED
HIERARCHICAL SUPERVISED CROSS-MODAL
IMAGE AND TEXT RETRIEVAL ALGORITHM
Abstract
With the rapid development of robotics and sensor technology, vast amounts of valuable multimodal data are collected. For the many robots that perform automated tasks, finding relevant multimodal information quickly and efficiently in such large volumes of data is critical. In this paper, we propose an adversarial and deep hashing-based hierarchical supervised cross-modal image and text retrieval algorithm that performs semantic analysis and association modeling on images and text by making full use of the rich semantic information of the label hierarchy. First, the modal adversarial block and the modal differentiation network perform adversarial learning against each other to draw different modalities with the same semantics as close as possible in a common subspace. Second, an intra-label-layer similarity loss and an inter-label-layer correlation loss are used to fully exploit the intrinsic similarity within each label layer and the correlation between label layers. Finally, an objective function for data with different semantics is redesigned to keep such data far apart in the common subspace, thus preventing data with different semantics from interfering with retrieval. Experimental results on two cross-modal retrieval datasets with hierarchically supervised information show that the proposed method substantially enhances retrieval performance and consistently outperforms other state-of-the-art methods.
Key Words
Cross-modal image and text retrieval; deep hash algorithm; hierarchical supervision; adversarial network
1. Introduction
In recent years, various types of intelligent robots [1] have developed rapidly. Cross-modal retrieval [2, 3], a key technology that enables robots to accomplish automated tasks through the understanding of multimodal content, is the process of taking a query from one modality and returning the data from other modalities that are most semantically relevant to it.
Many approaches have been proposed to address cross-modal retrieval. Traditional methods [4-9] construct feature matrices for the different media, project them uniformly into a shared subspace, and then use distance metrics such as Euclidean or cosine distance to measure the similarity between heterogeneous modalities. Canonical Correlation Analysis (CCA) [4] is widely used in cross-modal retrieval, and many cross-modal retrieval methods have been built on it. However, most traditional methods rely on hand-designed features, and it remains difficult for them to bridge the "heterogeneity gap" effectively.
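To make the shared-subspace idea concrete, the following is a minimal numpy sketch of CCA (a generic textbook formulation, not the exact procedure of any cited method): each view's covariance is whitened and the SVD of the whitened cross-covariance yields projections into a common subspace, where similarity can then be measured by cosine or Euclidean distance.

```python
import numpy as np

def inv_sqrt(S, eps=1e-6):
    # Inverse matrix square root of a symmetric PSD matrix via eigendecomposition.
    w, V = np.linalg.eigh(S)
    return V @ np.diag(1.0 / np.sqrt(np.maximum(w, eps))) @ V.T

def cca(X, Y, k=2, reg=1e-3):
    """Return k-dimensional projection matrices (Wx, Wy) and the top-k
    canonical correlations for two views X (n x p) and Y (n x q)."""
    X = X - X.mean(0)
    Y = Y - Y.mean(0)
    n = X.shape[0]
    # Regularized within-view covariances and the cross-view covariance.
    Sxx = X.T @ X / n + reg * np.eye(X.shape[1])
    Syy = Y.T @ Y / n + reg * np.eye(Y.shape[1])
    Sxy = X.T @ Y / n
    # Singular vectors of the whitened cross-covariance give the projections.
    U, s, Vt = np.linalg.svd(inv_sqrt(Sxx) @ Sxy @ inv_sqrt(Syy))
    Wx = inv_sqrt(Sxx) @ U[:, :k]
    Wy = inv_sqrt(Syy) @ Vt.T[:, :k]
    return Wx, Wy, s[:k]
```

In a retrieval setting, image features would be projected with `Wx`, text features with `Wy`, and candidates ranked by distance in the shared space.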
Deep neural networks have made progress in many fields such as computer vision [10, 11] and natural language processing [12, 13], and they have also been effectively adopted in cross-modal retrieval. However, deep learning methods [14-16] suffer from high storage costs and slow retrieval speed when applied to cross-modal retrieval of large-scale data.
For the storage and retrieval of large-scale cross-modal data, hashing algorithms [17-21] are widely favored for their low storage cost and high retrieval efficiency. Jiang et al. [22] proposed deep cross-modal hashing (DCMH), which integrates feature learning and hash-code learning into a unified framework. Li et al. [23] proposed the self-supervised adversarial hashing (SSAH) method, which builds self-supervised semantic networks by using labels as self-supervised information.
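The efficiency argument for hashing can be sketched as follows (a generic illustration, not DCMH or SSAH themselves): features are binarized through the sign of a projection, standing in for a learned hash layer, and a database is ranked by Hamming distance, which needs only bit comparisons over compact binary codes.

```python
import numpy as np

def to_codes(features, projection):
    # Sign of a projection yields binary hash codes. The random projection
    # here is a hypothetical stand-in for a deep network's learned hash layer.
    return (features @ projection > 0).astype(np.uint8)

def hamming_rank(query_code, db_codes):
    # Hamming distance = number of differing bits between two codes.
    dists = np.count_nonzero(db_codes != query_code, axis=1)
    # Stable sort so ties keep database order; nearest codes come first.
    return np.argsort(dists, kind="stable")
```

In production systems the 0/1 arrays would be bit-packed so the distance becomes a hardware XOR plus popcount, which is where the storage and speed advantages come from.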
Most existing cross-modal retrieval methods are designed for non-hierarchically structured supervision and cannot fully exploit the supervisory information in the labels. However, in many real-world application scenarios, the label supervision of cross-modal data often has a hierarchical structure with rich semantic information. For example, in the field of public security, images or videos automatically collected by robots through sensors may carry multiple layers of label supervision information.
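As a toy illustration of such layered supervision (the label values are hypothetical), similarity between two samples can be graded by how many label layers they share, so a pair agreeing only on the coarse layer is partially similar rather than simply dissimilar:

```python
import numpy as np

def layered_similarity(labels_a, labels_b):
    """labels_*: lists of per-sample label tuples, one entry per label layer,
    e.g. (coarse_class, fine_class). Similarity is the fraction of layers on
    which two samples agree: sharing only the coarse layer of a two-layer
    hierarchy scores 0.5, sharing both layers scores 1.0."""
    A = np.asarray(labels_a)
    B = np.asarray(labels_b)
    # Broadcast to an (n, m, layers) agreement tensor, then average layers.
    return (A[:, None, :] == B[None, :, :]).mean(axis=2)
```

A supervision matrix of this kind is one simple way to encode both intra-layer similarity and the coarse-to-fine relation between layers.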
Only a few existing methods are designed for hierarchically structured label supervision. Wang et al. [24] proposed supervised hierarchical deep hashing (SHDH), which defines a similarity formula that weights the different levels of the labeled hierarchical supervision and verifies that hierarchical information can improve hash retrieval accuracy. However, this method is designed for single-modal retrieval. To verify the effectiveness of hierarchically structured labels in cross-modal retrieval, Sun et al. [25] proposed supervised hierarchical cross-modal hashing (HiCHNet), which learns hierarchical information and regularized cross-modal hashing simultaneously. However, these methods have the following problems:
• The distance between multimodal data with the same semantic information in the common subspace is
not sufficiently minimized.
• The inter-layer correlation of the supervisory information is not sufficiently considered, so complex inter-layer correlation information is not fully learned.
• Cross-modal retrieval is subject to interference from dissimilar data.
To address the above problems, we propose a novel method for hierarchical supervised cross-modal image and text retrieval. The contributions of this study are as follows:
• The feature extraction network and the modality differentiation network, acting as generator and adversary respectively, perform adversarial learning against each other so that different modalities with the same semantics end up as close as possible in the common space.
• The intra-label layer similarity loss and inter-label layer correlation loss are introduced to fully explore
the intrinsic similarity existing in each layer of labels and the correlation existing between label layers,
thus improving the accuracy of cross-modal retrieval.
• An objective function for the distance between different semantic categories of data is redesigned to
keep the modal data of different semantic categories distant from each other in the common space.
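The adversarial idea in the first contribution can be sketched as a min-max game (a minimal linear discriminator for illustration, not the paper's actual network): the discriminator learns to tell which modality a common-subspace feature came from, while the feature networks are trained to fool it, pushing the two modalities toward the same distribution.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def modality_adversarial_losses(img_feats, txt_feats, w, b):
    """A linear modality discriminator on common-subspace features:
    it predicts 1 for image features and 0 for text features."""
    p_img = sigmoid(img_feats @ w + b)
    p_txt = sigmoid(txt_feats @ w + b)
    eps = 1e-9
    # The discriminator minimizes cross-entropy for separating modalities...
    d_loss = -(np.log(p_img + eps).mean() + np.log(1.0 - p_txt + eps).mean())
    # ...while the feature networks (the "generators") maximize the same
    # quantity, i.e. minimize its negation, so the discriminator cannot
    # tell the modalities apart.
    g_loss = -d_loss
    return d_loss, g_loss
```

Training alternates gradient steps on the two losses; at the equilibrium of this game the discriminator is reduced to guessing, which is exactly the state where same-semantics features from different modalities are indistinguishable in the common space.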