Circular Reranking for Visual Search
Ting Yao, Chong-Wah Ngo, Member, IEEE, and Tao Mei, Senior Member, IEEE
Abstract—Search reranking is regarded as a common way
to boost retrieval precision. The problem is nevertheless not
trivial, especially when there are multiple features or modalities
to be considered for search, which often happens in image and
video retrieval. This paper proposes a new reranking algorithm,
named circular reranking, that reinforces the mutual exchange
of information across multiple modalities for improving search
performance, following the philosophy that a strong-performing
modality can learn from weaker ones, while a weak modality
benefits from interacting with stronger ones. Technically,
circular reranking conducts multiple runs of random walks
through exchanging the ranking scores among different features
in a cyclic manner. Unlike existing techniques, the reranking
procedure encourages interaction among modalities to seek a
consensus that is useful for reranking. In this paper, we study
several properties of circular reranking, including how, and in
which order, information propagation should be configured to fully
exploit the potential of modalities for reranking. Encouraging
results are reported for both image and video retrieval on
the Microsoft Research Asia Multimedia image dataset and the TREC
Video Retrieval Evaluation 2007-2008 datasets, respectively.
Index Terms—Circular reranking, multimodality fusion, visual
search.
I. INTRODUCTION
The rapid development of Web 2.0 technologies has
led to the surge of research activities in visual search.
While visual documents are rich in audio-visual content and
user-supplied texts, commercial visual search engines to date
mostly perform retrieval by keyword matching. A common
practice to improve search performance is to rerank the visual
documents returned from a search engine using a larger and
richer set of features. The ultimate goal is to seek con-
sensus from various features for reordering the documents
and boosting the retrieval precision. There are two general
approaches along this direction: visual pattern mining [8] and
multi-modality fusion [1], [2]. The former mines the recurrent
patterns, either explicitly or implicitly, from initial search
results and then moves up the ranks of visually similar docu-
ments. Random walk [9], for instance, performs self-reranking
through identifying documents with similar patterns based on
inter-image similarity and initial rank scores. This category of
approaches, nevertheless, seldom explores the joint utilization
of multiple modalities. Instead, different modalities are treated
independently. Furthermore, the utilization of a modality is
often application-dependent, making it difficult to generalize
the mining for general-purpose search. Multi-modality fusion,
in contrast, predicts the importance of modalities, for instance,
through fusion weight learning, and linearly combines them
for reordering documents. The fusion, however, is done at
the decision stage. More specifically, the estimation of fusion
weights is mainly derived from the ranking scores in different
ranked lists. There is, however, no mechanism by which the
interaction among multiple modalities can be exploited for
reranking in a principled way.
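To make the two baseline families concrete, the following is a minimal sketch in Python with NumPy: (i) random-walk self-reranking on a single-modality affinity graph, following the standard random-walk-with-restart formulation with the initial ranking scores as the restart distribution, and (ii) late fusion as a weighted linear combination of per-modality ranking scores. The function names and parameters (alpha, n_iter, tol) are illustrative assumptions, not the exact formulations used in [8], [9], or [21].

```python
import numpy as np

def random_walk_rerank(W, init_scores, alpha=0.8, n_iter=100, tol=1e-8):
    """Self-reranking by a random walk with restart on one modality.

    W           : (n, n) pairwise similarity matrix (one feature space)
    init_scores : (n,) initial ranking scores from the search engine
    alpha       : probability of following an edge; 1 - alpha is the
                  probability of restarting at the initial ranking
    """
    P = W / W.sum(axis=1, keepdims=True)   # row-stochastic transitions
    v = init_scores / init_scores.sum()    # restart distribution
    p = v.copy()
    for _ in range(n_iter):
        p_next = alpha * (P.T @ p) + (1 - alpha) * v
        if np.abs(p_next - p).sum() < tol:  # converged to stationarity
            break
        p = p_next
    return p  # higher score => higher position after reranking


def late_fuse(score_lists, weights):
    """Decision-stage fusion: a weighted linear combination of the
    ranking scores produced independently by each modality."""
    return sum(w * s for w, s in zip(weights, score_lists))
```

Note that in both sketches the modalities never interact: the random walk stays within one feature space, and late fusion only combines final scores, which is precisely the gap discussed above.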
This paper proposes a novel algorithm, named circular
reranking, that takes advantage of both pattern mining and
multi-modality fusion for visual search. More importantly,
modality interaction is taken into account, on one hand to
implicitly mine recurrent patterns, and on the other, to leverage
the modalities of different strengths for maximizing search per-
formance. Figure 1 shows an overview of our proposed work
compared with the existing methods. Given a ranked list of
visual documents returned from a search engine, conventional
methods typically perform a random walk to rerank the results,
as shown in Figure 1(a). Several variants have arisen
from this methodology, for instance, conducting the random walk
on the original text space [31] or on a new space built upon
visual features [8], [10]. Typically, each space is viewed as
a graph that specifies the document proximity. More sophis-
ticated approaches include late fusion of the reranked results
from random walks in different feature spaces or, conversely,
performing a random walk on a unified graph that is fused
from multiple features [9], [21]. Regardless of these different
versions, a common issue that has not been fully explored is
how the modalities should interact, given that their abilities to
answer different queries can vary greatly. We address this
issue, as shown in Figure 1(b), from the viewpoint of mutual
reinforcement. Specifically, different modalities interact by
exchanging their feature spaces while preserving the original
document scores for random walk. The exchange leads to
the following outcome: the ranks of documents that retain
a similar local view of proximity in a different space
tend to be promoted. Take the text and bag-of-visual-words
(BoW) feature spaces in Figure 1(b) as an example: the second
and fourth images in the initial ranked list are similar
in both the text and BoW feature spaces. After reinforcement
as in Figure 1(b), this group of images remains close in
proximity and thus their ranks are likely to be moved up
after random walk. Meanwhile, the second and third images
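As a rough illustration of this mutual-reinforcement scheme, the sketch below chains the random-walk routine from the earlier sketch across several modality graphs, feeding the scores produced in one feature space in as the restart distribution for the walk on the next, and cycling for a few rounds. The visiting order, number of rounds, and exact coupling between walks are placeholders here; how the propagation order should be configured is one of the properties studied later in this paper.

```python
def circular_rerank(graphs, init_scores, n_rounds=3, alpha=0.8):
    """Cyclic exchange of ranking scores across modality graphs.

    graphs      : list of (n, n) similarity matrices over the same
                  documents, e.g. [W_text, W_bow]
    init_scores : (n,) initial ranking scores from the search engine
    """
    scores = init_scores
    for _ in range(n_rounds):
        for W in graphs:  # each modality refines the previous output
            scores = random_walk_rerank(W, scores, alpha=alpha)
    return scores  # documents consistent across spaces move up
```

Under this scheme, a document promoted by the text graph keeps its high score only if its neighbors in the BoW graph also support it, which is how recurrent patterns are implicitly mined across modalities.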