Content-Based Image Retrieval Based on Eye Tracking: Principles of Eye-Tracking Feedback


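In today's big data era, with the development of the Internet, computing, and multimedia technology, the number of images uploaded to the web each day is growing explosively. To find what a user needs quickly among such massive collections, content-based image retrieval (CBIR) has become particularly important. The core of CBIR is to use the visual features of the image itself, such as color, texture, and shape, to retrieve images similar to the user's query. In this paper, the authors propose a novel CBIR framework that uses eye-tracking data through an implicit relevance feedback mechanism to improve retrieval performance. The framework has three main components: feature extraction and selection, visual retrieval, and relevance feedback.

In the feature extraction and selection stage, the authors use a quantum genetic algorithm and principal component analysis (PCA) to extract optimal image features with 70 components. A quantum genetic algorithm is an optimization algorithm modeled on principles of quantum computing; it searches a wide range for the optimal solution and converges quickly. PCA is a common feature extraction method that applies an orthogonal transformation to convert a set of possibly correlated variables into a set of linearly uncorrelated variables called principal components. This removes redundant information from the data while retaining the most important information.

In the visual retrieval stage, the authors implement a finer retrieval procedure based on a multiclass support vector machine (SVM) and the fuzzy c-means (FCM) algorithm in order to retrieve the images most relevant to the user's query. In its basic form, an SVM is a binary classifier that finds the hyperplane in feature space that best separates the classes. FCM is an unsupervised learning algorithm that partitions a data set into several fuzzy clusters, each represented by a center point.

To improve retrieval further, the authors train a deep neural network to exploit the user's feedback on the relevance of the returned images. Deep neural networks can learn complex nonlinear relationships and have performed well in image processing and computer vision. The relevance feedback is then used to update the retrieval point for the next round of retrieval.

The authors are Ying Zhou and Jiajun Wang of the School of Electronic and Information Engineering, Soochow University, and Zheru Chi of the Department of Electronic and Information Engineering, The Hong Kong Polytechnic University. Their experiments on two databases, Corel and Caltech, show that the proposed framework significantly improves CBIR performance. The keywords are CBIR, eye tracking, and deep neural network. The paper was published in 2018 at COGAIN '18: Workshop on Communication by Gaze Interaction, held in Warsaw, Poland, and includes an ACM reference format.

The paper stresses the importance of image feature extraction and uses deep learning to strengthen the learning ability of the CBIR system. The system can adapt to the user's actual visual perception and points of interest, which enables more intelligent and personalized image retrieval. As deep learning advances and eye tracking becomes more widely used, eye-tracking-based CBIR may play a more important role in image retrieval.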


Content-based Image Retrieval Based On Eye-tracking

Ying Zhou
School of Electronic and Information Engineering
Soochow University
Suzhou, P.R. China
1621328093@qq.com

Jiajun Wang
School of Electronic and Information Engineering
Soochow University
Suzhou, P.R. China
jjwang@suda.edu.cn

Zheru Chi
Department of Electronic and Information Engineering
The Hong Kong Polytechnic University, Hong Kong
PolyU Shenzhen Research Institute
Shenzhen, P.R. China
chi.zheru@polyu.edu.hk

ABSTRACT

To improve the performance of an image retrieval system, a novel content-based image retrieval (CBIR) framework with eye tracking data based on an implicit relevance feedback mechanism is proposed in this paper. Our proposed framework consists of three components: feature extraction and selection, visual retrieval, and relevance feedback. First, by using the quantum genetic algorithm and the principal component analysis algorithm, optimal image features with 70 components are extracted. Second, a finer retrieval procedure based on a multiclass support vector machine (SVM) and the fuzzy c-means (FCM) algorithm is implemented for retrieving the most relevant images. Finally, a deep neural network is trained to exploit the user's information regarding the relevance of the returned images. This information is then employed to update the retrieval point for a new round of retrieval. Experiments on two databases (Corel and Caltech) show that the performance of CBIR can be significantly improved by using our proposed framework.

CCS CONCEPTS

• Information systems → Information retrieval; • Information retrieval → Retrieval models and ranking; • Retrieval models and ranking → Learning to rank;

KEYWORDS

CBIR, eye tracking, deep neural network

ACM Reference Format:

Ying Zhou, Jiajun Wang, and Zheru Chi. 2018. Content-based Image Retrieval Based On Eye-tracking. In COGAIN ’18: Workshop on Communication by Gaze Interaction, June 14–17, 2018, Warsaw, Poland. ACM, New York, NY, USA, 7 pages. https://doi.org/10.1145/3206343.3206353

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

ACM ISBN 978-1-4503-5790-6/18/06... $15.00

1 INTRODUCTION

With the development of the Internet, computer and multimedia technology, we have entered the big data era. Every day, the number of images uploaded to the Internet is growing at an amazing speed. It remains a challenging problem to retrieve the images that users are interested in from this vast image data both efficiently and accurately. Since the early 1990s, some well-known universities in the United States have carried out preliminary studies of content-based image retrieval (CBIR) techniques [Carson 1999; Ma and Manjunath 1999; Wang et al. 1995]. Thereafter, more and more researchers devoted themselves to this field and a number of CBIR systems were developed. Typical examples of such systems include the query by image and video content (QBIC) system [Flickner 1995] and the multimedia information retrieval system (MIRES). In these systems, retrieval was primarily performed by extracting rich visual features from the images, such as color, texture and shape. In addition to these visual features, Scale Invariant Feature Transform (SIFT) descriptors are also widely used for image retrieval [Peker 2011].

However, because computers and human beings identify objects in different ways, retrieval systems often return results that are of no interest to the users. Human beings compare two images by drawing on accumulated experience and summarizing the contents of the images, while the computer determines the similarity of two images by measuring the similarity of the aforementioned low level features. These two different approaches often lead to inconsistencies. In some cases, two images are very similar in terms of low level image features but are obviously different semantically. In other cases, two images are very different in terms of low level features but belong to the same semantic category. Therefore, the results returned by retrieval systems based on low level features often do not meet the requirements of the users, because these systems work in a different manner from human beings. This difference is usually called the semantic gap.

The most direct and effective way to address the semantic gap problem is to develop a closed loop system that involves the users through a relevance feedback (RF) mechanism [Qian et al. 2016; Zhang et al. 2016]. The key issue of such a mechanism is how to collect the users' feedback data regarding the degree of relevance of the returned images to the query image. The data can be collected either explicitly, through a mouse or a keyboard, or implicitly, through the analysis of data from which the relevance information can be mined. Typical data for this purpose include electroencephalograph (EEG) data [Wang et al. 2015] and eye tracking data [Pasupa et al. 2011]. For implicit feedback, it is of critical importance to classify the patterns of the EEG data or the eye tracking data so that the user's judgement regarding the degree of relevance can be obtained.


In this paper, we propose to alleviate the semantic gap problem in the framework of implicit relevance feedback using eye tracking data. In this framework, the retrieval is still performed based on low level visual features. Due to its superiority in extracting abstract features, the convolutional neural network is employed to classify the eye tracking data into relevant or irrelevant patterns. In our work, an optimal subset of low level visual features with 70 components is first extracted and selected with a feature selection method proposed in [Zhou et al. 2017]. This subset of visual features is then used for our image retrieval.
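
As a rough illustration of retrieval over low level feature vectors, the sketch below ranks database images by their distance to the query. It is a minimal example under stated assumptions, not the authors' implementation: the Euclidean metric, the helper names (`euclidean`, `rank_by_similarity`), the image identifiers, and the toy 3-component vectors standing in for the 70-component features are all hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_by_similarity(query, database, top_k=3):
    """Return the top_k (image_id, distance) pairs closest to the query."""
    scored = [(image_id, euclidean(query, features))
              for image_id, features in database.items()]
    scored.sort(key=lambda pair: pair[1])  # smallest distance first
    return scored[:top_k]


# Toy 3-component vectors stand in for the 70-component features (assumption).
database = {
    "beach_01": [0.9, 0.1, 0.2],
    "beach_02": [0.7, 0.3, 0.1],
    "forest_01": [0.1, 0.9, 0.3],
}
query = [0.85, 0.15, 0.15]
ranking = rank_by_similarity(query, database, top_k=2)
print([image_id for image_id, _ in ranking])
```

Because the ranking depends only on these low level distances, two semantically unrelated images can still end up adjacent in the list, which is the semantic gap that relevance feedback is meant to reduce.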

2 RELATED WORK

Relevance Feedback (RF) is an interactive and/or supervised learning process used to improve the performance of information retrieval systems [Rui et al. 1998]. Explicit RF methods initially considered global-level feedback information. In [Rahman et al. 2007], a two-level strategy was proposed for image retrieval. In the lower level, the supervised support vector machine method and the unsupervised fuzzy c-means clustering technique were first combined to prefilter images in a principal component analysis (PCA)-based eigenspace, while in the finer level, images were retrieved with a category-based statistical similarity matching technique. By employing an explicit RF scheme, the user's semantic perception was incorporated to adjust the similarity matching function. More recently, the analysis of explicit RF methods has shifted to a finer level of detail and to region-based RF approaches. Methods of this kind still receive feedback information at the global level, but an estimate of the relevant local-level objects is used to refine the retrieved results. For example, an image retrieval framework based on graph-theoretic region correspondence estimation was presented in [Li and Hsu 2008]. Another SVM-based approach, proposed in [Djordjevic and Izquierdo 2007], uses an adaptive convolution kernel to realize object-based indexing and image retrieval. The main drawbacks of these methods are their inaccurate identification of relevant regions and the extensive effort required from the user to provide feedback in every iteration.
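
To make the fuzzy c-means step mentioned above more concrete, the following sketch computes the standard FCM membership of a single point in each cluster, given fixed cluster centers. It illustrates only the textbook membership rule, not the prefiltering pipeline of [Rahman et al. 2007]; the function name `fcm_memberships`, the fuzzifier value `m=2.0`, and the toy centers are assumptions.

```python
import math


def fcm_memberships(point, centers, m=2.0):
    """Standard fuzzy c-means membership of one point in each cluster.

    u_i = 1 / sum_k (d_i / d_k) ** (2 / (m - 1)), where d_i is the distance
    from the point to center i and m > 1 is the fuzzifier.
    """
    dists = [math.dist(point, center) for center in centers]
    # A point lying exactly on a center belongs fully to that cluster.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** exponent for d_k in dists)
            for d_i in dists]


# Toy example: the point is twice as far from the second center as the first.
memberships = fcm_memberships([1.0, 0.0], [[0.0, 0.0], [3.0, 0.0]])
print(memberships)
```

The memberships always sum to 1, so a prefilter could, for instance, keep only the images whose membership in the query's cluster exceeds a chosen threshold; that thresholding rule is a hypothetical design choice rather than something stated in the text.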

Over the past few years, implicit RF has received particular attention in the image retrieval community. This technology has both advantages and disadvantages. Its advantages are that it captures the user's feedback in a non-intrusive, time-efficient way and that it is more expressive than explicitly provided feedback. Its drawback is the large amount of noise present in the feedback data. Among the different types of implicit feedback data, eye tracking is of particular importance to image retrieval applications, since it provides valuable information about which parts of the image the user has observed, as well as cues regarding the relevance of the image to the query at hand. Most eye tracking methods related to image retrieval have focused on predicting the user's relevance assessment at the image level. Maiorana [Maiorana 2013] proposed an enhanced CBIR system in which a relevance feedback component based on eye tracking data was integrated. In this system, the gaze points were used to infer regions of interest in the query image, thus allowing for searches based on global or local features. Gaze features such as the total fixation time and the number of fixations were used to weight the relevance of the returned images. The two images with the highest scores were then used as new queries for a new round of retrieval. In [Liang et al. 2010], a region-based image retrieval method was proposed in which the eye tracking data were used to determine the regions of interest, whose importance was weighted by the fixation time. With these importance weights, the contributions of the different image regions to the image distance were updated.
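
The idea of weighting region contributions by fixation time can be sketched as follows. This is only an illustration of the general idea described above, not the formula of [Liang et al. 2010]: the normalization to weights that sum to 1, the uniform fallback when no fixations were recorded, and the names `fixation_weights` and `weighted_image_distance` are all assumptions.

```python
def fixation_weights(fixation_ms):
    """Normalize per-region fixation times so that the weights sum to 1."""
    total = sum(fixation_ms.values())
    if total == 0:
        # No gaze data recorded: fall back to uniform weights (assumption).
        return {region: 1.0 / len(fixation_ms) for region in fixation_ms}
    return {region: t / total for region, t in fixation_ms.items()}


def weighted_image_distance(region_distances, weights):
    """Image distance as the weighted sum of per-region feature distances."""
    return sum(weights[region] * d for region, d in region_distances.items())


# Hypothetical regions: the user looked at the boat three times as long.
weights = fixation_weights({"sky": 100, "boat": 300})
distance = weighted_image_distance({"sky": 0.8, "boat": 0.2}, weights)
print(weights, distance)
```

With these numbers the poorly matching but barely observed "sky" region contributes less to the distance than it would under equal weighting, so the image moves closer to the query.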

The fundamental issue of implicit RF based on eye tracking data is how to discover the user's relevance assessment of the returned images. In most existing work, gaze features such as the number of times an image was visited, the total time the user spent on an image, the number of fixations, and the total fixation time were used to discover the user's relevance assessment [Papadopoulos et al. 2014]. In fact, discovering the user's relevance assessment is simply a two-class pattern analysis problem, i.e., partitioning the patterns of the eye tracking data into relevant and irrelevant classes. Many recent research results show that deep learning (DL) performs excellently, especially in pattern classification, and it has therefore attracted more and more attention. LeCun first proposed a set of network models called LeNet to identify handwritten digits [LeCun et al. 1998]. In 2012, Hinton and Krizhevsky [Krizhevsky et al. 2012] used a convolutional neural network to classify images in ImageNet. Zhang et al. [Zhang et al.] proposed an image retrieval algorithm that used the features of the CNN fully connected layer and the group cross-index. To better discover the user's retrieval intention, and considering the strong pattern classification power of deep learning, a deep neural network is employed to classify the eye tracking data into relevant or irrelevant patterns, and the result is then used to update the retrieval strategy. Experimental results show that our method achieves results comparable to those of the explicit RF method and outperforms the implicit RF method in which the fixation time was used to re-weight the importance of different regions [Liang et al. 2010].
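
Once returned images have been labelled relevant or irrelevant, the labels can be used to move the query point in feature space. The text here does not give the update rule, so the sketch below swaps in the classic Rocchio update as a stand-in; it is not the authors' method, and the function names and the `alpha`, `beta`, and `gamma` values are assumptions.

```python
def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def update_query(query, relevant, irrelevant,
                 alpha=1.0, beta=0.75, gamma=0.25):
    """Rocchio-style update: pull toward relevant, push from irrelevant."""
    new_query = [alpha * q for q in query]
    if relevant:
        centroid = mean_vector(relevant)
        new_query = [q + beta * c for q, c in zip(new_query, centroid)]
    if irrelevant:
        centroid = mean_vector(irrelevant)
        new_query = [q - gamma * c for q, c in zip(new_query, centroid)]
    return new_query


# Toy 2-component features; labels would come from the gaze classifier.
new_query = update_query(
    query=[1.0, 0.0],
    relevant=[[2.0, 0.0], [4.0, 0.0]],
    irrelevant=[[0.0, 4.0]],
)
print(new_query)
```

In this toy case the query moves along the first component, where the relevant images lie, and away from the second component, where the irrelevant image lies. The updated point would then be used for the next round of retrieval.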

Figure 1: Block diagram of our image retrieval system

3 METHOD

3.1 System Overview

The block diagram of our proposed image retrieval system is shown in Fig. 1. There are two sub-blocks in this system: the RF training block and the RF-based retrieval block.
