Learning to Search Web Pages with Query-Level Loss Functions

Tao QIN², Tie-Yan LIU¹, Ming-Feng TSAI³, Xu-Dong ZHANG², Hang LI¹

¹ Microsoft Research Asia, No.49 Zhichun Road, Haidian District, Beijing 100080, P.R. China
² Dept. Electronic Engineering, Tsinghua University, Beijing, 100084, P.R. China
³ Dept. Computer Science and Information Engineering, National Taiwan University, Taiwan 106, ROC

¹ {tyliu, hangli}@microsoft.com
² {qinshitao99@mails., zhangxd@}tsinghua.edu.cn
³ mftsai@nlg.csie.ntu.edu.tw
Abstract
With the rapid development of information retrieval and Web
search, ranking has become a new branch of supervised learning.
Many existing machine learning technologies such as Support
Vector Machines, Boosting, and Neural Networks have been
applied to this new problem and achieved some success. However,
since these algorithms are not proposed initially for information
retrieval, their loss functions are not quite in accordance with
widely-used evaluation criteria for information retrieval. Such
criteria include mean average precision (MAP), mean precision at
n (P@n), and normalized discounted cumulative gain (NDCG).
While these criteria are averaged on the query level, the loss
functions of the aforementioned machine learning algorithms are
defined on the level of documents or document pairs. Therefore, it
is not clear whether minimizing these loss functions can
effectively improve the retrieval performance. To solve this
problem, we propose to use query-level loss functions when
learning the ranking function. In particular, we discuss what kind
of properties a good query-level loss function should have, and
propose a query-level loss function based on the cosine similarity
between a ranking list and the corresponding ground truth with
respect to a given query. Meanwhile, we also discuss
whether loss functions of existing ranking algorithms can be
upgraded to the query level. Experiments on the TREC Web track
dataset, the OHSUMED dataset, and a large collection of web
pages from a commercial Web search engine all show that our
proposed query-level loss function can improve search
accuracy significantly, and that it is non-trivial to upgrade traditional
document-level loss functions to the query level.
Keywords
Information Retrieval, Learning to Rank, Query-level Loss
Function, RankCosine
1. Introduction
While search engines have been changing people's lives, how to
improve the accuracy of Web search has attracted more and more
research attention. Classical approaches [1] use empirical methods
to rank web pages, such as BM25 [20]-like content-based methods
and PageRank [18]-like link-based methods. However, there are
many factors that can impact the ranking results, such as link
information, query word distribution in different sections of the
Web page, and many others. In this case, empirically tuning
model parameters becomes very difficult.
machine learning methods have been applied to combining these
factors in recent years. Typical work includes RankBoost [8],
RankSVM [12][16], and RankNet [3], which are based on
Boosting, Support Vector Machines and Neural Networks
respectively. To some extent, automatically learning from ground
truth labels to generate a sorted list of documents according to
their relevance to the given query has become a new branch of
supervised learning, in addition to classification, regression and
density estimation [26].
However, it should be noted that the aforementioned machine
learning algorithms were not initially proposed for information
retrieval (IR), and therefore their loss functions are not quite in
accordance with widely-used evaluation criteria for IR. The IR
evaluation criteria mainly include mean average precision (MAP)
[1], mean precision at n (P@n) [1], and normalized discounted
cumulative gain (NDCG) [13][14]. All these criteria are averaged
on the query level: no matter how different the numbers of
documents retrieved for two queries are, they contribute equally to
the final performance evaluation of the ranking algorithm.
However, as we know, loss functions of the aforementioned
machine learning algorithms are defined on the level of document
[17] or document pairs [3][8][12][16]. Therefore, it is not clear
whether minimizing these loss functions can effectively improve
the retrieval performance, since those queries with more
documents or document pairs may bias the training process.
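To make the query-level averaging concrete, the following is a minimal sketch (not from the paper) of how P@n is averaged over queries; the toy relevance lists are invented for illustration. Note that the query with 40 retrieved documents and the query with 5 contribute equally to the mean:

```python
def precision_at_n(relevance, n):
    """relevance: 0/1 labels of retrieved documents in ranked order."""
    return sum(relevance[:n]) / n

def mean_precision_at_n(queries, n):
    """queries: per-query relevance lists; lengths may differ,
    yet each query contributes exactly 1/len(queries) to the mean."""
    return sum(precision_at_n(r, n) for r in queries) / len(queries)

# Two queries of very different sizes, equal weight in the metric:
q1 = [1, 1, 0, 1, 0] + [0] * 35   # 40 retrieved documents
q2 = [1, 0, 1, 1, 0]              # 5 retrieved documents
print(mean_precision_at_n([q1, q2], 5))  # 0.6
```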
In order to solve this problem, we propose to use query-level loss
functions when learning the ranking function for IR. In particular,
we first discuss what kind of properties a good query-level loss
function should have, and then propose a query-level loss
function as an example, which is based on the cosine similarity
between a ranking list and the corresponding ground truth with
respect to a given query. With this new loss function, we derive an
algorithm based on generalized additive model to learn the
ranking function. Moreover, we also discuss whether the
document-level and pair-wise loss functions of the
aforementioned ranking algorithms can be reasonably upgraded to
the query level, and how they may perform after that. We used
two public datasets and a commercial dataset to evaluate our
methods. Experimental results show that the proposed query-level
loss is very effective for information retrieval, while it is in
general non-trivial to upgrade existing document-level loss
functions to the query level.
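Section 4 (beyond this excerpt) derives the concrete loss and learning algorithm. Based purely on the description above, a query-level cosine loss can be sketched as follows; the exact normalization and the 1/2 scaling factor here are illustrative assumptions, not necessarily the paper's definition:

```python
import math

def cosine_loss(scores, truth):
    """Query-level loss from the cosine similarity between the
    predicted score vector and the ground-truth score vector of one
    query.  Bounded (here mapped into [0, 1] for non-negative
    vectors), and independent of how many documents the query has."""
    dot = sum(s * t for s, t in zip(scores, truth))
    ns = math.sqrt(sum(s * s for s in scores))
    nt = math.sqrt(sum(t * t for t in truth))
    return 0.5 * (1.0 - dot / (ns * nt))

print(cosine_loss([1.0, 0.0], [1.0, 0.0]))  # 0.0  (perfect agreement)
print(cosine_loss([1.0, 0.0], [0.0, 1.0]))  # 0.5  (orthogonal)
```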
In summary, the contributions of our work are as follows:
1) We point out that learning to rank for IR should consider
query-level loss functions;
2) We enumerate fundamental properties that a good query-
level loss function should have;
3) We provide an example for query-level loss functions and
demonstrate how to derive an efficient algorithm to minimize
it;
4) We discuss how to upgrade loss functions of previous
algorithms (e.g. RankSVM, RankBoost, and RankNet) to the
query level, and investigate their performance empirically.
The rest of this paper is organized as follows. In
Section 2, we briefly review related work on machine
learning algorithms for ranking. In Section 3, we justify the
necessity of using query-level loss function for IR and discuss
some properties that a good query-level loss function should have.
We then give an example, the cosine loss, for query-level loss
functions and derive an efficient algorithm to minimize the cosine
loss function in Section 4. Experiment settings are described in
Section 5, and results are reported in Sections 6 to 8. In Section 9,
we discuss how to upgrade loss functions of some previous
algorithms to the query level, and the limitations of the cosine loss.
Conclusions are given in the last section.
2. Related works
As mentioned in the introduction, many machine learning
technologies [3][5][7][8][12][16][17] were applied to the
problem of ranking for information retrieval in recent years. Some
early works simply tackled this problem from a binary
classification point of view [17]: they directly judge whether a
document is relevant to the query or not, and rank relevant
documents higher than irrelevant ones. However, in real-world
scenarios, the relevance of a document to a query can hardly be
classified simply into "relevant" or "irrelevant".
Actually, some documents may be highly relevant, some
partially relevant, and others definitely irrelevant. In a word,
binary classification cannot capture the multi-grade nature of
relevance judgment for IR. To solve this problem, many other
works were proposed, such as RankSVM, RankBoost, and
RankNet.
Thorsten Joachims [16] took an SVM approach to learn the
ranking function and proposed the RankSVM algorithm. The
basic idea of RankSVM is the same as that of a general SVM:
minimizing the sum of the structure loss (which can be expressed
by the L2 norm of the hyperplane parameter ω) and the empirical
loss. The difference is that the constraints in RankSVM are
partial-order preferences over document pairs. The optimization
formulation of RankSVM is shown as follows¹:

$$
\begin{aligned}
\min:\quad & V(\omega,\xi) = \tfrac{1}{2}\,\omega^{T}\omega + C \sum \xi_{i,j,q} \\
\text{s.t.:}\quad & \forall (d_i,d_j) \in r_1^{*}:\ \omega\phi(q_1,d_i) \ge \omega\phi(q_1,d_j) + 1 - \xi_{i,j,1} \\
& \qquad\cdots \\
& \forall (d_i,d_j) \in r_n^{*}:\ \omega\phi(q_n,d_i) \ge \omega\phi(q_n,d_j) + 1 - \xi_{i,j,n} \\
& \forall i,j,q:\ \xi_{i,j,q} \ge 0
\end{aligned}
\tag{1}
$$
where C is a parameter which controls the trade-off between the
structure loss and the empirical loss, φ(q,d_i) is the feature vector
calculated from document d_i with respect to query q, and the
constraint ωφ(q,d_i) > ωφ(q,d_j) means that the user prefers
document d_i to d_j for query q. From the theoretical aspect,
RankSVM is well-founded in the framework of structural risk
minimization, and various kinds of experiments have verified its
effectiveness. Note that such a pair-wise strategy can handle not
only binary relevance judgments but also multi-grade relevance
judgments. There is some follow-up work [4] on RankSVM.
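Setting aside the quadratic program, the empirical (slack) part of formulation (1) behaves like a pair-wise hinge loss. Below is a minimal sketch under that reading, with a toy linear scorer and invented feature vectors:

```python
def pairwise_hinge_loss(w, pairs):
    """Empirical part of a RankSVM-style objective (illustrative).
    pairs: list of (phi_i, phi_j) feature vectors where document i
    is preferred to document j for some query.  The slack of a pair
    is max(0, 1 - (w . phi_i - w . phi_j)): zero once the preferred
    document leads by a margin of at least 1."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    return sum(max(0.0, 1.0 - (dot(w, pi) - dot(w, pj)))
               for pi, pj in pairs)

w = [1.0, 0.5]
pairs = [([2.0, 1.0], [1.0, 1.0])]    # score margin = 1.0
print(pairwise_hinge_loss(w, pairs))  # 0.0
```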
Freund, Y. et al. [8] adopted the Boosting approach and proposed
the RankBoost algorithm for ranking. Similar to RankSVM,
RankBoost operates on document pairs. Suppose d_i ≻_q d_j means
that document d_i should be ranked higher than d_j for query q.
Consider the model f, where f(φ(q,d_i)) > f(φ(q,d_j)) means that
the model asserts d_i ≻_q d_j. Then the loss for a document pair in
RankBoost is defined as

$$
L(d_i \succ_q d_j) = e^{-\left(f(\phi(q,d_i)) - f(\phi(q,d_j))\right)}
\tag{2}
$$

Consequently, the loss function of RankBoost is the sum of losses
for all document pairs:

$$
L = \sum_{q}\sum_{d_i \succ_q d_j} L(d_i \succ_q d_j)
\tag{3}
$$
One advantage of RankBoost is that it is easy to implement and
feasible for parallel computation. Its ranking performance is also
promising according to empirical studies.
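Equations (2)-(3) can be sketched directly; the scorer and feature vectors below are toy stand-ins, not the paper's weak-learner machinery:

```python
import math

def rankboost_loss(f, pairs):
    """Sum of exponential pair losses in the spirit of Eq. (2)-(3).
    pairs: (phi_i, phi_j) with document i preferred to document j;
    f: a scoring function on feature vectors.  The loss shrinks as
    the preferred document's score pulls ahead, and explodes when
    a pair is ranked confidently wrong."""
    return sum(math.exp(-(f(pi) - f(pj))) for pi, pj in pairs)

f = lambda phi: phi[0]                    # toy linear scorer
pairs = [([2.0], [1.0]), ([3.0], [0.0])]  # score margins 1 and 3
print(rankboost_loss(f, pairs))           # e^-1 + e^-3 ≈ 0.4177
```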
Neural networks have also been applied to ranking in recent years.
With a probabilistic cost function, C.J.C. Burges [3] proposed the
RankNet algorithm, in which neural networks are used to learn the
underlying ranking function. As with RankSVM and RankBoost,
the training samples of RankNet are also a set of document pairs.
Denote the modeled posterior P(d_i ≻_q d_j) by P_ij, let P̄_ij be the
desired target value of that posterior, and let
o_{q,i,j} = f(φ(q,d_i)) − f(φ(q,d_j))². Then the loss for a document
pair in RankNet can be represented as follows:

$$
L_{q,i,j} = L(o_{q,i,j})
= -\bar{P}_{ij}\log P_{ij} - (1-\bar{P}_{ij})\log(1-P_{ij})
= -\bar{P}_{ij}\, o_{q,i,j} + \log\!\left(1 + e^{o_{q,i,j}}\right)
\tag{4}
$$

Similarly, the total loss of RankNet is the sum of all pair-wise
losses:

$$
L = \sum_{q,i,j} L_{q,i,j}
\tag{5}
$$

¹ For details, please refer to [16].
² The definitions of f and φ are inherited from the discussions on RankSVM
and RankBoost.
RankNet is very successful and has been integrated into a
commercial Web search engine [29].
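The pair loss of Eq. (4) is a one-liner; here is a sketch using its right-hand (logistic) form, with a hard preference target for illustration:

```python
import math

def ranknet_pair_loss(o, p_bar):
    """Cross-entropy pair loss of Eq. (4):
    L = -p_bar * o + log(1 + e^o), where o = f(phi_i) - f(phi_j)
    is the score difference and p_bar is the target probability
    that document i should beat document j."""
    return -p_bar * o + math.log(1.0 + math.exp(o))

# With a hard preference (p_bar = 1), a large positive score gap
# drives the loss toward zero:
print(ranknet_pair_loss(5.0, 1.0))   # ≈ 0.0067
# A confidently wrong pair incurs a large loss that grows without
# bound as the gap widens:
print(ranknet_pair_loss(-5.0, 1.0))  # ≈ 5.0067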
Having introduced RankSVM, RankBoost, and RankNet, we must
point out that, although their successes have been demonstrated in
the literature, they still have some problems. One major problem
is that the loss functions used in these algorithms are not in
accordance with IR evaluation. We will elaborate on this in more
detail in the next section.
3. Query-level loss functions for information
retrieval
Before introducing query-level loss functions, we use Table 1 to
summarize the loss functions of the existing algorithms described
in Section 2. In the classification approach [17], the loss function
is defined on the document level; it considers only a single
document and ignores the relationship of relative relevance among
documents in the ranking problem. The loss functions of
RankSVM, RankBoost, and RankNet are all defined on the
document-pair level. They can model the relationship of
partial-order preference between documents. However, they still
cannot model the total-order preference over all the documents for
a particular query. In this regard, these loss functions are not yet
in accordance with the evaluation criteria for IR, such as MAP and
NDCG, which are all averaged on the query level. This motivated
us to propose the concept of query-level loss function, which will
be discussed in the following subsections.
Table 1 Loss functions for web search

Algorithm                      Loss function level
Binary classification [17]     Document-level
RankSVM [4][12][16]            Pair-wise
RankBoost [8]                  Pair-wise
RankNet [3]                    Pair-wise
Our work                       Query-level
3.1 Why query-level loss for IR
As mentioned several times, the loss functions in RankSVM,
RankBoost, and RankNet are not in accordance with the widely-
used evaluation criteria for IR. This may seriously affect the
effectiveness of the learning algorithms.

Consider a simple example. Suppose there are two queries q₁ and
q₂ with 40 and 5 training documents respectively. For the extreme
case with complete partial-order document pairs for training, we
get 40×39/2 = 780 pairs for q₁ and only 5×4/2 = 10 pairs for q₂.
If a learning algorithm can rank all pairs correctly, there is no
problem. However, if not, for example if we can only rank 780 out
of the total 790 pairs correctly, problems arise. With a pair-wise
loss function, the losses will be the same whether the learning
algorithm correctly ranks all pairs of q₂ and 770 pairs of q₁, or
correctly ranks all pairs of q₁ and no pair of q₂. However, for
these two cases, the resultant performance under a query-level
evaluation criterion will be quite different. As clearly shown in
Table 2, case 1 is much better than case 2. This example tells us
that a document-pair level loss function is not suitable for IR.
Actually, only if all queries have the same number of document
pairs for training can the pair-level loss function and the query-
level loss lead to the same result, which is almost impossible in
real-world scenarios. Therefore, it is better to define the loss
function on the query level while learning ranking functions.
Table 2 Document-pair loss vs. query-level loss

                              Case 1     Case 2
Document pairs of q₁
  correctly ranked            770        780
  wrongly ranked              10         0
  accuracy                    98.72%     100%
Document pairs of q₂
  correctly ranked            10         0
  wrongly ranked              0          10
  accuracy                    100%       0%
Overall accuracy
  document-pair level         98.73%     98.73%
  query level                 99.36%     50%
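The accuracies in Table 2 can be reproduced in a few lines; `q1` and `q2` stand for the two queries of the running example:

```python
def pair_level_accuracy(correct, total):
    """Pool all pairs across queries, then divide."""
    return sum(correct) / sum(total)

def query_level_accuracy(correct, total):
    """Compute per-query accuracy first, then average over queries."""
    return sum(c / t for c, t in zip(correct, total)) / len(total)

total = [780, 10]   # document pairs for q1 and q2

case1 = [770, 10]   # all pairs of q2 correct, 770 pairs of q1
case2 = [780, 0]    # all pairs of q1 correct, no pair of q2

print(pair_level_accuracy(case1, total))   # ≈ 0.9873
print(pair_level_accuracy(case2, total))   # ≈ 0.9873 (identical)
print(query_level_accuracy(case1, total))  # ≈ 0.9936 (case 1 wins)
print(query_level_accuracy(case2, total))  # 0.5
```

The pair-level metric cannot distinguish the two cases, while the query-level metric separates them sharply, which is the point of the example above.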
3.2 What is a good loss function for IR
A natural question, then, is what properties a good query-level
loss function should have. Here we list some such properties and
discuss why they are important for learning-based information
retrieval.

1) Insensitive to the number of documents and document pairs
per query

This is clear when considering the example shown in Table 2. In
this regard, the loss functions of RankSVM, RankBoost, and
RankNet are not good query-level loss functions.
2) Emphasize top-ranked documents in the rank list
Considering the gap between the explosion of information and the
user's capability of absorbing it, precision becomes much more
important than recall for information retrieval applications. As a
result, many IR evaluation criteria emphasize the error rate for
top-ranked documents. For example, a ranking mistake among the
top-5 documents is more serious than a ranking mistake at the
95th position in the retrieved document list. Accordingly, a good
loss function should also reflect this property.
Actually, the loss functions of RankSVM, RankBoost, and
RankNet can be more or less regarded as emphasizing top-ranked
documents. The reason is that if a definitely irrelevant document
is ranked high, it disobeys many partial-order constraints, while
if it is ranked in the middle of the list, the number of constraints
it disobeys is much smaller.
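The claim above can be checked with a toy count (not from the paper): place a single irrelevant document among otherwise relevant ones, and count the pairwise preference constraints it violates at each rank:

```python
def violated_pairs(n_relevant, position):
    """Number of pair-wise constraints violated when one irrelevant
    document is placed at `position` (1-based) in a list otherwise
    containing n_relevant relevant documents.  Every relevant
    document ranked below the irrelevant one is a violated pair."""
    total_len = n_relevant + 1
    return total_len - position

print(violated_pairs(99, 1))    # 99 violations at the very top
print(violated_pairs(99, 50))   # 50 violations mid-list
print(violated_pairs(99, 100))  # 0 violations at the bottom
```

So pair-wise losses do penalize a high-ranked irrelevant document more, though only linearly in its rank, rather than with the sharp top-heavy weighting of criteria like NDCG.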
3) Have a finite upper bound
A query-level loss function should not be easily biased by a few
difficult queries. For this purpose, a natural requirement is that
the loss for each query should have an upper bound; otherwise,
some specific query with an extremely large loss will dominate
the training process.

We find that the loss functions of RankBoost, RankSVM, and
RankNet do not have an upper bound. For RankBoost, because
the exponential loss is used, it is easy to see that in some cases
the corresponding value of the loss function can be extremely
large. A similar conclusion holds for the hinge loss of RankSVM.
For RankNet,