Learning to Search Web Pages with Query-Level Loss Functions

Tao QIN², Tie-Yan LIU¹, Ming-Feng TSAI³, Xu-Dong ZHANG², Hang LI¹

¹ Microsoft Research Asia, No.49 Zhichun Road, Haidian District, Beijing 100080, P.R. China
² Dept. Electronic Engineering, Tsinghua University, Beijing, 100084, P.R. China
³ Dept. Computer Science and Information Engineering, National Taiwan University, Taiwan 106, ROC

¹ {tyliu, hangli}@microsoft.com
² {qinshitao99@mails., zhangxd@}tsinghua.edu.cn
³ mftsai@nlg.csie.ntu.edu.tw
Abstract
With the rapid development of information retrieval and Web
search, ranking has become a new branch of supervised learning.
Many existing machine learning technologies such as Support
Vector Machines, Boosting, and Neural Networks have been
applied to this new problem and achieved some success. However,
since these algorithms are not proposed initially for information
retrieval, their loss functions are not quite in accordance with
widely-used evaluation criteria for information retrieval. Such
criteria include mean average precision (MAP), mean precision at
n (P@n), and normalized discounted cumulative gain (NDCG).
While these criteria are averaged on the query level, the loss
functions of the aforementioned machine learning algorithms are
defined on the level of documents or document pairs. Therefore, it
is not clear whether minimizing these loss functions can
effectively improve the retrieval performance. To solve this
problem, we propose to use query-level loss functions when
learning the ranking function. In particular, we discuss what kind
of properties a good query-level loss function should have, and
propose a query-level loss function based on the cosine similarity
between a ranking list and the corresponding ground truth with
respect to a given query. Meanwhile, we also discuss
whether loss functions of existing ranking algorithms can be
upgraded to the query level. Experiments on the TREC Web track
dataset, the OHSUMED dataset, and a large collection of web
pages from a commercial Web search engine all show that our
proposed query-level loss function can improve search
accuracy significantly, and that it is non-trivial to upgrade traditional
document-level loss functions to the query level.
Keywords
Information Retrieval, Learning to Rank, Query-level Loss
Function, RankCosine
1. Introduction
While search engines have been changing people's lives, how to
improve the accuracy of Web search has attracted more and more
research attention. Classical approaches [1] use empirical methods
to rank web pages, such as BM25 [20]-like content-based methods
and PageRank [18]-like link-based methods. However, there are
many factors that can impact the ranking results, such as link
information, query word distribution in different sections of the
Web page, and many others. In this case, empirically tuning
model parameters becomes very difficult.
machine learning methods have been applied to combining these
factors in recent years. Typical work includes RankBoost [8],
RankSVM [12][16], and RankNet [3], which are based on
Boosting, Support Vector Machines and Neural Networks
respectively. To some extent, automatically learning from ground
truth labels to generate a sorted list of documents according to
their relevance to the given query has become a new branch of
supervised learning, in addition to classification, regression and
density estimation [26].
However, it should be noted that the aforementioned machine
learning algorithms were not initially proposed for information
retrieval (IR), and therefore their loss functions are not quite in
accordance with widely-used evaluation criteria for IR. The IR
evaluation criteria mainly include mean average precision (MAP)
[1], mean precision at n (P@n) [1], and normalized discounted
cumulative gain (NDCG) [13][14]. All these criteria are averaged
on the query level: no matter how different the numbers of
documents retrieved for two queries are, they contribute equally to
the final performance evaluation of the ranking algorithm.
However, as we know, loss functions of the aforementioned
machine learning algorithms are defined on the level of document
[17] or document pairs [3][8][12][16]. Therefore, it is not clear
whether minimizing these loss functions can effectively improve
the retrieval performance, since those queries with more
documents or document pairs may bias the training process.
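To make the query-level averaging concrete, the following is a minimal sketch (not from the paper) of how P@n is averaged over queries; the toy relevance lists are invented for illustration. Note that the query with 40 retrieved documents and the query with 5 contribute equally to the mean:

```python
def precision_at_n(relevance, n):
    """relevance: 0/1 labels of retrieved documents in ranked order."""
    return sum(relevance[:n]) / n

def mean_precision_at_n(queries, n):
    """queries: per-query relevance lists; lengths may differ,
    yet each query contributes exactly 1/len(queries) to the mean."""
    return sum(precision_at_n(r, n) for r in queries) / len(queries)

# Two queries of very different sizes, equal weight in the metric:
q1 = [1, 1, 0, 1, 0] + [0] * 35   # 40 retrieved documents
q2 = [1, 0, 1, 1, 0]              # 5 retrieved documents
print(mean_precision_at_n([q1, q2], 5))  # 0.6
```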
In order to solve this problem, we propose to use query-level loss
functions when learning the ranking function for IR. In particular,
we first discuss what kind of properties a good query-level loss
function should have, and then propose a query-level loss
function as an example, which is based on the cosine similarity
between a ranking list and the corresponding ground truth with
respect to a given query. With this new loss function, we derive an
algorithm based on generalized additive model to learn the
ranking function. Moreover, we also discuss whether the
document-level and pair-wise loss functions of the
aforementioned ranking algorithms can be reasonably upgraded to
the query level, and how they may perform after that. We used
two public datasets and a commercial dataset to evaluate our
methods. Experimental results show that the proposed query-level
loss is very effective for information retrieval, while it is in
general non-trivial to upgrade existing document-level loss
functions to the query level.
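Section 4 (beyond this excerpt) derives the concrete loss and learning algorithm. Based purely on the description above, a query-level cosine loss can be sketched as follows; the exact normalization and the 1/2 scaling factor here are illustrative assumptions, not necessarily the paper's definition:

```python
import math

def cosine_loss(scores, truth):
    """Query-level loss from the cosine similarity between the
    predicted score vector and the ground-truth score vector of one
    query.  Bounded (here mapped into [0, 1] for non-negative
    vectors), and independent of how many documents the query has."""
    dot = sum(s * t for s, t in zip(scores, truth))
    ns = math.sqrt(sum(s * s for s in scores))
    nt = math.sqrt(sum(t * t for t in truth))
    return 0.5 * (1.0 - dot / (ns * nt))

print(cosine_loss([1.0, 0.0], [1.0, 0.0]))  # 0.0  (perfect agreement)
print(cosine_loss([1.0, 0.0], [0.0, 1.0]))  # 0.5  (orthogonal)
```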
In summary, the contributions of our work are as follows:
1) We point out that learning to rank for IR should consider
query-level loss functions;
2) We enumerate fundamental properties that a good query-
level loss function should have;
3) We provide an example for query-level loss functions and
demonstrate how to derive an efficient algorithm to minimize
it;
4) We discuss how to upgrade loss functions of previous
algorithms (e.g. RankSVM, RankBoost, and RankNet) to the
query level, and investigate their performance empirically.
The rest of this paper is organized as follows. In
Section 2, we briefly review related work on machine
learning algorithms for ranking. In Section 3, we justify the
necessity of using query-level loss function for IR and discuss
some properties that a good query-level loss function should have.
We then give an example, the cosine loss, for query-level loss
functions and derive an efficient algorithm to minimize the cosine
loss function in Section 4. Experiment settings are described in
Section 5, and results are reported in Sections 6 to 8. In Section 9,
we discuss how to upgrade loss functions of some previous
algorithms to the query level, and the limitations of the cosine loss.
Conclusions are given in the last section.
2. Related works
As mentioned in the introduction, many machine learning
technologies [3][5][7][8][12][16][17] were applied to the
problem of ranking for information retrieval in recent years. Some
early works simply tackled this problem from a binary
classification point of view [17]: they directly judge whether a
document is relevant to the query or not, and rank relevant
documents higher than irrelevant ones. However, in real-world
scenarios, the relevance of a document to a query can hardly be
classified simply into "relevant" or "irrelevant".
Actually, some documents may be highly relevant, some
partially relevant, and others definitely irrelevant. In a word,
binary classification cannot capture the multi-grade nature of
relevance judgment for IR. To solve this problem, many other
works were proposed, such as RankSVM, RankBoost, and
RankNet.
Thorsten Joachims [16] took an SVM approach to learn the
ranking function and proposed the RankSVM algorithm. The
basic idea of RankSVM is the same as that of a general SVM:
minimizing the sum of the structure loss (which can be expressed
by the L2 norm of the hyperplane parameter ω) and the empirical
loss. The difference is that the constraints in RankSVM are
partial-order preferences over document pairs. The optimization
formulation of RankSVM is shown as follows¹:

$$
\begin{aligned}
\min:\quad & V(\omega,\xi) = \tfrac{1}{2}\,\omega^{T}\omega + C \sum \xi_{i,j,q} \\
\text{s.t.:}\quad & \forall (d_i,d_j) \in r_1^{*}:\ \omega\phi(q_1,d_i) \ge \omega\phi(q_1,d_j) + 1 - \xi_{i,j,1} \\
& \qquad\cdots \\
& \forall (d_i,d_j) \in r_n^{*}:\ \omega\phi(q_n,d_i) \ge \omega\phi(q_n,d_j) + 1 - \xi_{i,j,n} \\
& \forall i,j,q:\ \xi_{i,j,q} \ge 0
\end{aligned}
\tag{1}
$$
where C is a parameter which controls the trade-off between the
structure loss and the empirical loss, φ(q,d_i) is the feature vector
calculated from document d_i with respect to query q, and the
constraint ωφ(q,d_i) > ωφ(q,d_j) means that the user prefers
document d_i to d_j for query q. From the theoretical aspect,
RankSVM is well-founded in the framework of structural risk
minimization, and various kinds of experiments have verified its
effectiveness. Note that such a pair-wise strategy can handle not
only binary relevance judgments but also multi-grade relevance
judgments. There is some follow-up work [4] on RankSVM.
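Setting aside the quadratic program, the empirical (slack) part of formulation (1) behaves like a pair-wise hinge loss. Below is a minimal sketch under that reading, with a toy linear scorer and invented feature vectors:

```python
def pairwise_hinge_loss(w, pairs):
    """Empirical part of a RankSVM-style objective (illustrative).
    pairs: list of (phi_i, phi_j) feature vectors where document i
    is preferred to document j for some query.  The slack of a pair
    is max(0, 1 - (w . phi_i - w . phi_j)): zero once the preferred
    document leads by a margin of at least 1."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    return sum(max(0.0, 1.0 - (dot(w, pi) - dot(w, pj)))
               for pi, pj in pairs)

w = [1.0, 0.5]
pairs = [([2.0, 1.0], [1.0, 1.0])]    # score margin = 1.0
print(pairwise_hinge_loss(w, pairs))  # 0.0
```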
Freund, Y. et al. [8] adopted the Boosting approach and proposed
the RankBoost algorithm for ranking. Similar to RankSVM,
RankBoost operates on document pairs. Suppose d_i ≻_q d_j means
that document d_i should be ranked higher than d_j for query q.
Consider the model f, where f(φ(q,d_i)) > f(φ(q,d_j)) means that
the model asserts d_i ≻_q d_j. Then the loss for a document pair in
RankBoost is defined as

$$
L(d_i \succ_q d_j) = e^{-\left(f(\phi(q,d_i)) - f(\phi(q,d_j))\right)}
\tag{2}
$$

Consequently, the loss function of RankBoost is the sum of losses
for all document pairs:

$$
L = \sum_{q}\sum_{d_i \succ_q d_j} L(d_i \succ_q d_j)
\tag{3}
$$
One advantage of RankBoost is that it is easy to implement and
feasible for parallel computation. Its ranking performance is also
promising according to empirical studies.
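Equations (2)-(3) can be sketched directly; the scorer and feature vectors below are toy stand-ins, not the paper's weak-learner machinery:

```python
import math

def rankboost_loss(f, pairs):
    """Sum of exponential pair losses in the spirit of Eq. (2)-(3).
    pairs: (phi_i, phi_j) with document i preferred to document j;
    f: a scoring function on feature vectors.  The loss shrinks as
    the preferred document's score pulls ahead, and explodes when
    a pair is ranked confidently wrong."""
    return sum(math.exp(-(f(pi) - f(pj))) for pi, pj in pairs)

f = lambda phi: phi[0]                    # toy linear scorer
pairs = [([2.0], [1.0]), ([3.0], [0.0])]  # score margins 1 and 3
print(rankboost_loss(f, pairs))           # e^-1 + e^-3 ≈ 0.4177
```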
Neural networks have also been applied to ranking in recent years.
With a probabilistic cost function, C.J.C. Burges [3] proposed the
RankNet algorithm, in which neural networks are used to learn the
underlying ranking function. As with RankSVM and RankBoost,
the training samples of RankNet are also a set of document pairs.
Denote the modeled posterior P(d_i ≻_q d_j) by P_ij, let P̄_ij be the
desired target value of that posterior, and let
o_{q,i,j} = f(φ(q,d_i)) − f(φ(q,d_j))². Then the loss for a document
pair in RankNet can be represented as follows:

$$
L_{q,i,j} = L(o_{q,i,j})
= -\bar{P}_{ij}\log P_{ij} - (1-\bar{P}_{ij})\log(1-P_{ij})
= -\bar{P}_{ij}\, o_{q,i,j} + \log\!\left(1 + e^{o_{q,i,j}}\right)
\tag{4}
$$

Similarly, the total loss of RankNet is the sum of all pair-wise
losses:

$$
L = \sum_{q,i,j} L_{q,i,j}
\tag{5}
$$

¹ For details, please refer to [16].
² The definitions of f and φ are inherited from the discussions on RankSVM
and RankBoost.
RankNet is very successful and has been integrated into a
commercial Web search engine [29].
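The pair loss of Eq. (4) is a one-liner; here is a sketch using its right-hand (logistic) form, with a hard preference target for illustration:

```python
import math

def ranknet_pair_loss(o, p_bar):
    """Cross-entropy pair loss of Eq. (4):
    L = -p_bar * o + log(1 + e^o), where o = f(phi_i) - f(phi_j)
    is the score difference and p_bar is the target probability
    that document i should beat document j."""
    return -p_bar * o + math.log(1.0 + math.exp(o))

# With a hard preference (p_bar = 1), a large positive score gap
# drives the loss toward zero:
print(ranknet_pair_loss(5.0, 1.0))   # ≈ 0.0067
# A confidently wrong pair incurs a large loss that grows without
# bound as the gap widens:
print(ranknet_pair_loss(-5.0, 1.0))  # ≈ 5.0067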
Having introduced RankSVM, RankBoost, and RankNet, we must
point out that, although their successes have been demonstrated in
the literature, they still have some problems. One major problem
is that the loss functions used in these algorithms are not in
accordance with IR evaluation. We will elaborate on this in more
detail in the next section.
3. Query-level loss functions for information
retrieval
Before introducing query-level loss functions, we use Table 1 to
summarize the loss functions of the existing algorithms described
in Section 2. In the classification approach [17], the loss function
is defined on the document level; it considers only a single
document and ignores the relationship of relative relevance among
documents in the ranking problem. The loss functions of
RankSVM, RankBoost, and RankNet are all defined on the
document-pair level. They can model the relationship of
partial-order preference between documents. However, they still
cannot model the total-order preference over all the documents for
a particular query. In this regard, these loss functions are not yet
in accordance with the evaluation criteria for IR, such as MAP and
NDCG, which are all averaged on the query level. This motivated
us to propose the concept of query-level loss function, which will
be discussed in the following subsections.
Table 1 Loss functions for web search

Algorithm                      Loss function level
Binary classification [17]     Document-level
RankSVM [4][12][16]            Pair-wise
RankBoost [8]                  Pair-wise
RankNet [3]                    Pair-wise
Our work                       Query-level
3.1 Why query-level loss for IR
As mentioned several times, the loss functions in RankSVM,
RankBoost, and RankNet are not in accordance with the widely-
used evaluation criteria for IR. This may seriously affect the
effectiveness of the learning algorithms.

Consider a simple example. Suppose there are two queries q₁ and
q₂ with 40 and 5 training documents respectively. For the extreme
case with complete partial-order document pairs for training, we
get 40×39/2 = 780 pairs for q₁ and only 5×4/2 = 10 pairs for q₂.
If a learning algorithm can rank all pairs correctly, there is no
problem. However, if not, for example if we can only rank 780 out
of the total 790 pairs correctly, problems arise. With a pair-wise
loss function, the losses will be the same whether the learning
algorithm correctly ranks all pairs of q₂ and 770 pairs of q₁, or
correctly ranks all pairs of q₁ and no pair of q₂. However, for
these two cases, the resultant performance under a query-level
evaluation criterion will be quite different. As clearly shown in
Table 2, case 1 is much better than case 2. This example tells us
that a document-pair level loss function is not suitable for IR.
Actually, only if all queries have the same number of document
pairs for training can the pair-level loss function and the query-
level loss lead to the same result, which is almost impossible in
real-world scenarios. Therefore, it is better to define the loss
function on the query level while learning ranking functions.
Table 2 Document-pair loss vs. query-level loss

                              Case 1     Case 2
Document pairs of q₁
  correctly ranked            770        780
  wrongly ranked              10         0
  accuracy                    98.72%     100%
Document pairs of q₂
  correctly ranked            10         0
  wrongly ranked              0          10
  accuracy                    100%       0%
Overall accuracy
  document-pair level         98.73%     98.73%
  query level                 99.36%     50%
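The accuracies in Table 2 can be reproduced in a few lines; `q1` and `q2` stand for the two queries of the running example:

```python
def pair_level_accuracy(correct, total):
    """Pool all pairs across queries, then divide."""
    return sum(correct) / sum(total)

def query_level_accuracy(correct, total):
    """Compute per-query accuracy first, then average over queries."""
    return sum(c / t for c, t in zip(correct, total)) / len(total)

total = [780, 10]   # document pairs for q1 and q2

case1 = [770, 10]   # all pairs of q2 correct, 770 pairs of q1
case2 = [780, 0]    # all pairs of q1 correct, no pair of q2

print(pair_level_accuracy(case1, total))   # ≈ 0.9873
print(pair_level_accuracy(case2, total))   # ≈ 0.9873 (identical)
print(query_level_accuracy(case1, total))  # ≈ 0.9936 (case 1 wins)
print(query_level_accuracy(case2, total))  # 0.5
```

The pair-level metric cannot distinguish the two cases, while the query-level metric separates them sharply, which is the point of the example above.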
3.2 What is a good loss function for IR
A natural question, then, is what properties a good query-level
loss function should have. Here we list some such properties and
discuss why they are important for learning-based information
retrieval.

1) Insensitive to the number of documents and document pairs
per query

This is clear when considering the example shown in Table 2. In
this regard, the loss functions of RankSVM, RankBoost, and
RankNet are not good query-level loss functions.
2) Emphasize top-ranked documents in the rank list
Considering the gap between the explosion of information and the
user's capability of absorbing it, precision becomes much more
important than recall for information retrieval applications. As a
result, many IR evaluation criteria emphasize the error rate for
top-ranked documents. For example, a ranking mistake among the
top-5 documents is more serious than a ranking mistake at the
95th position in the retrieved document list. Accordingly, a good
loss function should also reflect this property.
Actually, the loss functions of RankSVM, RankBoost, and
RankNet can be more or less regarded as emphasizing top-ranked
documents. The reason is that if a definitely irrelevant document
is ranked high, it disobeys many partial-order constraints, while
if it is ranked in the middle of the list, the number of constraints
it disobeys is much smaller.
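The claim above can be checked with a toy count (not from the paper): place a single irrelevant document among otherwise relevant ones, and count the pairwise preference constraints it violates at each rank:

```python
def violated_pairs(n_relevant, position):
    """Number of pair-wise constraints violated when one irrelevant
    document is placed at `position` (1-based) in a list otherwise
    containing n_relevant relevant documents.  Every relevant
    document ranked below the irrelevant one is a violated pair."""
    total_len = n_relevant + 1
    return total_len - position

print(violated_pairs(99, 1))    # 99 violations at the very top
print(violated_pairs(99, 50))   # 50 violations mid-list
print(violated_pairs(99, 100))  # 0 violations at the bottom
```

So pair-wise losses do penalize a high-ranked irrelevant document more, though only linearly in its rank, rather than with the sharp top-heavy weighting of criteria like NDCG.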
3) Have a finite upper bound
A query-level loss function should not be easily biased by a few
difficult queries. For this purpose, a natural requirement is that
the loss for each query should have an upper bound; otherwise,
some specific query with an extremely large loss will dominate
the training process.

We find that the loss functions of RankBoost, RankSVM, and
RankNet do not have an upper bound. For RankBoost, because
the exponential loss is used, it is easy to see that in some cases
the corresponding value of the loss function can be extremely
large. A similar conclusion holds for the hinge loss of RankSVM.
For RankNet,