Rank-biased overlap
===================
Overview
--------
A small Python module for calculating **rank-biased overlap**, a measure of
similarity between ragged, possibly infinite ranked lists which may or may not
contain the same items (up to the actually evaluated depth or at all). See "A
similarity measure for indefinite rankings" by W. Webber, A. Moffat and J. Zobel
(2011), <http://dx.doi.org/10.1145/1852102.1852106>.
The definition of overlap has been modified to account for ties. Without this
modification, results for lists with tied items were inflated. The modification
is not described in the paper but seems reasonable; see the function
`overlap()`. Places in the code which diverge from the paper's specification
because of this are highlighted with comments (search for "NOTE").
The functions intended for external use are `rbo()` and `rbo_dict()`, plus
possibly `average_overlap()` (for comparison purposes). `rbo()` receives two
sorted lists where each individual item is a hashable object or a set of
hashable objects (to represent ties):
```python
>>> rbo([{"a", "c"}, "b", "d"], ["a", {"b", "c"}, "d"], p=.9)
RBO(min=0.48919503099801515, res=0.47747163566865164, ext=0.9666666666666667)
```
The function returns a `namedtuple` with three fields whose values correspond
to three RBO estimates (all defined in the paper):
- `min` is a lower-bound estimate
- `res` is the corresponding *residual*; `min` + `res` constitutes an
  upper-bound estimate
- `ext` is an *ext*rapolated point estimate
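
As a rough illustration of how the three fields relate, the sketch below rebuilds the result shown above by hand instead of importing the module. The local `RBO` namedtuple and the names `lower` and `upper` are stand-ins assumed for this example only.

```python
from collections import namedtuple

# Stand-in for the tuple returned by rbo(); values copied from the example above.
RBO = namedtuple("RBO", "min res ext")
result = RBO(min=0.48919503099801515, res=0.47747163566865164, ext=0.9666666666666667)

lower = result.min               # lower-bound estimate
upper = result.min + result.res  # upper-bound estimate

# The extrapolated point estimate should fall within [lower, upper];
# the small tolerance absorbs floating-point rounding.
assert lower <= result.ext <= upper + 1e-12
```
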
By contrast, `rbo_dict()` takes two dicts, each mapping the items to be ranked
to the scores by which they should be sorted:
```python
>>> rbo_dict(dict(a=1, b=2, c=1, d=3), dict(a=1, b=2, c=2, d=3), p=.9, sort_ascending=True)
RBO(min=0.48919503099801515, res=0.47747163566865164, ext=0.9666666666666667)
```
Scores are typically the higher the better, so the sort is descending by
default. You can specify `sort_ascending=True` to override this if you have
some rank-like score (i.e. the lower the better).
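
To make the relationship between the two input formats concrete, here is a minimal sketch of how a score dict could be turned into the list-with-tie-sets format that `rbo()` accepts. The helper name `scores_to_ranking` is hypothetical and this is not necessarily how `rbo_dict()` does it internally; it only assumes that items with equal scores are treated as tied.

```python
from itertools import groupby


def scores_to_ranking(scores, sort_ascending=False):
    """Convert {item: score} into a ranked list, grouping tied items into sets.

    Hypothetical helper for illustration; not part of the module's API.
    """
    # Sort by score: descending by default, ascending for rank-like scores.
    ordered = sorted(scores.items(), key=lambda kv: kv[1], reverse=not sort_ascending)
    ranking = []
    # groupby() merges adjacent entries sharing the same score.
    for _, group in groupby(ordered, key=lambda kv: kv[1]):
        items = {item for item, _ in group}
        # A lone item is stored as is; tied items are stored as a set.
        ranking.append(items if len(items) > 1 else items.pop())
    return ranking


left = scores_to_ranking(dict(a=1, b=2, c=1, d=3), sort_ascending=True)
right = scores_to_ranking(dict(a=1, b=2, c=2, d=3), sort_ascending=True)
```

Under these assumptions, `left` and `right` are the same two rankings passed to `rbo()` in the first example, which is consistent with both calls printing identical results.
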
Conceptually, the `p` parameter of both functions represents the probability
that a person doing a manual comparison of the ranked lists would continue to
the next rank, rather than stop (i.e. decide she has seen enough to hazard a
conclusion), at each transition to a lower rank. In other words, it "models the
user's *persistence*" (Webber et al., p. 17): the closer `p` is to 1, the more
weight deeper ranks receive. Formally, it's the parameter of the geometric
progression which weights the contribution of overlaps at different depths.
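
The geometric weighting can be sketched in a few lines. The code below is an illustration only, with hypothetical function names: it assumes tie-free lists without duplicate items and computes just the truncated weighted sum of agreements over the shared prefix, not the `min`, `res` or `ext` estimates returned by the module.

```python
def depth_weights(p, depth):
    """Geometric weights (1 - p) * p**(d - 1) for depths d = 1..depth."""
    return [(1 - p) * p ** (d - 1) for d in range(1, depth + 1)]


def truncated_rbo(s, t, p):
    """Weighted sum of prefix agreements for two tie-free ranked lists.

    Illustrative only: evaluation stops at the length of the shorter list,
    so the weight of all deeper (unseen) ranks is simply left out.
    """
    k = min(len(s), len(t))
    seen_s, seen_t = set(), set()
    total = 0.0
    for d, weight in enumerate(depth_weights(p, k), start=1):
        seen_s.add(s[d - 1])
        seen_t.add(t[d - 1])
        # Agreement at depth d: share of items common to both prefixes.
        agreement = len(seen_s & seen_t) / d
        total += weight * agreement
    return total


score_same = truncated_rbo(list("abcd"), list("abcd"), p=0.9)
score_disjoint = truncated_rbo(list("ab"), list("cd"), p=0.9)
```

The weights for depths 1 to `d` sum to `1 - p**d`, so with `p=.9` the first ten depths carry about 0.65 of the total weight. This is also why two identical lists of length 4 only reach `1 - 0.9**4` in this truncated sum.
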
The code is primarily optimized for correctness, not speed. Build your own
faster version and check it for correctness by comparing against this one!
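
One possible way to run such a comparison is sketched below. The helper `find_mismatches` is hypothetical; it assumes that both implementations accept `(s, t, p=...)` and return the three estimates as a tuple in the same order.

```python
import math


def find_mismatches(reference, candidate, cases, rel_tol=1e-9):
    """Return the test cases on which two RBO implementations disagree.

    Hypothetical helper: `reference` and `candidate` are callables taking
    (s, t, p=...) and returning a 3-tuple of floats (min, res, ext).
    """
    mismatches = []
    for s, t, p in cases:
        expected = reference(s, t, p=p)
        actual = candidate(s, t, p=p)
        same = all(
            math.isclose(e, a, rel_tol=rel_tol, abs_tol=1e-12)
            for e, a in zip(expected, actual)
        )
        if not same:
            mismatches.append((s, t, p, tuple(expected), tuple(actual)))
    return mismatches
```

In practice, `reference` would be this module's `rbo()` and `candidate` your faster version, with `cases` covering ties, lists of uneven length and several values of `p`.
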
Requirements
------------
Built and tested under Python 3.5.2. No external dependencies.
License
-------
Credits for the concept of the RBO measure are indicated above.
Copyright (this implementation) © 2016 [ÚČNK](http://korpus.cz)/David Lukeš
This implementation is distributed under the
[GNU General Public License v3](http://www.gnu.org/licenses/gpl-3.0.en.html).