Abstract Copy detection in Digital Libraries may provide the necessary guarantees for publishers and newsfeed services to offer valuable on-line data. We consider the case for a registration server that maintains registered documents against which new documents can be checked for overlap. In this paper we present a new scheme for detecting copies based on comparing the word frequency occurrences of the new document against those of registered documents. We also report on an experimental comparison between our proposed scheme and COPS [6], a detection scheme based on sentence overlap. The tests involve over a million comparisons of netnews articles and show that in general the new scheme performs better in detecting documents that have partial overlap.
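To make the idea of comparing word frequency occurrences concrete, the following is a minimal sketch, not the scheme proposed in the paper. The abstract does not state the similarity measure, so plain cosine similarity over word counts is swapped in as a stand-in. The function names, the tokenization rule, the in-memory `registry` dictionary, and the `threshold` value are all hypothetical choices made for illustration.

```python
import math
import re
from collections import Counter


def word_frequencies(text):
    """Count lowercase word occurrences in a document (assumed tokenization)."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine_similarity(freq_a, freq_b):
    """Cosine similarity of two word-count vectors.

    Assumption: a stand-in measure, not the one defined in the paper.
    """
    shared = set(freq_a) & set(freq_b)
    dot = sum(freq_a[w] * freq_b[w] for w in shared)
    norm_a = math.sqrt(sum(c * c for c in freq_a.values()))
    norm_b = math.sqrt(sum(c * c for c in freq_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def check_against_registry(new_doc, registry, threshold=0.8):
    """Return (doc_id, score) pairs for registered documents whose
    similarity to new_doc reaches the (hypothetical) threshold,
    highest score first."""
    new_freq = word_frequencies(new_doc)
    hits = []
    for doc_id, text in registry.items():
        score = cosine_similarity(new_freq, word_frequencies(text))
        if score >= threshold:
            hits.append((doc_id, score))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)
```

A registration server in this sketch would call `check_against_registry(new_text, registry)` for each incoming document and flag any registered document returned. A real server would likely precompute and index the registered frequency vectors instead of recounting words on every check.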