没有合适的资源?快使用搜索试试~ 我知道了~
资源详情
资源评论
资源推荐
Efficient Summarization of URLs using CRC32 for Implementing URL Switching
1
*
Zornitza Genova and Ken Christensen
Department of Computer Science and Engineering
University of South Florida
Tampa, Florida 33620
{zgenova, christen}@csee.usf.edu
1
This material is based upon work supported by the National Science Foundation under Grant No. 9875177.
1
*
Abstract
We investigate methods of using CRC32 for compressing
Web URL strings and sharing of URL lists between
servers, caches, and URL switches. Using trace-based
evaluation, we compare our new CRC32 digesting
method against existing Bloom filter and incremental
CRC19 methods. Our CRC32 method requires less CPU
resources, generates equal or smaller size digests,
achieves equal collision rates, and simplifies switching.
1. Introduction and background
Globally distributed Web sites allow for identical content
to be contained in multiple servers and caches throughout
the Internet (Figure 1). In such systems, URL switching
is used to forward HTTP requests. Sharing of URL lists
enables routing of requests between the cache sites (e.g.,
[2] and [4]) and building of URL routing tables. To
reduce network bandwidth and router table size,
compressing of URL lists into digests is of interest. False
hits, or routing collisions, will occur when URL lists
contain expired entries or the same entry for different
content or when a non-existing URL is requested.
In summary cache [2] each URL in a list is hashed
into a 128-bit value using an MD5 signature [7], which is
then partitioned into four 32-bit quantities further reduced
(by modulo division) to become indexes into a Bloom
filter [1] digest. A vector of m bits is initialized to zero
and then individual bits are set based on hashes (with a
range of 1 to m) of set members. Bloom filters are
probabilistic, false hits result when unique members have
overlapping non-unique hashes. Four indexes are
computed for each URL. Each entry in the Bloom filter
is a 4-bit counter where the counter is incremented when
a URL entry is added and decremented when it is
removed. The number of bits in the Bloom filter is the
“load factor” and is 8, 16, or 32 times the number of
entries in the URL list. MD5 guarantees unique hashes at
the expense of being very processor (or hardware)
intensive [7].
Figure 1. Globally distributed Web infrastructure
We investigate the effectiveness of CRC32 as a
method for digesting and compare it to the Bloom filter
method of [2] and the incremental CRC19 digest of [4].
2. Trace-driven digesting methods evaluation
All experiments were executed on an 866-Mhz Dell
Dimension Pentium III PC (running Windows 2000 and
using the Borland 5.02 “C” compiler and x86 assembler).
The input was a set of URL lists to be compressed into a
digest. These lists were obtained from the access logs:
CA*net squid proxy log (list #1) [6], a log from the USF
Computer Science and Engineering server (list #2), an
NLANR proxy log (list #3) [5], and a server log from
Virginia Tech (list #4) [8]. URL lists #3 and #4 are the
same as used in Summary cache [2]. See Table 1. The
last column is the mean number of parts in a URL.
Table 1. Summary statistics for access logs
# URLs Size URL len URL parts
List #1 2,552,045 140.75 MB 56.8 B 6
List #2 49,029 1.54 28.8 7
List #3 483,631 16.30 56.0 6
List #4 45,817 1.96 43.9 6
Client
Origin server
Caches
Temporary server
URL switch
Sharing of URL lists
HTTP request to the “best” source
IP switch
Client
Origin server
Caches
Temporary server
URL switch
Sharing of URL lists
HTTP request to the “best” source
IP switch
Proceedings of the 27th Annual IEEE Conference on Local Computer Networks (LCN’02)
0742-1303/02 $17.00 © 2002 IEEE
Authorized licensed use limited to: SUN YAT-SEN UNIVERSITY. Downloaded on February 27, 2009 at 09:09 from IEEE Xplore. Restrictions apply.
zsufrog
- 粉丝: 0
- 资源: 1
上传资源 快速赚钱
- 我的内容管理 展开
- 我的资源 快来上传第一个资源
- 我的收益 登录查看自己的收益
- 我的积分 登录查看自己的积分
- 我的C币 登录后查看C币余额
- 我的收藏
- 我的下载
- 下载帮助
安全验证
文档复制为VIP权益,开通VIP直接复制
信息提交成功
评论0