Introduction to Information Retrieval: 19 Web search basics
Introduction to Information Retrieval
19: Web Search Basics
Information Mining Lab
Jin Xueying
2008.10.28
Background & history
The designers of the first browsers made it easy to view the HTML markup tags on the content of a URL. This simple convenience allowed new users to create their own HTML content without extensive training or experience.
Publishing on the Web became a mass activity that was not limited to a few trained programmers, but rather open to hundreds of millions of individuals.
For most users and for most information needs, the Web quickly became the best way to supply and consume information on everything from rare ailments to subway schedules.
Background & history
The mass publishing of information on the Web is essentially useless unless this wealth of information can be discovered and consumed by other users.
The earliest web search engines had to contend with indexes containing tens of millions of documents.
Indexing, query serving, and ranking at this scale required harnessing together tens of machines to create highly available systems, again at scales not previously seen in a consumer-facing search application.
Background & history
The first generation of web search engines was largely successful at meeting these challenges of scale while continually indexing a significant fraction of the Web.
However, the quality and relevance of web search results left much to be desired, owing to the idiosyncrasies of content creation on the Web that we will discuss in the next section.
Web characteristics
Web page authors created content in dozens of (natural) languages and thousands of dialects, thus demanding many different forms of stemming and other linguistic operations.
Web pages exhibited heterogeneity at a daunting scale, in many crucial aspects:
1) content creation was no longer the preserve of editorially trained writers;
2) web publishing in a sense unleashed the best and worst of desktop publishing on a planetary scale, so that pages quickly became riddled with wild variations in colors, fonts, and structure.