Outline
• analysis, which deals with the design requirements and overall architecture of a system;
• design, which translates a system architecture into programming constructs (such as interfaces, classes, and method descriptions);
• and programming, which implements these programming constructs.
Defining System Requirements and Capabilities
• Supports crawling pages with multiple threads
– Supports persistent HTTP connections
– Supports a DNS cache
– Supports IP blocking
– Supports filtering out unreachable sites
– Supports parsing links
– Supports recursive crawling
• Supports Tianwang-format output
• Supports ISAM output
• Supports looking up a page by its URL
• Supports searching for a keyword in the depot
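Several of the crawling capabilities above (multiple threads, link parsing, recursive crawling, skipping unreachable sites) can be sketched together in a few lines. This is only an illustrative sketch, not the system's actual design: the names `LinkParser`, `parse_links`, and `crawl`, the `fetch` callback, and the `max_depth` and `workers` parameters are all assumptions introduced here. The `fetch` function is passed in so that the download step (persistent connections, DNS caching, IP blocking) can be swapped without touching the crawl logic.

```python
from concurrent.futures import ThreadPoolExecutor
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def parse_links(base_url, html):
    """Return the absolute URLs linked from one page."""
    parser = LinkParser()
    parser.feed(html)
    return [urljoin(base_url, link) for link in parser.links]


def crawl(seeds, fetch, max_depth=2, workers=4):
    """Level-by-level crawl; fetch(url) returns HTML text or raises OSError."""
    pages, seen, frontier = {}, set(seeds), list(seeds)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(max_depth + 1):
            if not frontier:
                break
            # Download the whole frontier in parallel, one task per URL.
            futures = {url: pool.submit(fetch, url) for url in frontier}
            next_frontier = []
            for url, future in futures.items():
                try:
                    html = future.result()
                except OSError:
                    continue  # unreachable site: skip it
                pages[url] = html
                # Follow links that have not been scheduled yet.
                for link in parse_links(url, html):
                    if link not in seen:
                        seen.add(link)
                        next_frontier.append(link)
            frontier = next_frontier
    return pages
```

Calling `crawl(["http://www.cdk3.net/WebExample/"], fetch)` with a real download function would return a dictionary mapping each reachable URL to its HTML text; writing that dictionary out in Tianwang or ISAM format is a separate step not shown here.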
Three main components of the Web
• HyperText Markup Language
– A language for specifying the contents and layout
of pages
• Uniform Resource Locators
– Identify documents and other resources
• A client-server architecture with HTTP
– By which browsers and other clients fetch
documents and other resources from web servers
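To make the roles of the URL and HTTP concrete, the sketch below splits a URL into the parts a client needs and builds the text of a minimal HTTP/1.1 GET request from them. It uses Python's standard `urllib.parse`; the variable names are chosen here for illustration, and nothing is actually sent over the network.

```python
from urllib.parse import urlsplit

url = "http://www.cdk3.net/WebExample/Images/earth.jpg"
parts = urlsplit(url)

scheme = parts.scheme  # protocol used to fetch the resource
host = parts.hostname  # web server the client must contact
path = parts.path      # resource to ask that server for

# Text of a minimal HTTP/1.1 request for this resource.
request = f"GET {path} HTTP/1.1\r\nHost: {host}\r\n\r\n"
print(scheme, host, path)
```

The host name is what the client resolves through DNS before connecting, and the path is what it sends to the server in the request line.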
HTML
<IMG SRC = "http://www.cdk3.net/WebExample/Images/earth.jpg">
<P>
Welcome to Earth! Visitors may also be interested in taking a look at the
<A HREF = "http://www.cdk3.net/WebExample/moon.html">Moon</A>.
</P>
(etcetera)
• HTML text is stored in a file on a web server.
• A browser retrieves the contents of this file from a web server.
– The browser interprets the HTML text
– The server can infer the content type from the filename extension.
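The last point, inferring the content type from the filename extension, can be illustrated with Python's standard `mimetypes` module. This is a sketch of the idea rather than how any particular web server does it; the function name `content_type_for` and the fallback to `application/octet-stream` are assumptions made for this example.

```python
import mimetypes


def content_type_for(filename):
    """Guess a Content-Type value from the filename extension."""
    content_type, _encoding = mimetypes.guess_type(filename)
    # Fall back to a generic binary type when the extension is unknown.
    return content_type or "application/octet-stream"


for name in ("moon.html", "earth.jpg"):
    print(name, content_type_for(name))
```

A server would send the result in the `Content-Type` response header, which tells the browser whether to interpret the body as HTML text or display it as an image.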