<!DOCTYPE html>
<html lang="zh-CN" class="ua-windows ua-webkit">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="renderer" content="webkit">
<meta name="referrer" content="always">
<meta name="google-site-verification" content="ok0wCgT20tBBgo9_zat2iAcimtN4Ftf5ccsh092Xeyw" />
<title>
豆瓣电影 Top 250
</title>
<meta name="baidu-site-verification" content="cZdR4xxR7RxmM4zE" />
<meta http-equiv="Pragma" content="no-cache">
<meta http-equiv="Expires" content="Sun, 6 Mar 2005 01:00:00 GMT">
<link rel="apple-touch-icon" href="https://img3.doubanio.com/f/movie/d59b2715fdea4968a450ee5f6c95c7d7a2030065/pics/movie/apple-touch-icon.png">
<link href="https://img3.doubanio.com/f/shire/204847ecc7d679de915c283531d14f16cfbee65e/css/douban.css" rel="stylesheet" type="text/css">
<link href="https://img3.doubanio.com/f/shire/0b4cdb02dd620693709d9314196b617f17c2f9ea/css/separation/_all.css" rel="stylesheet" type="text/css">
<link href="https://img3.doubanio.com/f/movie/252bef058b97005c6a41e8f1b9f7b06b84bc71b3/css/movie/base/init.css" rel="stylesheet">
<script type="text/javascript">var _head_start = new Date();</script>
<script type="text/javascript" src="https://img3.doubanio.com/f/movie/0495cb173e298c28593766009c7b0a953246c5b5/js/movie/lib/jquery.js"></script>
<script type="text/javascript" src="https://img3.doubanio.com/f/shire/22ee83f45f94c7a90e73e0ee4acd18f902a6991f/js/douban.js"></script>
<script type="text/javascript" src="https://img3.doubanio.com/f/shire/b0d3faaf7a432605add54908e39e17746824d6cc/js/separation/_all.js"></script>
<link href="https://img3.doubanio.com/f/movie/2c95f768ea74284b900c04c0209b0a44f0a0de52/css/movie/top_movies.css" rel="stylesheet" type="text/css" />
<script type="text/javascript" src="https://img3.doubanio.com/f/shire/2c0c1c6b83f9a457b0f38c38a32fc43a42ec9bad/js/do.js" data-cfg-autoload="false"></script>
<script type='text/javascript'>
Do.ready(function(){
$("#mine-selector input[type='checkbox']").click(function(){
var val = $(this).is(":checked")?$(this).val():"";
window.location.href = '/top250?filter=' + val;
})
})
</script>
<style type="text/css">
.site-nav-logo img{margin-bottom:0;}
</style>
<style type="text/css">img { max-width: 100%; }</style>
<script type="text/javascript"></script>
<link rel="stylesheet" href="https://img3.doubanio.com/misc/mixed_static/562925b5e3824700.css">
<link rel="shortcut icon" href="https://img3.doubanio.com/favicon.ico" type="image/x-icon">
</head>
<body>
<script type="text/javascript">var _body_start = new Date();</script>
<link href="//img3.doubanio.com/dae/accounts/resources/3e96b44/shire/bundle.css" rel="stylesheet" type="text/css">
<div id="db-global-nav" class="global-nav">
<div class="bd">
<div class="top-nav-info">
<a href="https://accounts.douban.com/passport/login?source=movie" class="nav-login" rel="nofollow">登录/注册</a>
</div>
<div class="top-nav-doubanapp">
<a href="https://www.douban.com/doubanapp/app?channel=top-nav" class="lnk-doubanapp">下载豆瓣客户端</a>
<div id="doubanapp-tip">
<a href="https://www.douban.com/doubanapp/app?channel=qipao" class="tip-link">豆瓣 <span class="version">6.0</span> 全新发布</a>
<a href="javascript: void 0;" class="tip-close">×</a>
</div>
<div id="top-nav-appintro" class="more-items">
<p class="appintro-title">豆瓣</p>
<p class="qrcode">扫码直接下载</p>
<div class="download">
<a href="https://www.douban.com/doubanapp/redirect?channel=top-nav&direct_dl=1&download=iOS">iPhone</a>
<span>·</span>
<a href="https://www.douban.com/doubanapp/redirect?channel=top-nav&direct_dl=1&download=Android" class="download-android">Android</a>
</div>
</div>
</div>
<div class="global-nav-items">
<ul>
<li class="">
<a href="https://www.douban.com" target="_blank" data-moreurl-dict="{&quot;from&quot;:&quot;top-nav-click-main&quot;,&quot;uid&quot;:&quot;0&quot;}">豆瓣</a>
</li>
<li class="">
<a href="https://book.douban.com" target="_blank" data-moreurl-dict="{&quot;from&quot;:&quot;top-nav-click-book&quot;,&quot;uid&quot;:&quot;0&quot;}">读书</a>
</li>
<li class="on">
<a href="https://movie.douban.com" data-moreurl-dict="{&quot;from&quot;:&quot;top-nav-click-movie&quot;,&quot;uid&quot;:&quot;0&quot;}">电影</a>
</li>
<li class="">
<a href="https://music.douban.com" target="_blank" data-moreurl-dict="{&quot;from&quot;:&quot;top-nav-click-music&quot;,&quot;uid&quot;:&quot;0&quot;}">音乐</a>
</li>
<li class="">
<a href="https://www.douban.com/location" target="_blank" data-moreurl-dict="{&quot;from&quot;:&quot;top-nav-click-location&quot;,&quot;uid&quot;:&quot;0&quot;}">同城</a>
</li>
<li class="">
<a href="https://www.douban.com/group" target="_blank" data-moreurl-dict="{&quot;from&quot;:&quot;top-nav-click-group&quot;,&quot;uid&quot;:&quot;0&quot;}">小组</a>
</li>
<li class="">
<a href="https://read.douban.com/?dcs=top-nav&dcm=douban" target="_blank" data-moreurl-dict="{&quot;from&quot;:&quot;top-nav-click-read&quot;,&quot;uid&quot;:&quot;0&quot;}">阅读</a>
</li>
<li class="">
<a href="https://douban.fm/?from_=shire_top_nav" target="_blank" data-moreurl-dict="{&quot;from&quot;:&quot;top-nav-click-fm&quot;,&quot;uid&quot;:&quot;0&quot;}">FM</a>
</li>
<li class="">
<a href="https://time.douban.com/?dt_time_source=douban-web_top_nav" target="_blank" data-moreurl-dict="{&quot;from&quot;:&quot;top-nav-click-time&quot;,&quot;uid&quot;:&quot;0&quot;}">时间</a>
</li>
<li class="">
<a href="https://market.douban.com/?utm_campaign=douban_top_nav&utm_source=douban&utm_medium=pc_web" target="_blank" data-moreurl-dict="{&quot;from&quot;:&quot;top-nav-click-market&quot;,&quot;uid&quot;:&quot;0&quot;}">豆品</a>
</li>
</ul>
</div>
</div>
</div>
<script>
;window._GLOBAL_NAV = {
DOUBAN_URL: "https://www.douban.com",
N_NEW_NOTIS: 0,
N_NEW_DOUMAIL: 0
};
</script>
<script src="//img3.doubanio.com/dae/accounts/resources/3e96b44/shire/bundle.js" defer="defer"></script>
<link href="//img3.doubanio.com/dae/accounts/resources/3e96b44/movie/bundle.css" rel="stylesheet" type="text/css">
<div id="db-nav-movie" class="nav">
<div class="nav-wrap">
<div class="nav-primary">
<div class="nav-logo">
<a href="https://movie.douban.com">豆瓣电影</a>
</div>
<div class="nav-search">
<form action="https://search.douban.com/movie/subject_search" method="get">
<fieldset>
<legend>搜索:</legend>
<label for="inp-query">
</label>
<div class="inp"><input id="inp-query" name="search_text" size="22" maxlength="60" placeholder="搜索电影、电视剧、综艺、影人" value=""></div>
<div class="inp-btn"><input type="submit" value="搜索"></div>
<input type="hidden" name="cat" value="1002" />
</fieldset>
</form>
</div>
</div>
</div>
<div class="nav-secondary">
<div class="nav-items">
<ul>
<li ><a href="https://movie.douban.com/cinema/nowplaying/"
>影讯&购票</a>
</li>
<li ><a href="https://movie.douban.com/explore"
>选电影</a>
</li>
<li ><a href="https://movie.douban.com/tv/"
>电视剧</a>
</li>
<li ><a href="https://movie.douban.com/chart"
>排行榜</a>
</li>
<li ><a href="https://movie.douban.com/tag/"
>分类</a>
</li>
<li ><a hr
Python Web Scraping for Beginners: A Tutorial with Examples
Updated 2024-01-05
Web crawlers generally fall into two categories: traditional (general-purpose) crawlers and focused crawlers.
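A traditional crawler starts from the URLs of one or more seed pages. As it fetches each page, it extracts new URLs from that page and appends them to a queue, and it keeps going until a system-defined stopping condition is met. The content it is after is obtained by parsing the source of the fetched pages.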
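The queue-driven loop described above can be sketched with only the Python standard library. This is a minimal illustration, not the author's code: the names `extract_links`, `http_fetch`, and `crawl` are hypothetical, and a `max_pages` limit stands in for the unspecified "system-defined stopping condition". The page fetcher is passed in as a parameter so the loop can run against an in-memory dictionary instead of the network.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Dict, Iterable, List
from urllib.parse import urldefrag, urljoin
from urllib.request import Request, urlopen


class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def extract_links(base_url: str, html: str) -> List[str]:
    """Parse the page source and return absolute URLs with #fragments removed."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered input
    return [urldefrag(urljoin(base_url, href))[0] for href in parser.links]


def http_fetch(url: str) -> str:
    """Hypothetical standard-library fetcher; any callable str -> str can stand in for it."""
    req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(req, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(seeds: Iterable[str], fetch: Callable[[str], str], max_pages: int = 100) -> Dict[str, str]:
    """Breadth-first crawl: returns {url: html} for up to max_pages pages, in visit order."""
    queue = deque(seeds)
    seen = set(queue)
    pages: Dict[str, str] = {}
    while queue and len(pages) < max_pages:  # assumed stopping condition
        url = queue.popleft()
        html = fetch(url)
        pages[url] = html
        for link in extract_links(url, html):
            if link not in seen:  # enqueue each URL at most once
                seen.add(link)
                queue.append(link)
    return pages
```

Passing `http_fetch` as `fetch` crawls live pages. Passing a dictionary lookup such as `lambda u: fake_web.get(u, "")` keeps the loop testable offline.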
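A focused crawler uses a page-analysis algorithm to filter out links unrelated to its topic. It keeps the useful links and adds them to the queue of URLs to be fetched. It then applies a search strategy to choose the next URL from the queue, and it repeats this process until a system-defined stopping condition is met. In addition, the system stores, analyzes, filters, and indexes every page the crawler fetches, so the pages can be queried and retrieved later. For a focused crawler, the results of this analysis can also feed back into subsequent crawling and guide it.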
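The following is a minimal sketch of the focused variant, under loudly stated assumptions. The article does not name a page-analysis algorithm or a search strategy, so this sketch substitutes a hypothetical keyword-count `relevance` score for the analysis step and a best-first priority queue (`heapq`) for the search strategy. Links whose anchor text and URL score zero are treated as off-topic and dropped. As a simple form of the feedback the paragraph mentions, a link's priority is boosted by the relevance of the page it was found on. The names `AnchorExtractor`, `relevance`, and `focused_crawl` are illustrative only.

```python
import heapq
from html.parser import HTMLParser
from typing import Callable, Dict, Iterable, List, Optional, Set, Tuple
from urllib.parse import urldefrag, urljoin


class AnchorExtractor(HTMLParser):
    """Collects (href, anchor text) pairs plus all text of a page."""

    def __init__(self) -> None:
        super().__init__()
        self.anchors: List[Tuple[str, str]] = []
        self.page_text: List[str] = []
        self._href: Optional[str] = None
        self._anchor_text: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._anchor_text = []

    def handle_data(self, data):
        self.page_text.append(data)
        if self._href is not None:
            self._anchor_text.append(data)

    def handle_endtag(self, tag):
        if tag == "a":
            if self._href:
                self.anchors.append((self._href, "".join(self._anchor_text).strip()))
            self._href = None


def relevance(text: str, keywords: Iterable[str]) -> int:
    """Hypothetical topic score: case-insensitive count of keyword occurrences."""
    lowered = text.lower()
    return sum(lowered.count(k.lower()) for k in keywords)


def focused_crawl(
    seeds: Iterable[str],
    fetch: Callable[[str], str],
    keywords: List[str],
    max_pages: int = 50,
) -> Dict[str, int]:
    """Best-first crawl that skips off-topic links; returns {url: page score} in visit order."""
    frontier: List[Tuple[float, int, str]] = []  # (-priority, insertion order, url)
    seen: Set[str] = set()
    counter = 0
    for url in seeds:
        if url not in seen:
            seen.add(url)
            heapq.heappush(frontier, (float("-inf"), counter, url))  # seeds are crawled first
            counter += 1
    visited: Dict[str, int] = {}
    while frontier and len(visited) < max_pages:
        _, _, url = heapq.heappop(frontier)  # highest-priority URL first
        parser = AnchorExtractor()
        parser.feed(fetch(url))
        parser.close()
        page_score = relevance(" ".join(parser.page_text), keywords)
        visited[url] = page_score
        for href, text in parser.anchors:
            link = urldefrag(urljoin(url, href))[0]
            link_score = relevance(f"{text} {link}", keywords)
            if link_score == 0 or link in seen:
                continue  # drop off-topic or already-queued links
            seen.add(link)
            # Feedback step: links found on relevant pages are boosted.
            heapq.heappush(frontier, (-(link_score + page_score), counter, link))
            counter += 1
    return visited
```

The insertion counter in each heap entry breaks ties in first-in, first-out (FIFO) order, so equal-priority URLs are fetched in the order they were discovered.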
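Anti-crawling: KS-WAF (a unified website protection system) divides crawler traffic into search-engine crawlers and scanner crawlers. It can block specific search-engine crawlers to save bandwidth and server capacity, and it can block scanner crawlers to keep the site's pages from being scraped maliciously. Anti-crawling mechanisms are mostly deployed by companies. Classic countermeasures we commonly run into include image CAPTCHAs, slider verification, and IP bans.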
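To make the server side concrete, here is a toy sketch of two of the defenses mentioned above. It is not how KS-WAF works; its internals are not described here. The sketch classifies requests by User-Agent into search-engine and scanner crawlers, and it bans an IP temporarily when that IP exceeds a request rate within a sliding time window. The signature lists, thresholds, and the `CrawlerGuard` name are all hypothetical, and CAPTCHAs and slider checks are not modeled.

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Iterable, Optional

# Hypothetical User-Agent signatures; real WAFs use much richer detection than substrings.
SEARCH_ENGINE_UA = ("googlebot", "bingbot", "baiduspider")
SCANNER_UA = ("sqlmap", "nikto")


class CrawlerGuard:
    """Toy anti-crawling filter: UA classification plus per-IP rate limiting with temporary bans."""

    def __init__(
        self,
        max_requests: int = 10,
        window: float = 60.0,
        ban_seconds: float = 600.0,
        blocked_engines: Iterable[str] = (),
    ) -> None:
        self.max_requests = max_requests  # allowed requests per window
        self.window = window  # sliding-window length in seconds
        self.ban_seconds = ban_seconds  # how long an offending IP stays blocked
        self.blocked_engines = {e.lower() for e in blocked_engines}
        self.history: Dict[str, Deque[float]] = defaultdict(deque)
        self.banned_until: Dict[str, float] = {}

    @staticmethod
    def classify(user_agent: str) -> str:
        ua = user_agent.lower()
        if any(sig in ua for sig in SEARCH_ENGINE_UA):
            return "search_engine"
        if any(sig in ua for sig in SCANNER_UA):
            return "scanner"
        return "other"

    def allow(self, ip: str, user_agent: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        kind = self.classify(user_agent)
        if kind == "scanner":
            return False  # always block scanner crawlers
        if kind == "search_engine" and any(e in user_agent.lower() for e in self.blocked_engines):
            return False  # block only the engines the operator chose, to save bandwidth
        if self.banned_until.get(ip, float("-inf")) > now:
            return False  # IP is still banned
        hits = self.history[ip]
        while hits and now - hits[0] >= self.window:
            hits.popleft()  # drop requests older than the window
        hits.append(now)
        if len(hits) > self.max_requests:
            self.banned_until[ip] = now + self.ban_seconds  # too fast: ban the IP
            hits.clear()
            return False
        return True
```

The optional `now` argument exists so the time-dependent logic can be exercised deterministically. In normal use it falls back to `time.monotonic()`.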
————————————————
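Copyright notice: this is an original article by CSDN blogger 「酒酿小小丸子」, published under the CC 4.0 BY-SA license. When reposting, please attach the original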