<!DOCTYPE html>
<html lang="zh-CN" class="ua-windows ua-webkit">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="renderer" content="webkit">
<meta name="referrer" content="always">
<meta name="google-site-verification" content="ok0wCgT20tBBgo9_zat2iAcimtN4Ftf5ccsh092Xeyw" />
<title>
豆瓣电影 Top 250
</title>
<meta name="baidu-site-verification" content="cZdR4xxR7RxmM4zE" />
<meta http-equiv="Pragma" content="no-cache">
<meta http-equiv="Expires" content="Sun, 6 Mar 2005 01:00:00 GMT">
<link rel="apple-touch-icon" href="https://img3.doubanio.com/f/movie/d59b2715fdea4968a450ee5f6c95c7d7a2030065/pics/movie/apple-touch-icon.png">
<link href="https://img3.doubanio.com/f/shire/204847ecc7d679de915c283531d14f16cfbee65e/css/douban.css" rel="stylesheet" type="text/css">
<link href="https://img3.doubanio.com/f/shire/0b4cdb02dd620693709d9314196b617f17c2f9ea/css/separation/_all.css" rel="stylesheet" type="text/css">
<link href="https://img3.doubanio.com/f/movie/252bef058b97005c6a41e8f1b9f7b06b84bc71b3/css/movie/base/init.css" rel="stylesheet">
<script type="text/javascript">var _head_start = new Date();</script>
<script type="text/javascript" src="https://img3.doubanio.com/f/movie/0495cb173e298c28593766009c7b0a953246c5b5/js/movie/lib/jquery.js"></script>
<script type="text/javascript" src="https://img3.doubanio.com/f/shire/22ee83f45f94c7a90e73e0ee4acd18f902a6991f/js/douban.js"></script>
<script type="text/javascript" src="https://img3.doubanio.com/f/shire/b0d3faaf7a432605add54908e39e17746824d6cc/js/separation/_all.js"></script>
<link href="https://img3.doubanio.com/f/movie/2c95f768ea74284b900c04c0209b0a44f0a0de52/css/movie/top_movies.css" rel="stylesheet" type="text/css" />
<script type="text/javascript" src="https://img3.doubanio.com/f/shire/2c0c1c6b83f9a457b0f38c38a32fc43a42ec9bad/js/do.js" data-cfg-autoload="false"></script>
<script type='text/javascript'>
Do.ready(function(){
$("#mine-selector input[type='checkbox']").click(function(){
var val = $(this).is(":checked")?$(this).val():"";
window.location.href = '/top250?filter=' + val;
})
})
</script>
<style type="text/css">
.site-nav-logo img{margin-bottom:0;}
</style>
<style type="text/css">img { max-width: 100%; }</style>
<script type="text/javascript"></script>
<link rel="stylesheet" href="https://img3.doubanio.com/misc/mixed_static/562925b5e3824700.css">
<link rel="shortcut icon" href="https://img3.doubanio.com/favicon.ico" type="image/x-icon">
</head>
<body>
<script type="text/javascript">var _body_start = new Date();</script>
<link href="//img3.doubanio.com/dae/accounts/resources/3e96b44/shire/bundle.css" rel="stylesheet" type="text/css">
<div id="db-global-nav" class="global-nav">
<div class="bd">
<div class="top-nav-info">
<a href="https://accounts.douban.com/passport/login?source=movie" class="nav-login" rel="nofollow">登录/注册</a>
</div>
<div class="top-nav-doubanapp">
<a href="https://www.douban.com/doubanapp/app?channel=top-nav" class="lnk-doubanapp">下载豆瓣客户端</a>
<div id="doubanapp-tip">
<a href="https://www.douban.com/doubanapp/app?channel=qipao" class="tip-link">豆瓣 <span class="version">6.0</span> 全新发布</a>
<a href="javascript: void 0;" class="tip-close">×</a>
</div>
<div id="top-nav-appintro" class="more-items">
<p class="appintro-title">豆瓣</p>
<p class="qrcode">扫码直接下载</p>
<div class="download">
<a href="https://www.douban.com/doubanapp/redirect?channel=top-nav&direct_dl=1&download=iOS">iPhone</a>
<span>·</span>
<a href="https://www.douban.com/doubanapp/redirect?channel=top-nav&direct_dl=1&download=Android" class="download-android">Android</a>
</div>
</div>
</div>
<div class="global-nav-items">
<ul>
<li class="">
<a href="https://www.douban.com" target="_blank" data-moreurl-dict="{"from":"top-nav-click-main","uid":"0"}">豆瓣</a>
</li>
<li class="">
<a href="https://book.douban.com" target="_blank" data-moreurl-dict="{"from":"top-nav-click-book","uid":"0"}">读书</a>
</li>
<li class="on">
<a href="https://movie.douban.com" data-moreurl-dict="{"from":"top-nav-click-movie","uid":"0"}">电影</a>
</li>
<li class="">
<a href="https://music.douban.com" target="_blank" data-moreurl-dict="{"from":"top-nav-click-music","uid":"0"}">音乐</a>
</li>
<li class="">
<a href="https://www.douban.com/location" target="_blank" data-moreurl-dict="{"from":"top-nav-click-location","uid":"0"}">同城</a>
</li>
<li class="">
<a href="https://www.douban.com/group" target="_blank" data-moreurl-dict="{"from":"top-nav-click-group","uid":"0"}">小组</a>
</li>
<li class="">
<a href="https://read.douban.com/?dcs=top-nav&dcm=douban" target="_blank" data-moreurl-dict="{"from":"top-nav-click-read","uid":"0"}">阅读</a>
</li>
<li class="">
<a href="https://douban.fm/?from_=shire_top_nav" target="_blank" data-moreurl-dict="{"from":"top-nav-click-fm","uid":"0"}">FM</a>
</li>
<li class="">
<a href="https://time.douban.com/?dt_time_source=douban-web_top_nav" target="_blank" data-moreurl-dict="{"from":"top-nav-click-time","uid":"0"}">时间</a>
</li>
<li class="">
<a href="https://market.douban.com/?utm_campaign=douban_top_nav&utm_source=douban&utm_medium=pc_web" target="_blank" data-moreurl-dict="{"from":"top-nav-click-market","uid":"0"}">豆品</a>
</li>
</ul>
</div>
</div>
</div>
<script>
;window._GLOBAL_NAV = {
DOUBAN_URL: "https://www.douban.com",
N_NEW_NOTIS: 0,
N_NEW_DOUMAIL: 0
};
</script>
<script src="//img3.doubanio.com/dae/accounts/resources/3e96b44/shire/bundle.js" defer="defer"></script>
<link href="//img3.doubanio.com/dae/accounts/resources/3e96b44/movie/bundle.css" rel="stylesheet" type="text/css">
<div id="db-nav-movie" class="nav">
<div class="nav-wrap">
<div class="nav-primary">
<div class="nav-logo">
<a href="https://movie.douban.com">豆瓣电影</a>
</div>
<div class="nav-search">
<form action="https://search.douban.com/movie/subject_search" method="get">
<fieldset>
<legend>搜索:</legend>
<label for="inp-query">
</label>
<div class="inp"><input id="inp-query" name="search_text" size="22" maxlength="60" placeholder="搜索电影、电视剧、综艺、影人" value=""></div>
<div class="inp-btn"><input type="submit" value="搜索"></div>
<input type="hidden" name="cat" value="1002" />
</fieldset>
</form>
</div>
</div>
</div>
<div class="nav-secondary">
<div class="nav-items">
<ul>
<li ><a href="https://movie.douban.com/cinema/nowplaying/"
>影讯&购票</a>
</li>
<li ><a href="https://movie.douban.com/explore"
>选电影</a>
</li>
<li ><a href="https://movie.douban.com/tv/"
>电视剧</a>
</li>
<li ><a href="https://movie.douban.com/chart"
>排行榜</a>
</li>
<li ><a href="https://movie.douban.com/tag/"
>分类</a>
</li>
<li ><a hr
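Web crawlers are generally divided into traditional crawlers and focused crawlers.

A traditional crawler starts from one or more seed URLs. While fetching each page, it extracts new URLs from that page and appends them to a queue, and it stops only when a configured stop condition is met. The desired content is obtained by parsing the page source.

A focused crawler uses a page-analysis algorithm to filter out links unrelated to its topic, keeps the useful links, and puts them into the queue of URLs to fetch. It then picks the next URL from the queue according to a search strategy and repeats the process until a stop condition is met. Every page the crawler fetches is stored, analyzed, filtered, and indexed for later query and retrieval. For a focused crawler, the results of this analysis can also feed back into and guide subsequent crawling.

Anti-crawling: KS-WAF (unified website protection system) classifies crawler behavior into search-engine crawlers and scanner crawlers. It can block specific search-engine crawlers to save bandwidth and server resources, and it can block scanner crawlers to keep pages from being maliciously scraped. Anti-crawling mechanisms are used mostly by companies. Common countermeasures include image CAPTCHAs, slider verification, and IP bans.

Copyright notice: this is an original article by CSDN blogger 「酒酿小小丸子」, licensed under CC 4.0 BY-SA. When reposting, please attach the original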
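The queue-driven loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the code from the archive: the names `crawl`, `LinkExtractor`, `fetch`, and `keep_link`, and the `example.test` URLs, are all assumptions made for this example. An in-memory dictionary stands in for real HTTP requests so the sketch runs offline, and a plain FIFO queue (breadth-first order) is used as one possible search strategy. With the default `keep_link` every link is followed, which corresponds to the traditional crawler. Passing a topic filter gives a very simple focused crawler.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect the href value of every <a> tag on one page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seeds, fetch, keep_link=lambda url: True, max_pages=100):
    """Crawl breadth-first from the seed URLs.

    fetch(url) returns the page HTML, or None if the page is unavailable.
    keep_link(url) decides whether a newly found link is queued.
    max_pages is the stop condition. Returns {url: html} for stored pages.
    """
    queue = deque(seeds)
    seen = set(seeds)
    stored = {}
    while queue and len(stored) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        stored[url] = html  # every fetched page is kept for later analysis
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            link = urljoin(url, href)  # resolve relative links
            if link not in seen and keep_link(link):
                seen.add(link)
                queue.append(link)
    return stored


if __name__ == "__main__":
    # Hypothetical five-page site held in memory instead of fetched over HTTP.
    pages = {
        "http://example.test/": '<a href="/movie/1">m1</a><a href="/book/1">b1</a>',
        "http://example.test/movie/1": '<a href="/movie/2">m2</a><a href="/">home</a>',
        "http://example.test/movie/2": "",
        "http://example.test/book/1": '<a href="/book/2">b2</a>',
        "http://example.test/book/2": "",
    }
    seed = ["http://example.test/"]
    print(sorted(crawl(seed, pages.get)))
    print(sorted(crawl(seed, pages.get, keep_link=lambda u: "/movie/" in u)))
```

In a real crawler, `fetch` would issue an HTTP request and `keep_link` would apply whatever relevance analysis the topic calls for. Both are kept as plain callables here so the loop itself stays easy to read.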
python-web-crawler-technology-master.zip (68 files)
python-web-crawler-technology-master
  jd_cache.sqlite 916KB
  pikaqiu.jpg 163KB
  3-3.py 834B
  3-15.py 138B
  2-8.py 334B
  3-9.py 986B
  3-16.py 2KB
  5-4.py 3KB
  3-4.py 498B
  house_info.csv 413KB
  class.py 579B
  4-7.py 932B
  5-2.py 360B
  main.py 3KB
  5-1.py 440B
  5-3.py 1KB
  3-7.py 898B
  4-6.py 598B
  3-1.py 857B
  4-1.py 2KB
  3-1-n.html 1KB
  4-3.py 159B
  4-9.py 1004B
  4-2.py 2KB
  3-14.py 332B
  2-3.py 503B
  2-1.py 369B
  .idea
    python_proj.iml 339B
    sqldialects.xml 199B
    vcs.xml 185B
    misc.xml 203B
    dataSources.xml 2KB
    inspectionProfiles
      profiles_settings.xml 174B
    modules.xml 281B
    .gitignore 184B
  2-4.py 367B
  test.db 12KB
  3-8.py 626B
  4-8.py 1KB
  2-16.py 753B
  python.csv 8KB
  2-13.py 204B
  2-12.py 308B
  data.json 164B
  data.csv 68B
  2-11.py 909B
  2-2.py 612B
  3-5.py 362B
  2-15.py 247B
  2-2-old.py 788B
  2-14.py 213B
  2-10.py 618B
  4-5.py 609B
  2-7.py 231B
  4-10.py 3KB
  .gitignore 2KB
  3-10.py 436B
  2-9.py 265B
  3-1.html 1KB
  3-2.py 652B
  3-12.py 329B
  3-13.py 162B
  4-4.py 751B
  3-6.py 629B
  2-6.py 336B
  3-11.py 482B
  2-5.py 688B
  movie.txt 1.23MB
开发技术控
- Followers: 1950
- Resources: 45