pyspider [![Build Status]][Travis CI] [![Coverage Status]][Coverage] [![Try]][Demo]
========
A powerful Spider (Web Crawler) system in Python. **[TRY IT NOW!][Demo]**

- Write scripts in Python
- Powerful WebUI with script editor, task monitor, project manager and result viewer
- [MySQL](https://www.mysql.com/), [MongoDB](https://www.mongodb.org/), [Redis](http://redis.io/), [SQLite](https://www.sqlite.org/), [Elasticsearch](https://www.elastic.co/products/elasticsearch); [PostgreSQL](http://www.postgresql.org/) with [SQLAlchemy](http://www.sqlalchemy.org/) as database backend
- [RabbitMQ](http://www.rabbitmq.com/), [Beanstalk](http://kr.github.com/beanstalkd/), [Redis](http://redis.io/) and [Kombu](http://kombu.readthedocs.org/) as message queue
- Task priority, retries, periodic crawling, recrawl by age, and more
- Distributed architecture, JavaScript page rendering, Python 2 & 3 support, and more
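
The "recrawl by age" behavior above means a page is only re-fetched once its last result is older than a configured age. As a dependency-free sketch (the `needs_recrawl` helper is hypothetical, not pyspider's actual API), the decision boils down to a timestamp comparison:

```python
import time

def needs_recrawl(last_crawl_time, age_seconds, now=None):
    """Hypothetical helper: True when the previous result is older
    than `age_seconds`, i.e. the task is due for a recrawl."""
    now = time.time() if now is None else now
    return (now - last_crawl_time) > age_seconds

TEN_DAYS = 10 * 24 * 60 * 60  # same value as @config(age=...) in the sample below
# A page last crawled 11 days ago, with age=10 days, is due again:
assert needs_recrawl(0, TEN_DAYS, now=11 * 24 * 60 * 60)
```

In pyspider itself this age is set per callback via `@config(age=...)`, as shown in the sample code.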
Tutorial: [http://docs.pyspider.org/en/latest/tutorial/](http://docs.pyspider.org/en/latest/tutorial/)

Documentation: [http://docs.pyspider.org/](http://docs.pyspider.org/)

Release notes: [https://github.com/binux/pyspider/releases](https://github.com/binux/pyspider/releases)
Sample Code
-----------
```python
from pyspider.libs.base_handler import *


class Handler(BaseHandler):
    crawl_config = {
    }

    @every(minutes=24 * 60)
    def on_start(self):
        self.crawl('http://scrapy.org/', callback=self.index_page)

    @config(age=10 * 24 * 60 * 60)
    def index_page(self, response):
        for each in response.doc('a[href^="http"]').items():
            self.crawl(each.attr.href, callback=self.detail_page)

    def detail_page(self, response):
        return {
            "url": response.url,
            "title": response.doc('title').text(),
        }
```
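
The `index_page` callback above relies on pyspider's PyQuery-backed `response.doc` to select outbound links with the CSS selector `a[href^="http"]`. A dependency-free sketch of the same selection using only the standard library (the `LinkExtractor` class is illustrative, not part of pyspider):

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect href values of <a> tags that start with 'http',
    mimicking the a[href^="http"] selector in the sample handler."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href") or ""
            if href.startswith("http"):
                self.links.append(href)

parser = LinkExtractor()
parser.feed('<a href="http://scrapy.org/">Scrapy</a> <a href="/local">skip</a>')
# parser.links == ["http://scrapy.org/"] — the relative link is filtered out
```

In a real handler each collected link would be fed back into `self.crawl(...)`, as the sample does.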
[![Demo][Demo Img]][Demo]
Installation
------------
* `pip install pyspider`
* run the command `pyspider` and visit [http://localhost:5000/](http://localhost:5000/)
Quickstart: [http://docs.pyspider.org/en/latest/Quickstart/](http://docs.pyspider.org/en/latest/Quickstart/)
Contribute
----------
* Use it
* Open an [Issue] or send a PR
* [User Group]
* [Chinese Q&A](http://segmentfault.com/t/pyspider)
TODO
----
### v0.4.0
- [x] local mode, load script from file.
- [x] works as a framework (all components running in one process, no threads)
- [x] redis
- [x] shell mode like `scrapy shell`
- [ ] a visual scraping interface like [portia](https://github.com/scrapinghub/portia)
### more
- [x] edit script with vim via [WebDAV](http://en.wikipedia.org/wiki/WebDAV)
License
-------
Licensed under the Apache License, Version 2.0
[Build Status]: https://img.shields.io/travis/binux/pyspider/master.svg?style=flat
[Travis CI]: https://travis-ci.org/binux/pyspider
[Coverage Status]: https://img.shields.io/coveralls/binux/pyspider.svg?branch=master&style=flat
[Coverage]: https://coveralls.io/r/binux/pyspider
[Try]: https://img.shields.io/badge/try-pyspider-blue.svg?style=flat
[Demo]: http://demo.pyspider.org/
[Demo Img]: https://github.com/binux/pyspider/blob/master/docs/imgs/demo.png
[Issue]: https://github.com/binux/pyspider/issues
[User Group]: https://groups.google.com/group/pyspider-users