pyspider [![Build Status]][Travis CI] [![Coverage Status]][Coverage] [![Try]][Demo]
========
A powerful Spider (Web Crawler) system in Python. **[TRY IT NOW!][Demo]**

- Write scripts in Python
- Powerful WebUI with script editor, task monitor, project manager and result viewer
- [MySQL](https://www.mysql.com/), [MongoDB](https://www.mongodb.org/), [Redis](http://redis.io/), [SQLite](https://www.sqlite.org/), [Elasticsearch](https://www.elastic.co/products/elasticsearch); [PostgreSQL](http://www.postgresql.org/) with [SQLAlchemy](http://www.sqlalchemy.org/) as database backend
- [RabbitMQ](http://www.rabbitmq.com/), [Beanstalk](http://kr.github.com/beanstalkd/), [Redis](http://redis.io/) and [Kombu](http://kombu.readthedocs.org/) as message queue
- Task priority, retries, periodic crawling, recrawl by age, and more
- Distributed architecture, JavaScript page rendering, Python 2 & 3 support, and more
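
The "recrawl by age" behavior above means a page is only re-fetched once its last result is older than a configured age. As a dependency-free sketch (the `needs_recrawl` helper is hypothetical, not pyspider's actual API), the decision boils down to a timestamp comparison:

```python
import time

def needs_recrawl(last_crawl_time, age_seconds, now=None):
    """Hypothetical helper: True when the previous result is older
    than `age_seconds`, i.e. the task is due for a recrawl."""
    now = time.time() if now is None else now
    return (now - last_crawl_time) > age_seconds

TEN_DAYS = 10 * 24 * 60 * 60  # same value as @config(age=...) in the sample below
# A page last crawled 11 days ago, with age=10 days, is due again:
assert needs_recrawl(0, TEN_DAYS, now=11 * 24 * 60 * 60)
```

In pyspider itself this age is set per callback via `@config(age=...)`, as shown in the sample code.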
Tutorial: [http://docs.pyspider.org/en/latest/tutorial/](http://docs.pyspider.org/en/latest/tutorial/)

Documentation: [http://docs.pyspider.org/](http://docs.pyspider.org/)

Release notes: [https://github.com/binux/pyspider/releases](https://github.com/binux/pyspider/releases)
Sample Code
-----------
```python
from pyspider.libs.base_handler import *


class Handler(BaseHandler):
    crawl_config = {
    }

    @every(minutes=24 * 60)
    def on_start(self):
        self.crawl('http://scrapy.org/', callback=self.index_page)

    @config(age=10 * 24 * 60 * 60)
    def index_page(self, response):
        for each in response.doc('a[href^="http"]').items():
            self.crawl(each.attr.href, callback=self.detail_page)

    def detail_page(self, response):
        return {
            "url": response.url,
            "title": response.doc('title').text(),
        }
```
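
The `index_page` callback above relies on pyspider's PyQuery-backed `response.doc` to select outbound links with the CSS selector `a[href^="http"]`. A dependency-free sketch of the same selection using only the standard library (the `LinkExtractor` class is illustrative, not part of pyspider):

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect href values of <a> tags that start with 'http',
    mimicking the a[href^="http"] selector in the sample handler."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href") or ""
            if href.startswith("http"):
                self.links.append(href)

parser = LinkExtractor()
parser.feed('<a href="http://scrapy.org/">Scrapy</a> <a href="/local">skip</a>')
# parser.links == ["http://scrapy.org/"] — the relative link is filtered out
```

In a real handler each collected link would be fed back into `self.crawl(...)`, as the sample does.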
[![Demo][Demo Img]][Demo]
Installation
------------
* `pip install pyspider`
* run the command `pyspider` and visit [http://localhost:5000/](http://localhost:5000/)
Quickstart: [http://docs.pyspider.org/en/latest/Quickstart/](http://docs.pyspider.org/en/latest/Quickstart/)
Contribute
----------
* Use it
* Open an [Issue] or send a PR
* [User Group]
* [Chinese Q&A](http://segmentfault.com/t/pyspider)
TODO
----
### v0.4.0
- [x] local mode, load script from file.
- [x] works as a framework (all components running in one process, no threads)
- [x] redis
- [x] shell mode like `scrapy shell`
- [ ] a visual scraping interface like [portia](https://github.com/scrapinghub/portia)
### more
- [x] edit script with vim via [WebDAV](http://en.wikipedia.org/wiki/WebDAV)
License
-------
Licensed under the Apache License, Version 2.0
[Build Status]: https://img.shields.io/travis/binux/pyspider/master.svg?style=flat
[Travis CI]: https://travis-ci.org/binux/pyspider
[Coverage Status]: https://img.shields.io/coveralls/binux/pyspider.svg?branch=master&style=flat
[Coverage]: https://coveralls.io/r/binux/pyspider
[Try]: https://img.shields.io/badge/try-pyspider-blue.svg?style=flat
[Demo]: http://demo.pyspider.org/
[Demo Img]: https://github.com/binux/pyspider/blob/master/docs/imgs/demo.png
[Issue]: https://github.com/binux/pyspider/issues
[User Group]: https://groups.google.com/group/pyspider-users