PySipder is a Python crawler program.rar - CSDN Library resource

184 files in total

py: 103

md: 24

png: 13

python

crawler

Points required: 5, 69 views, uploaded 2023-07-05 17:07:59, 2.29 MB RAR


PySipder is a Python crawler program.rar (184 files)

.babelrc 28B

logging.conf 763B

.coveragerc 350B

debug.min.css 5KB

index.min.css 2KB

tasks.min.css 1KB

task.min.css 783B

result.min.css 390B

Dockerfile 873B

.gitignore 339B

.gitignore 5B

说明.htm 4KB

index.html 9KB

debug.html 6KB

task.html 4KB

result.html 4KB

tasks.html 2KB

MANIFEST.in 175B

tox.ini 305B

debug.min.js 24KB

debug.js 19KB

splitter.js 10KB

phantomjs_fetcher.js 7KB

css_selector_helper.js 7KB

index.js 6KB

index.min.js 4KB

css_selector_helper.min.js 4KB

webpack.config.js 723B

result.min.js 257B

tasks.min.js 256B

task.min.js 255B

package.json 627B

debug.less 7KB

index.less 2KB

task.less 1023B

result.less 641B

tasks.less 556B

variable.less 545B

LICENSE 11KB

splash_fetcher.lua 6KB

Command-Line.md 9KB

self.crawl.md 8KB

HTML-and-CSS-Selector.md 8KB

AJAX-and-more-HTTP.md 7KB

Architecture.md 5KB

Deployment-demo.pyspider.org.md 4KB

Deployment.md 4KB

Quickstart.md 4KB

Render-with-PhantomJS.md 3KB

Frequently-Asked-Questions.md 3KB

README.md 3KB

index.md 3KB

Working-with-Results.md 3KB

About-Projects.md 2KB

Running-pyspider-with-Docker.md 2KB

About-Tasks.md 2KB

Response.md 2KB

self.send_message.md 1KB

Script-Environment.md 1KB

@every.md 729B

ISSUE_TEMPLATE.md 659B

@catch_status_code_error.md 542B

index.md 396B

index.md 198B

java历史进程.pdf 214KB

demo.png 834KB

inspect_element.png 252KB

request-headers.png 237KB

search-for-request.png 232KB

css_selector_helper.png 124KB

tutorial_imdb_front.png 94KB

developer-tools-network.png 90KB

index_page.png 77KB

twitch.png 47KB

creating_a_project.png 42KB

run_one_step.png 29KB

developer-tools-network-filter.png 19KB

pyspider-arch.png 17KB

scheduler.py 45KB

run.py 28KB

test_scheduler.py 28KB

tornado_fetcher.py 27KB

test_fetcher.py 24KB

test_database.py 23KB

test_fetcher_processor.py 23KB

test_processor.py 20KB

test_webui.py 20KB

base_handler.py 15KB

test_run.py 13KB

counter.py 12KB

pprint.py 12KB

utils.py 12KB

test_message_queue.py 10KB

project_module.py 9KB

rabbitmq.py 8KB

bench.py 8KB

processor.py 8KB

task_queue.py 8KB

response.py 7KB

debug.py 7KB


pyspider [![Build Status]][Travis CI] [![Coverage Status]][Coverage] [![Try]][Demo]
========

A Powerful Spider(Web Crawler) System in Python. **[TRY IT NOW!][Demo]**

- Write script in Python
- Powerful WebUI with script editor, task monitor, project manager and result viewer
- [MySQL](https://www.mysql.com/), [MongoDB](https://www.mongodb.org/), [Redis](http://redis.io/), [SQLite](https://www.sqlite.org/), [Elasticsearch](https://www.elastic.co/products/elasticsearch); [PostgreSQL](http://www.postgresql.org/) with [SQLAlchemy](http://www.sqlalchemy.org/) as database backend
- [RabbitMQ](http://www.rabbitmq.com/), [Beanstalk](http://kr.github.com/beanstalkd/), [Redis](http://redis.io/) and [Kombu](http://kombu.readthedocs.org/) as message queue
- Task priority, retry, periodical, recrawl by age, etc...
- Distributed architecture, Crawl Javascript pages, Python 2.{6,7}, 3.{3,4,5,6} support, etc...

Tutorial: [http://docs.pyspider.org/en/latest/tutorial/](http://docs.pyspider.org/en/latest/tutorial/)
Documentation: [http://docs.pyspider.org/](http://docs.pyspider.org/)
Release notes: [https://github.com/binux/pyspider/releases](https://github.com/binux/pyspider/releases)

Sample Code
-----------

```python
from pyspider.libs.base_handler import *


class Handler(BaseHandler):
    crawl_config = {
    }

    @every(minutes=24 * 60)
    def on_start(self):
        self.crawl('http://scrapy.org/', callback=self.index_page)

    @config(age=10 * 24 * 60 * 60)
    def index_page(self, response):
        for each in response.doc('a[href^="http"]').items():
            self.crawl(each.attr.href, callback=self.detail_page)

    def detail_page(self, response):
        return {
            "url": response.url,
            "title": response.doc('title').text(),
        }
```

[![Demo][Demo Img]][Demo]

Installation
------------

* `pip install pyspider`
* run command `pyspider`, visit [http://localhost:5000/](http://localhost:5000/)

**WARNING:** WebUI is open to the public by default, it can be used to execute any command which may harm your system. Please use it in an internal network or [enable `need-auth` for webui](http://docs.pyspider.org/en/latest/Command-Line/#-config).

Quickstart: [http://docs.pyspider.org/en/latest/Quickstart/](http://docs.pyspider.org/en/latest/Quickstart/)

Contribute
----------

* Use It
* Open [Issue], send PR
* [User Group]
* [中文问答](http://segmentfault.com/t/pyspider)

TODO
----

### v0.4.0

- [ ] a visual scraping interface like [portia](https://github.com/scrapinghub/portia)

License
-------

Licensed under the Apache License, Version 2.0

[Build Status]: https://img.shields.io/travis/binux/pyspider/master.svg?style=flat
[Travis CI]: https://travis-ci.org/binux/pyspider
[Coverage Status]: https://img.shields.io/coveralls/binux/pyspider.svg?branch=master&style=flat
[Coverage]: https://coveralls.io/r/binux/pyspider
[Try]: https://img.shields.io/badge/try-pyspider-blue.svg?style=flat
[Demo]: http://demo.pyspider.org/
[Demo Img]: https://github.com/binux/pyspider/blob/master/docs/imgs/demo.png
[Issue]: https://github.com/binux/pyspider/issues
[User Group]: https://groups.google.com/group/pyspider-users
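
The WARNING in the README recommends enabling `need-auth` for the WebUI. The following is a minimal sketch of one way to generate such a config file, assuming the `webui` section accepts `username`, `password` and `need-auth` keys as described in the linked Command-Line documentation. The helper names `build_webui_auth_config` and `write_config`, the file name `config.json`, and the credentials are illustrative placeholders, not part of pyspider.

```python
import json
from pathlib import Path


def build_webui_auth_config(username: str, password: str) -> dict:
    """Build a config dict that asks the WebUI to require a login.

    Key names are assumed from the pyspider Command-Line docs; check them
    against the version you have installed.
    """
    return {
        "webui": {
            "username": username,
            "password": password,
            "need-auth": True,
        }
    }


def write_config(path: str, config: dict) -> Path:
    """Serialize the config dict as JSON and return the written path."""
    target = Path(path)
    target.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return target


if __name__ == "__main__":
    # Placeholder credentials; replace them before any real use.
    cfg = build_webui_auth_config("admin", "change-me")
    print(write_config("config.json", cfg))
```

The resulting file would then be passed at startup, for example `pyspider -c config.json`. Keeping the WebUI on an internal network, as the README suggests, remains the safer default.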
