[![](https://img.shields.io/badge/language-Python35-green.svg)]() [![](https://img.shields.io/badge/Branch-master-green.svg?longCache=true)]() [![](https://img.shields.io/github/followers/DropsDevopsOrg.svg?label=Follow)]() ![GitHub contributors](https://img.shields.io/github/contributors/DropsDevopsOrg/ECommerceCrawlers.svg) [![](https://img.shields.io/github/forks/DropsDevopsOrg/ECommerceCrawlers.svg?label=Fork&style=social)]() [![](https://img.shields.io/github/stars/DropsDevopsOrg/ECommerceCrawlers.svg?style=social)]() [![](https://img.shields.io/github/watchers/DropsDevopsOrg/ECommerceCrawlers.svg?label=Watch&style=social)]()
## ECommerceCrawlers
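A collection of 🐍 crawlers for product data from a variety of e-commerce sites, gathered and organized as crawler practice. Every project was written by a member of the team. The hands-on projects practice solving the problems that commonly come up when writing crawlers.
Each project's readme walks through the analysis behind the crawl.
For pyers already fluent in crawling, this is a good set of examples that cuts down on reinventing the wheel. The projects are updated and maintained frequently so that they work right after download and reduce the time spent on crawling.
For beginners, the ✍️ hands-on projects show how a crawler is built from scratch. For background on crawler fundamentals, see the [project wiki](https://github.com/DropsDevopsOrg/ECommerceCrawlers/wiki/%E7%88%AC%E8%99%AB%E5%88%B0%E5%BA%95%E8%BF%9D%E6%B3%95%E5%90%97%3F). Crawling can look complex and technically demanding, but with the right approach it is quite achievable to crawl data from mainstream websites within a short time. It is still best to start with a specific goal.
With a goal driving you, your learning becomes more precise and efficient. Every piece of prerequisite knowledge you think you need can be picked up along the way to that goal 😁😁😁.
Corrections to this project's shortcomings are welcome via ⭕️Issues or 🔔PRs.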
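> Large files uploaded earlier run through about 3/4 of the commits, so every clone reached 100 MB, which goes against our original intent. We cannot efficiently delete each file (too lazy), so we will reinitialize the repository's commit history. From now on we will not upload crawled data, and we will optimize the repository structure.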
## About
- Gitee repository: [AJay13/ECommerceCrawlers](https://gitee.com/AJay13/ECommerceCrawlers)
- GitHub repository: [DropsDevopsOrg/ECommerceCrawlers](https://github.com/DropsDevopsOrg/ECommerceCrawlers)
- Project showcase platform: [http://wechat.doonsec.com](http://wechat.doonsec.com)
## Income
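Nearly 80% of the projects are crawlers written for clients. Each was added to the repository only after the client agreed that it could be open-sourced.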
<details>
<summary>Income table</summary>
| Project | Income | Notes |
| :-------------- | ---: | :--------------------------: |
| DianpingCrawler | 200 | |
| TaobaoCrawler | 2000 | |
| SohuNewCrawler | 2500 | |
| WechatCrawler | 6000 | |
| A provincial drug administration | 80 | |
| fofa | 700 | |
| baidu | 1000 | |
| Spider pan-directory | 1000 | |
| More… | … | Some other programs were not approved by clients for open-sourcing |
</details>
## CrawlerDemo
- [x] [DianpingCrawler](https://github.com/DropsDevopsOrg/ECommerceCrawlers/tree/master/DianpingCrawler): Dianping crawling
- [x] [East_money](https://github.com/DropsDevopsOrg/ECommerceCrawlers/tree/master/East_money): crawling Eastmoney with scrapy
- [x] [📛TaobaoCrawler(new)](<https://github.com/DropsDevopsOrg/ECommerceCrawlers/tree/master/TaobaoCrawler(new)>): information crawling for Alibaba's own platforms (Taobao, Tmall, Xianyu, Cainiao Guoguo, Fliggy, and others) with no cookie required; in theory it is not blocked by anti-crawler mechanisms (only Taobao is provided; the approach and the encryption scheme are the same for the others)
- [x] [📛SIPO Patent Examination](https://github.com/DropsDevopsOrg/ECommerceCrawlers/tree/master/SIPO专利审查): automated client for SIPO patent examination
- [x] [📛QiChaCha](https://github.com/DropsDevopsOrg/ECommerceCrawlers/tree/master/QiChaCha): Qichacha data on industrial parks and companies nationwide
- [x] [TaobaoCrawler](https://github.com/DropsDevopsOrg/ECommerceCrawlers/tree/master/TaobaoCrawler): Taobao product crawling
- [x] [📛ZhaopinCrawler](https://github.com/DropsDevopsOrg/ECommerceCrawlers/tree/master/ZhaopinCrawler): crawling the major recruitment sites
- [x] [ShicimingjuCrawleAndDisplay](https://github.com/DropsDevopsOrg/ECommerceCrawlers/tree/master/ShicimingjuCrawleAndDisplay): crawling and display of the Shicimingju poetry website
- [x] [XianyuCrawler](https://github.com/DropsDevopsOrg/ECommerceCrawlers/tree/master/XianyuCrawler): Xianyu product crawling
- [x] [SohuNewCrawler](https://github.com/DropsDevopsOrg/ECommerceCrawlers/tree/master/SohuNewCrawler): news site crawling
- [x] [WechatCrawler](https://github.com/DropsDevopsOrg/ECommerceCrawlers/tree/master/WechatCrawler): WeChat official account crawling
- [x] [cnblog](https://github.com/DropsDevopsOrg/ECommerceCrawlers/tree/master/cnblog): crawling Cnblogs with scrapy
- [x] [WeiboCrawler](https://github.com/DropsDevopsOrg/ECommerceCrawlers/tree/master/WeiboCrawler): Weibo data crawling with no cookie required
- [x] [OtherCrawlers](https://github.com/DropsDevopsOrg/ECommerceCrawlers/tree/master/OthertCrawler): some interesting crawler examples
  - [x] [0x01 Baidu Tieba](https://github.com/DropsDevopsOrg/ECommerceCrawlers/tree/master/OthertCrawler#0x01baidutieba)
  - [x] [0x02 Douban movies](https://github.com/DropsDevopsOrg/ECommerceCrawlers/tree/master/OthertCrawler#0x02doubanmovie)
  - [x] [0x03 Ali tasks](https://github.com/DropsDevopsOrg/ECommerceCrawlers/tree/master/OthertCrawler#0x03alitask)
  - [x] [0x04 Baotu videos](https://github.com/DropsDevopsOrg/ECommerceCrawlers/tree/master/OthertCrawler#0x04baotu)
  - [x] [0x05 Quanjing images](https://github.com/DropsDevopsOrg/ECommerceCrawlers/tree/master/OthertCrawler#0x05quanjing)
  - [x] [0x06 Douban music](https://github.com/DropsDevopsOrg/ECommerceCrawlers/tree/master/OthertCrawler#0x06douban_music)
  - [x] [0x07 A provincial drug administration](https://github.com/DropsDevopsOrg/ECommerceCrawlers/tree/master/OthertCrawler#0x07gdfda_pharmacy)
  - [x] [0x08 fofa](https://github.com/DropsDevopsOrg/ECommerceCrawlers/tree/master/OthertCrawler#0x08fofa)
  - [ ] [0x09 Autohome](https://github.com/DropsDevopsOrg/ECommerceCrawlers/tree/master/OthertCrawler#0x09autohome)
  - [ ] [0x010 National Bureau of Statistics]()
  - [x] [0x10 baidu](https://github.com/DropsDevopsOrg/ECommerceCrawlers/tree/master/OthertCrawler/0x10baidu)
  - [x] [0x11 Spider pan-directory](https://github.com/DropsDevopsOrg/ECommerceCrawlers/tree/master/OthertCrawler/0x11zzc)
  - [x] [0x12 Toutiao](https://github.com/DropsDevopsOrg/ECommerceCrawlers/tree/master/OthertCrawler/0x12toutiao)
  - [x] [0x13 Douban film review analysis](https://github.com/DropsDevopsOrg/ECommerceCrawlers/tree/master/OthertCrawler/0x13douban_yingping)
  - [x] [0x14 Ctrip review crawling](https://github.com/DropsDevopsOrg/ECommerceCrawlers/tree/master/OthertCrawler/0x14ctrip_crawler)
  - [x] [0x15 Xiaomi app store crawling](https://github.com/DropsDevopsOrg/ECommerceCrawlers/tree/master/OthertCrawler/0x15xiaomiappshop)
  - [x] [0x16 Coolapk app information collection](https://github.com/DropsDevopsOrg/ECommerceCrawlers/tree/master/OthertCrawler/0x16kuanappshop)
  - [ ] [0x17 Zhihu information collection](https://github.com/DropsDevopsOrg/ECommerceCrawlers/tree/master/OthertCrawler/0x17zhihu)
  - [x] [0x18 Bing image collection](https://github.com/DropsDevopsOrg/ECommerceCrawlers/tree/master/OthertCrawler/0x18bing_img)
  - [x] [0x19 Anjuke information collection](https://github.com/DropsDevopsOrg/ECommerceCrawlers/tree/master/OthertCrawler/0x19anjuke)
  - [x] [0x20 Tujia homestay information collection](https://github.com/DropsDevopsOrg/ECommerceCrawlers/tree/master/OthertCrawler/0x20tujiaminsu)
## Contribution👏
| <a href="https://gitee.com/joseph31"><img class="avatar" src="https://avatars3.githubusercontent.com/u/47005658?s=460&v=4" width="48" height="48" alt="@joseph31"></a> | <a href="https://github.com/Joynice"><img class="avatar" src="https://avatars0.githubusercontent.com/u/22851022?s=96&v=4" width="48" height="48" alt="@Joynice"></a> | <a href="https://github.com/liangweiyang"><img class="avatar" src="https://avat