### Web Crawler with the requests and beautifulsoup4 (bs4) libraries in Python ###
#### Setting Up for Development: ####
* Python and IDE:
```
- Download Python (https://www.python.org/downloads/)
- Install an IDE or editor that supports Python: VSCode, PyCharm, Sublime Text, ...
```
* IDE extensions:
```
+ For VSCode, install these extensions to support Python development:
. HTML CSS Support
. Python
. Remote Development
+ For other IDEs: search online for the equivalent setup
```
* To run:
```
- OPEN A TERMINAL:
+ cd crawler
+ pip install requests (or: pip3 install requests)
+ pip install beautifulsoup4 (or: pip3 install beautifulsoup4)
- OPEN THE getData.py FILE:
( * If you do not need to save data to a database:
. Comment out the function calls that connect to the database, e.g. insertData..(), ...
. You can then ignore database connection and setup entirely.
)
- Replace the existing "url" variable in the file with the address you want to crawl.
- Reopen the terminal and run: python getData.py
```
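The fetch-and-parse step that the run instructions rely on can be sketched roughly as follows. This is a minimal illustration, not the contents of getData.py: the function names `fetch_html` and `extract_links` and the example URL are assumptions made here, and the real script may collect different elements.

```python
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup


def fetch_html(url, timeout=10):
    """Download a page and return its HTML text; raises on HTTP errors."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()
    return response.text


def extract_links(html, base_url):
    """Return (text, absolute_url) pairs for every <a> tag that has an href."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for anchor in soup.find_all("a", href=True):
        text = anchor.get_text(strip=True)
        # Resolve relative hrefs against the page address.
        links.append((text, urljoin(base_url, anchor["href"])))
    return links


# Example usage (performs a real network request, so it is left commented out):
# url = "https://example.com/"  # placeholder address, replace with your own
# for text, link in extract_links(fetch_html(url), url):
#     print(text, link)
```

Keeping the download and the parsing in separate functions makes it possible to try the parsing on a saved HTML string without touching the network.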
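A simple crawler written in Python 3 using the requests and beautifulsoup4 libraries. It includes an implementation for saving the collected data to a database; uncomment the relevant code if you need it. How to run: after extracting the archive, go to the main directory, make sure requests and beautifulsoup4 are installed (with pip install requests and pip install beautifulsoup4 respectively), open getData.py, paste the address of the site you want to crawl into url, then run python getData.py from the command line.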
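One way to make the optional database step switchable without commenting code in and out is a simple flag. The sketch below is an assumption-heavy illustration: it uses the standard-library `sqlite3` module, a hypothetical `pages` table, and a hypothetical `SAVE_TO_DB` switch, none of which are confirmed to match the project's own connection.py or function.py.

```python
import sqlite3

SAVE_TO_DB = True  # hypothetical switch; set to False to skip the database


def save_rows(conn, rows):
    """Insert (title, url) pairs into a hypothetical `pages` table."""
    conn.execute("CREATE TABLE IF NOT EXISTS pages (title TEXT, url TEXT UNIQUE)")
    # INSERT OR IGNORE skips rows whose url is already stored.
    conn.executemany("INSERT OR IGNORE INTO pages (title, url) VALUES (?, ?)", rows)
    conn.commit()


def handle_results(rows, conn=None):
    """Store rows when saving is enabled and a connection is given; return the row count."""
    if SAVE_TO_DB and conn is not None:
        save_rows(conn, rows)
    return len(rows)
```

Calling `handle_results(rows)` without a connection leaves the crawl usable with no database at all, which mirrors the "no database needed" path described in the run instructions.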
web-crawler-master.zip (69 files)
web-crawler-master
first_project
db.sqlite3 128KB
data_crawler
admin.py 505B
migrations
__init__.py 0B
0001_initial.py 1KB
__pycache__
0001_initial.cpython-38.pyc 880B
__init__.cpython-38.pyc 172B
models.py 999B
urls.py 110B
__pycache__
models.cpython-38.pyc 2KB
admin.cpython-38.pyc 786B
first_project
__init__.py 0B
wsgi.py 403B
urls.py 850B
settings.py 3KB
__pycache__
wsgi.cpython-38.pyc 577B
urls.cpython-38.pyc 1KB
settings.cpython-38.pyc 2KB
__init__.cpython-38.pyc 162B
asgi.py 403B
manage.py 637B
news
admin.py 516B
migrations
__init__.py 0B
0001_initial.py 1023B
__pycache__
0001_initial.cpython-38.pyc 962B
__init__.cpython-38.pyc 164B
models.py 432B
templates
news
article_list.html 440B
user_list.html 153B
article_detail.html 41B
year_archive.html 366B
month_archive.html 40B
base.html 198B
urls.py 343B
__pycache__
models.cpython-38.pyc 959B
urls.cpython-38.pyc 488B
admin.cpython-38.pyc 809B
views.cpython-38.pyc 1KB
views.py 1KB
polls
__init__.py 0B
tests.py 60B
admin.py 413B
migrations
__init__.py 0B
0001_initial.py 1KB
__pycache__
0001_initial.cpython-38.pyc 1018B
__init__.cpython-38.pyc 165B
apps.py 85B
models.py 760B
templates
polls
detail.html 547B
index.html 392B
results.html 323B
urls.py 509B
__pycache__
models.cpython-38.pyc 1KB
urls.cpython-38.pyc 553B
admin.cpython-38.pyc 571B
apps.cpython-38.pyc 372B
__init__.cpython-38.pyc 154B
views.cpython-38.pyc 2KB
static
polls
style.css 167B
views.py 2KB
crawler
getData.py 2KB
connection.py 581B
function.py 5KB
__pycache__
function.cpython-38.pyc 3KB
connection.cpython-38.pyc 678B
bash.exe.stackdump 1KB
README.md 1008B
SonNhaDep
getData.py 719B
function.py 475B
__pycache__
function.cpython-38.pyc 660B