Python: my web scraping notes
# * Part 1: urllib
# from urllib import request, parse
# 1 - urlretrieve: download a URL straight to a local file
# request.urlretrieve('http://www.baidu.com', 'aaa.html')
#
# 2 - urlopen: fetch a URL; getcode() returns the HTTP status code
# resp = request.urlopen('http://www.baidu.com')
# print(resp.getcode())
#
# 3 - urlencode: dict -> URL-encoded query string; parse_qs: query string -> dict
# a = parse.urlencode({'我是': 1, '你是': 2, '它是': 3})
# print(a)
# print(parse.parse_qs(a))  # values come back as lists of strings
#
# 4 - urlparse / urlsplit: split a URL into its named components
# url = 'https://www.baidu.com/s?ie=utf-8&wd=python&tn=78040160_5_pg&ch=3#1'
# result = parse.urlparse(url)  # splits the URL into scheme, netloc, path, params, query, fragment
# Output:
# ParseResult(scheme='https', netloc='www.baidu.com', path='/s', params='', query='ie=utf-8&wd=python&tn=78040160_5_pg&ch=3', fragment='1')
# scheme: https
# netloc: www.baidu.com
# path: /s
# params:
# query: ie=utf-8&wd=python&tn=78040160_5_pg&ch=3
# fragment: 1
# result = parse.urlsplit(url)  # like urlparse, but the result has no params component
# Output:
# SplitResult(scheme='https', netloc='www.baidu.com', path='/s', query='ie=utf-8&wd=python&tn=78040160_5_pg&ch=3', fragment='1')
# scheme: https
# netloc: www.baidu.com
# path: /s
# query: ie=utf-8&wd=python&tn=78040160_5_pg&ch=3
# fragment: 1
# print(result)
# print('scheme:',result.scheme)
# print('netloc:',result.netloc)
# print('path:',result.path)
# # print('params:', result.params)  # only ParseResult has .params; SplitResult raises AttributeError
# print('query:',result.query)
# print('fragment:',result.fragment)
#
# 5 - POST request with Request(): form data must be urlencoded and encoded to bytes
# url = 'https://www.lagou.com/jobs/positionAjax.json?city=%E4%B8%8A%E6%B5%B7&needAddtionalResult=false'
# headers = {
# 'user-agent':'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.25 Safari/537.36 Core/1.70.3741.400 QQBrowser/10.5.3868.400'
# }
# data = {
# 'first': 'True',
# 'pn': 1,
# 'kd': 'python',
# # note: referer / pragma / origin are normally sent as request headers, not as form fields
# 'referer': 'https://www.lagou.com/jobs/list_java?labelWords=&fromSearch=true&suginput=',
# 'pragma': 'no-cache',
# 'origin': 'https://www.lagou.com'
# }
# req = request.Request(url,headers=headers,data=parse.urlencode(data).encode('utf-8'),method='POST')
# resp = request.urlopen(req)
# print(resp.read().decode('utf-8'))
#
# 6 - ProxyHandler: send requests through an HTTP proxy via a custom opener
# from urllib import request
# url = 'http://httpbin.org/ip'
# handler = request.ProxyHandler({"http": "220.168.52.245:40406"})  # sample proxy; may no longer be reachable
# opener = request.build_opener(handler)
# resp = opener.open(url)
# print(resp.read())
#
# 7 - Fetch a login-protected page by sending the Cookie header manually
# from urllib import request
# dapeng_url = "http://www.renren.com/880151247/profile"
# headers = {
# 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.25 Safari/537.36 Core/1.70.3741.400 QQBrowser/10.5.3868.400',
# "Cookie": "anonymid=k9xjdjfc-hodq0q; depovince=GW; _r01_=1; JSESSIONID=abc1uPtF8-or69JJe9Xhx; ick_login=298bf271-2e27-4267-b945-d2790750f2f4; taihe_bi_sdk_uid=06d641ecbe8305b3032f9fde9bfaaba6;
taihe_bi_sdk_session=25f0035c05eb8993a05734a3068dd860; t=36979ef16e99fcefa14351e8e304f0af1; societyguester=36979ef16e99fcefa14351e8e304f0af1; id=974393911; xnsid=e0b4519; jebecookies=78966f67-81b5-4159-ac53-
9c5694aaf215|||||; ver=7.0; loginfrom=null; jebe_key=0bc3b536-3d43-416e-a443-3550bea33d34%7Ccd3f341f2a9a65627d4ee21bd7991b3e%7C1588902281361%7C1%7C1588902281143; jebe_key=0bc3b536-3d43-416e-a443-
3550bea33d34%7Ccd3f341f2a9a65627d4ee21bd7991b3e%7C1588902281361%7C1%7C1588902281147; wp_fold=0"
# }
# req = request.Request(url=dapeng_url,headers=headers)
# resp = request.urlopen(req)
# a = resp.read().decode('utf-8')
# with open('wopa.html', mode='w', encoding='utf-8') as f:
#     f.write(a)
#
# 8 - CookieJar + HTTPCookieProcessor: log in through the opener, then reuse the session cookies
# from urllib import request,parse
# from http.cookiejar import CookieJar
#
# cookiejar = CookieJar()
# handler = request.HTTPCookieProcessor(cookiejar)
# opener = request.build_opener(handler)
#
# headers = {
# 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.25 Safari/537.36 Core/1.70.3741.400 QQBrowser/10.5.3868.400',
# "Cookie": "anonymid=k9xjdjfc-hodq0q; depovince=GW; _r01_=1; JSESSIONID=abc1uPtF8-or69JJe9Xhx; ick_login=298bf271-2e27-4267-b945-d2790750f2f4; taihe_bi_sdk_uid=06d641ecbe8305b3032f9fde9bfaaba6;
taihe_bi_sdk_session=25f0035c05eb8993a05734a3068dd860; t=36979ef16e99fcefa14351e8e304f0af1; societyguester=36979ef16e99fcefa14351e8e304f0af1; id=974393911; xnsid=e0b4519; jebecookies=78966f67-81b5-4159-ac53-
9c5694aaf215|||||; ver=7.0; loginfrom=null; jebe_key=0bc3b536-3d43-416e-a443-3550bea33d34%7Ccd3f341f2a9a65627d4ee21bd7991b3e%7C1588902281361%7C1%7C1588902281143; jebe_key=0bc3b536-3d43-416e-a443-
3550bea33d34%7Ccd3f341f2a9a65627d4ee21bd7991b3e%7C1588902281361%7C1%7C1588902281147; wp_fold=0"
#
# }
#
# data = {
# 'email':"18337802329",
# 'password':"wang1234567890."
# }
#
# login_url = "http://www.renren.com/PLogin.do"
# req = request.Request(login_url, data=parse.urlencode(data).encode('utf-8'), headers=headers)
# opener.open(req)  # open through the opener (not request.urlopen) so the login cookies land in the cookiejar
#
# dapeng_url = "http://www.renren.com/880151247/profile"
# resp = opener.open(dapeng_url)
# with open("wopa.html",mode='w',encoding=('utf-8')) as f:
# f.write(resp.read().decode('utf-8'))
#
# 9
# Saving cookies to disk and loading them back (see the sketch below)
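# The note above carries no code, so here is a minimal sketch of the usual pattern with
# http.cookiejar.MozillaCookieJar. The target URL and the 'cookies.txt' filename are
# placeholders added for illustration, not part of the original notes.
# from urllib import request
# from http.cookiejar import MozillaCookieJar
#
# # saving: attach the jar to an opener, make a request, then write the jar to disk
# cookiejar = MozillaCookieJar('cookies.txt')
# opener = request.build_opener(request.HTTPCookieProcessor(cookiejar))
# opener.open('http://httpbin.org/cookies/set?freeform=test')  # any response that sets a cookie
# cookiejar.save(ignore_discard=True, ignore_expires=True)      # also keep session cookies
#
# # loading: in a later run, read the file back before making requests
# cookiejar = MozillaCookieJar('cookies.txt')
# cookiejar.load(ignore_discard=True, ignore_expires=True)
# opener = request.build_opener(request.HTTPCookieProcessor(cookiejar))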
#
# * Part 2: requests
# 1 - requests.get: .text is the decoded str body, .content is the raw bytes
# import requests
# response = requests.get("https://www.baidu.com/")
# print(response.text)                      # str, decoded with the encoding requests guessed
# print(type(response.text))
# print(response.content.decode('utf-8'))   # raw bytes, decoded explicitly as utf-8
# print(type(response.content))