Learning Web Scraping: Some Hands-On Practice
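Contents
1. The urllib.request module
   urllib.request.urlopen
   Parameters
     1. url
        The Request class
        headers
        Request.add_header
     2. data
   Some functions
     geturl(), info(), getcode()
     response.read()
2. The urllib.parse module
   urllib.parse.urlencode
3. The json module
   json.loads()
4. Hands-on practice
   1. Scraping images
   2. Youdao Dictionary
5. Closing remarks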
Preface: thanks to 老污龟.
[Repost] The urllib.request module in Python 3 (Chinese).
1. The urllib.request module
urllib.request.urlopen
The urllib.request module defines functions and classes which help in opening URLs (mostly HTTP) in a complex world — basic
and digest authentication, redirections, cookies and more.
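Put simply, this module supplies the functions and classes needed to open URLs (mostly HTTP) and to cope with basic and digest authentication, redirects, cookies, and similar details.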
urllib.request.urlopen(url, data=None, [timeout, ]*, cafile=None, capath=None, cadefault=False, context=None)
Parameters
1. url
Open the URL url, which can be either a string or a Request object.
In other words, url may be a plain string or a Request object.
The Request class
class urllib.request.Request(url, data=None, headers={}, origin_req_host=None, unverifiable=False, method=None)
This class is an abstraction of a URL request.
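That is, a Request object represents a single URL request in the abstract.
When calling urllib.request.urlopen(), a Request object can be passed directly as the url argument.
For example: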
import urllib.request

url = 'http://placekitten.com/g/500/500'
req = urllib.request.Request(url)          # wrap the URL in a Request object
response = urllib.request.urlopen(req)     # urlopen() accepts the Request directly
Also note the second parameter, data. As with urlopen(), the form data can be supplied here directly, so it does not have to be passed to urlopen() again.
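One visible effect of supplying data to Request is that the request becomes a POST instead of a GET. The following is a minimal offline sketch (nothing is actually sent). The form field name 'q' and its value are made up for illustration and do not come from the original post.

```python
import urllib.parse
import urllib.request

url = 'http://placekitten.com/g/500/500'

# Without data, the Request defaults to the GET method.
get_req = urllib.request.Request(url)

# Hypothetical form field, URL-encoded and converted to bytes,
# which is the type Request expects for its data argument.
form = urllib.parse.urlencode({'q': 'kitten'}).encode('utf-8')
post_req = urllib.request.Request(url, data=form)

print(get_req.get_method())   # GET
print(post_req.get_method())  # POST
print(post_req.data)          # b'q=kitten'
```

Passing post_req to urllib.request.urlopen() would then send the form without any extra data argument.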
headers
headers should be a dictionary, and will be treated as if add_header() was called with each key and value as arguments. This is
often used to “spoof” the User-Agent header value, which is used by a browser to identify itself – some HTTP servers only allow
requests coming from common browsers as opposed to scripts.
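In short, headers is a dictionary, and each key/value pair is applied as if add_header() had been called with it. The usual reason to set it is to replace the default User-Agent, since some servers reject requests that do not appear to come from a browser.
Sometimes we want to hide the fact that a request comes from a script. That means changing the User-Agent, which we do through the headers parameter.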
e.g.1
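import urllib.parse
import urllib.request

# url (the target address) and content (the text to submit) are assumed to be defined earlier
head = {}
head['User-Agent'] = ('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
                      '(KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36')
data = {}
data['i'] = content
data['doctype'] = 'json'
# urlencode() returns a str; the request body must be bytes
data = urllib.parse.urlencode(data).encode('utf-8')
req = urllib.request.Request(url, data, head)
response = urllib.request.urlopen(req)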
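The encoding step in e.g.1 can be looked at on its own. The sketch below shows what urlencode() and encode() produce for the form dictionary. The value 'hello world' is a made-up stand-in for content, which the example above does not define.

```python
import urllib.parse

content = 'hello world'  # hypothetical input text, for illustration only

form = {'i': content, 'doctype': 'json'}

# urlencode() joins the pairs as key=value&key=value and escapes
# unsafe characters (a space becomes '+').
encoded = urllib.parse.urlencode(form)

# Request and urlopen() need bytes for the body, not str.
body = encoded.encode('utf-8')

print(encoded)  # i=hello+world&doctype=json
print(body)     # b'i=hello+world&doctype=json'
```

Skipping encode() and passing the str makes urlopen() raise a TypeError, which is why the extra step is needed.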
Besides passing headers to the constructor, we can also call the add_header() method on the request.
Request.add_header
Request.add_header(key, val)
Add another header to the request.
e.g.2
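A minimal sketch of add_header() in use, assuming the same placekitten URL as in the earlier example and a shortened placeholder User-Agent string. It also checks that this gives the same headers as passing a dictionary to the constructor. No request is sent.

```python
import urllib.request

url = 'http://placekitten.com/g/500/500'
ua = 'Mozilla/5.0'  # shortened placeholder User-Agent, for illustration only

# Option 1: pass the headers as a dictionary to the constructor.
req_a = urllib.request.Request(url, headers={'User-Agent': ua})

# Option 2: create the Request first, then add the header.
req_b = urllib.request.Request(url)
req_b.add_header('User-Agent', ua)

# Request stores header names via str.capitalize(), so the key
# becomes 'User-agent' when reading it back.
print(req_b.get_header('User-agent'))
print(req_a.header_items() == req_b.header_items())
```

Either request can then be handed to urllib.request.urlopen().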