Multithreaded Crawler Error: AttributeError: 'NoneType' object has no attribute 'xpath' (CSDN Library)


Uploaded 2020-12-21 19:05:10 · 116 KB PDF



Contents: 1. Preface · 2. Problem · 3. Thinking It Through and Solving the Problem · 4. Results

1. Preface

Just marking the occasion: this is my first CSDN blog post as a technical newbie!

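Lately I have been tinkering with web crawlers, working through 《从零开始学Python网络爬虫》 (Learn Python Web Crawling from Scratch, China Machine Press). This book is hard to sum up. On the plus side, it has plenty of worked examples and explains them reasonably clearly. But the complaints outweigh the praise:

1. It has quite a few careless typos.
2. The basics are skimmed over and kept as simple as can be, which makes it unfriendly to readers whose Python foundation is weak.
3. In the code walkthroughs, the same code is explained again and again, while some new code that genuinely needs explaining gets no comment at all.
4. Most importantly, though this is not entirely the book's fault: many of the web pages used in the examples have changed considerably in layout and style between the book's publication and now, so much of the book's code no longer works.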

2. Problem

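Over the past couple of days I reached the book's section on multithreaded crawling. I practiced on pages from the Jianshu (简书) site and compared the speed of a serial crawler against a multithreaded one. The serial crawler runs fine, but the multithreaded crawler fails with: AttributeError: 'NoneType' object has no attribute 'xpath'. The code is as follows: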

```python
import re
import time
from multiprocessing import Pool

import pymongo
import requests
from lxml import etree

client = pymongo.MongoClient('localhost', 27017)
mydb = client['mydb']
jianshu_reping = mydb['jianshu_reping']


def get_reping_infoes(url):
    headers = {
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 '
                      '(KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36'
    }
    res = requests.get(url, headers=headers)
    selector = etree.HTML(res.text)
    # print("selector type:", type(selector), "response status code:", res.status_code)
    # The AttributeError means selector is None at this point in the
    # multiprocess run, so the first .xpath() call fails.
    titles = selector.xpath('//a[@class="title"]/text()')
    authors = selector.xpath('//a[@class="nickname"]/text()')
    abstracts = selector.xpath('//p[@class="abstract"]/text()')
    # NOTE: these two patterns appear to have lost their HTML tags in extraction;
    # as written, the trailing lazy (.*?) always captures an empty string.
    comments = re.findall('iconfont ic-list-comments".*?(.*?)', res.text, re.S)
    rewards = re.findall('iconfont ic-list-like".*?(.*?)', res.text, re.S)
    for title, author, abstract, comment, reward in zip(titles, authors, abstracts, comments, rewards):
        info = {
            "title": title,
            "author": author,
            "abstract": abstract.strip(),
            "comment": comment.strip(),
            "reward": reward.strip(),
        }
        jianshu_reping.insert_one(info)
    # time.sleep(1)


if __name__ == '__main__':
    urls = ['https://www.jianshu.com/c/e048f1a72e3d?order_by=added_at&page={}'.format(i)
            for i in range(1, 10)]

    # Serial crawl
    start_time1 = time.time()
    for url in urls:
        get_reping_infoes(url)
    end_time1 = time.time()
    print("Serial crawler time:", end_time1 - start_time1)

    # Pooled crawl with 2 workers
    start_time2 = time.time()
    pool = Pool(processes=2)
    pool.map(get_reping_infoes, urls)
    pool.close()
    pool.join()
    end_time2 = time.time()
    print("2-thread crawler time:", end_time2 - start_time2)
```
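A side note on terminology: `multiprocessing.Pool(processes=2)` starts two worker *processes*, not threads, even though the timing printout labels it as a two-thread crawler. If a genuinely thread-based comparison is wanted, the standard library's `concurrent.futures.ThreadPoolExecutor` is one option. The minimal sketch below swaps in that technique; `fetch_page`, `crawl_serial` and `crawl_threaded` are hypothetical names, and `fetch_page` merely sleeps to simulate network latency instead of calling `requests.get`, so the comparison can run offline.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_page(url):
    """Hypothetical stand-in for get_reping_infoes: simulates a slow request."""
    time.sleep(0.2)  # pretend the HTTP round trip takes 200 ms
    return len(url)


def crawl_serial(urls):
    # One request after another, like the serial loop in the post
    return [fetch_page(url) for url in urls]


def crawl_threaded(urls, workers=2):
    # executor.map keeps results in input order, like Pool.map
    with ThreadPoolExecutor(max_workers=workers) as executor:
        return list(executor.map(fetch_page, urls))


if __name__ == '__main__':
    urls = ['https://www.jianshu.com/c/e048f1a72e3d?order_by=added_at&page={}'.format(i)
            for i in range(1, 10)]

    start = time.time()
    serial = crawl_serial(urls)
    print("Serial:", round(time.time() - start, 2), "s")

    start = time.time()
    threaded = crawl_threaded(urls, workers=2)
    print("2 threads:", round(time.time() - start, 2), "s")

    assert serial == threaded
```

Threads suit this I/O-bound workload because CPython releases the GIL while a thread is blocked waiting (here simulated by `time.sleep`). A process pool also avoids the GIL, but it has to pickle the target function and its arguments to ship them to the worker processes.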

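My first instinct was to search Baidu, but after paging through several pages of results I found hardly any cases like mine, and the fixes offered in the one or two similar cases I did find did not work.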
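The error message itself pins down where things go wrong: `selector = etree.HTML(res.text)` returned `None` instead of an element in the pooled run, so the next line's `selector.xpath(...)` raises. Why the response could not be parsed (for example, possibly an empty or blocked response when requests arrive concurrently) is not settled by this excerpt, and the author's actual fix is in the part of the post not included here. As a general defensive pattern, and explicitly an assumption rather than the author's method, one can check the status code and the parse result before touching `.xpath()`, and retry a few times with a pause. In this sketch, `fetch_with_retry` and `get_titles` are hypothetical helpers; `get_text` and `parse` are injected so the logic can be exercised without network access or lxml.

```python
import time


def fetch_with_retry(url, get_text, parse, retries=3, delay=1.0):
    """Return a parsed document for url, or None if every attempt fails.

    get_text(url) -> (status_code, body_text): hypothetical wrapper around requests.get
    parse(body_text) -> document or None: e.g. lxml.etree.HTML
    """
    for attempt in range(1, retries + 1):
        status, text = get_text(url)
        # Only hand non-empty 200 responses to the parser.
        selector = parse(text) if status == 200 and text.strip() else None
        if selector is not None:
            return selector
        print("attempt", attempt, "failed for", url, "status:", status)
        if attempt < retries:
            time.sleep(delay)  # pause before retrying
    return None


def get_titles(url, get_text, parse, delay=1.0):
    """Hypothetical helper: extract titles, skipping pages that never parse."""
    selector = fetch_with_retry(url, get_text, parse, delay=delay)
    if selector is None:
        return []  # skip this page instead of raising AttributeError
    return selector.xpath('//a[@class="title"]/text()')
```

In the original function, `get_text` would wrap `requests.get(url, headers=headers)` and return `(res.status_code, res.text)`, and `parse` would be `etree.HTML`. A page that never yields a parseable document is then logged and skipped rather than crashing a pool worker.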
