Python Page Scraping and Python Crawler Reference Materials - CSDN Library

5 stars · rated higher than over 95% of resources · Points required: 49 · 26 views · Uploaded 2010-11-29 16:44:20 · 1 comment · 643KB DOC


中如何提取网页正文啊谢谢









Extracting the text from a web page

# Python 2 only: the sgmllib module was removed in Python 3.
import urllib
from sgmllib import SGMLParser


class Html2txt(SGMLParser):
    """Collect the character data of an HTML page, skipping the <head> section."""

    def reset(self):
        self.text = ''
        self.inbody = True   # False while inside <head>...</head>
        SGMLParser.reset(self)

    def handle_data(self, text):
        # Called for every run of text between tags
        if self.inbody:
            self.text += text

    def start_head(self, attrs):
        # SGMLParser calls start_<tag>(attrs) when it meets <head>
        self.inbody = False

    def end_head(self):
        self.inbody = True


if __name__ == "__main__":
    parser = Html2txt()
    parser.feed(urllib.urlopen("http://icode.csdn.net").read())
    parser.close()
    print parser.text.strip()
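Because sgmllib no longer exists in Python 3, here is a minimal sketch of the same idea built on the standard-library html.parser.HTMLParser. Unlike the original, it also skips <script> and <style> content and joins text runs with single spaces; both choices are assumptions of this sketch (joining with spaces can split words that span inline tags such as <b>). The names Html2Txt, html_to_text and fetch_text are hypothetical.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class Html2Txt(HTMLParser):
    """Collect the text of an HTML page, skipping <head>, <script> and <style>."""

    # Assumption of this sketch: content of these elements is not page text
    SKIP = frozenset({"head", "script", "style"})

    def __init__(self):
        super().__init__()
        self.parts = []
        self.skip_depth = 0  # > 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.parts.append(data)

    @property
    def text(self):
        # Join the runs and collapse every whitespace sequence to one space
        return " ".join(" ".join(self.parts).split())


def html_to_text(html):
    parser = Html2Txt()
    parser.feed(html)
    parser.close()  # flush any text still buffered at the end of the input
    return parser.text


def fetch_text(url):
    # Fall back to UTF-8 when the server does not declare a charset (assumption)
    with urlopen(url) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        html = resp.read().decode(charset, errors="replace")
    return html_to_text(html)
```

For a live page, fetch_text("http://icode.csdn.net") would download and convert it in one call; html_to_text can be used on its own when the HTML is already in memory.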

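Downloading a web page with Python

# Python 2 only: httplib was renamed http.client in Python 3.
import httplib

conn = httplib.HTTPConnection("www.baidu.com")
conn.request("GET", "/index.html")   # send the request line and headers
r1 = conn.getresponse()              # read the status line and headers
print r1.status, r1.reason           # e.g. 200 OK
data = r1.read()                     # response body as a byte string
conn.close()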
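In Python 3, httplib became http.client. A minimal sketch of the same download, wrapped in a hypothetical fetch() helper that returns the status, reason and body and always closes the connection (the 10-second default timeout is an assumption):

```python
import http.client


def fetch(host, path="/", port=80, timeout=10):
    """Send a GET request and return (status, reason, body_bytes)."""
    conn = http.client.HTTPConnection(host, port=port, timeout=timeout)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        body = resp.read()  # read the whole body before the connection is closed
        return resp.status, resp.reason, body
    finally:
        conn.close()
```

Usage mirroring the original: status, reason, body = fetch("www.baidu.com", "/index.html"). For HTTPS sites, http.client.HTTPSConnection takes the same arguments.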

%(



#

)*

#





!"+

"'





%



!"!""$",-.$

!"!!"



!"



/"'"(#



'0'0



通过  的方法

 !"#

!"



#



/1'

"1 2"3/"

"""

"%'((4



"2567()

"2"3("'

"'


zhjijia

2014-01-11

Very professional; I'm studying it now.


whowhenwhere

Followers: 3
Resources: 19

Python Page Scraping and Python Crawler Reference Materials

15 comments

Latest resources


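Python crawler materials (complete)

Example: a simple Python crawler that scrapes web page content

Python web crawler

The best material for learning Python, the best material for learning Python

"某马" Python learning materials

[Python crawler] Scraping web videos: parse the m3u8 file, fetch the .ts segments, and merge them into an mp4

Crawler page scraping

Python crawler example: basic web page content scraping

Python crawler: scraping images from a page

Some lessons on Python crawler scraping techniques

Python code for computing the factorial of n

Python crawler, Mac version (original work by 猪精)

Python 3.10 virtual environment with pyautogui and opencv-python

Code snippet: using tkinter to build multiple pages that switch between each other

Page scraping: two web crawler examples

Python crawlers: scraping images from a page

A crawler that scrapes different pages

Python crawlers: methods for scraping web page images

lesson7-爬虫入门.rar (Python crawler, thisn6q, introduction to crawlers)

Python: a small maze in pure Python code (Python source)

Python Gomoku (tkinter module).py

Python video tutorials and more (80+, reduced download points)

Official Python 3.4.10 tar.xz archive

Python tutorial - advanced programming 3.pdf

传智博客 Python job-training course

Network Analysis with Python

Python linked list: example programs for definition, usage, and more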


Python crawler that scrapes page images