Crawling torrent links with Python 3

6 files in total

.py: 5

.html: 1

python3

torrent

web crawler

Uploaded 2016-09-25 12:24:18 · 6KB ZIP

Note

A hands-on Python project that puts BeautifulSoup and urllib to practical use.

python3+torrent.zip (6 files)

spider_main.py 797B

url_manager.py 634B

output.html 10KB

html_outputer.py 845B

html_downloader.py 465B

html_parser.py 940B
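The file names point to a conventional crawler layout: a main loop, a URL manager, a downloader, a parser, and an HTML output writer. The contents of url_manager.py are not shown on this page, so the following is only a hypothetical sketch of what a URL manager in such a layout might look like. The class name `UrlManager` and its method names are assumptions, not taken from the archive.

```python
class UrlManager:
    """Tracks which URLs are waiting to be crawled and which are done.

    Hypothetical sketch: names and behavior are assumptions, not the
    archive's actual url_manager.py.
    """

    def __init__(self):
        self.new_urls = set()  # URLs queued for crawling
        self.old_urls = set()  # URLs already handed out

    def add_new_url(self, url):
        # Ignore empty values and anything seen before
        if not url:
            return
        if url not in self.new_urls and url not in self.old_urls:
            self.new_urls.add(url)

    def has_new_url(self):
        return len(self.new_urls) > 0

    def get_new_url(self):
        # Move one URL from the pending set to the visited set
        url = self.new_urls.pop()
        self.old_urls.add(url)
        return url
```

Keeping two sets makes the duplicate check cheap, and a URL that has already been handed out is never queued again.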

from bs4 import BeautifulSoup
import re
import urllib.parse


class HtmlParser(object):

    # Parse a torrent listing page
    def parserTwo(self, html):
        if html is None:
            return
        soup = BeautifulSoup(html, 'html.parser', from_encoding='utf-8')
        res_datas = self._get_data(soup)
        return res_datas

    # Bundle each torrent's title, magnet link, and Thunder (Xunlei) link
    def _get_data(self, soup):
        res_datas = []
        all_data = soup.find_all('a', href=re.compile(r"/detail"))
        all_data2 = soup.find_all('a', href=re.compile(r"magnet"))
        all_data3 = soup.find_all('a', href=re.compile(r"thunder"))
        # zip() stops at the shortest list, so a missing link cannot raise IndexError
        for title, magnet, thunder in zip(all_data, all_data2, all_data3):
            res_data = {}
            res_data['title'] = title.get_text()
            res_data['cl'] = magnet.get('href')
            res_data['xl'] = thunder.get('href')
            res_datas.append(res_data)
        return res_datas
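To try the extraction idea without installing anything, here is a minimal sketch that swaps BeautifulSoup for the standard library's `html.parser`. It is not the archive's code. The names `LinkCollector` and `extract_links`, and the sample markup, are made up for illustration, and the real target site's HTML may differ.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects detail-link titles plus magnet and thunder hrefs, in page order."""

    def __init__(self):
        super().__init__()
        self.titles = []    # text of <a href=".../detail...">
        self.magnets = []   # href values containing "magnet"
        self.thunders = []  # href values containing "thunder"
        self._in_detail = False
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        if "/detail" in href:
            # Start buffering the link text until the closing </a>
            self._in_detail = True
            self._text = []
        elif "magnet" in href:
            self.magnets.append(href)
        elif "thunder" in href:
            self.thunders.append(href)

    def handle_data(self, data):
        if self._in_detail:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_detail:
            self.titles.append("".join(self._text).strip())
            self._in_detail = False


def extract_links(html):
    collector = LinkCollector()
    collector.feed(html)
    # zip() stops at the shortest list, so a missing link cannot raise IndexError
    return [
        {"title": t, "cl": m, "xl": x}
        for t, m, x in zip(collector.titles, collector.magnets, collector.thunders)
    ]


# Made-up sample markup, for illustration only
SAMPLE = """
<a href="/detail/1.html">Example file</a>
<a href="magnet:?xt=urn:btih:abc">magnet</a>
<a href="thunder://QUFodHRw">thunder</a>
"""

links = extract_links(SAMPLE)
```

Pairing by position assumes every entry on the page carries all three links. If one entry lacks a magnet or Thunder link, later entries can be paired with the wrong row, so grouping links by their enclosing row element would be more robust.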
