Big Data Project Development Practical Training

Training Requirements

Write a crawler in Python that scrapes data from a recruitment website and stores it in a MongoDB database. Clean the stored data and analyze it. Use Flume to collect the logs into HDFS, analyze them with Hive, use Sqoop to export the Hive results to a MySQL database, and display the analysis results. Finally, visualize the results.

Building the Crawler

The site chosen for this project is 51job (前程无忧), and the framework is Scrapy. Here is the code.

Wuyou.py

1. Fields scraped: job title, salary level, hiring company, work location, work experience, education requirement, job description (responsibilities), and job requirements (skills).

# -*- coding: utf-8 -*-
import scrapy
from wuyou.items import WuyouItem
import re
import urllib.parse


class WuyouSpider(scrapy.Spider):
    name = 'Wuyou'
    allowed_domains = ['51job.com']
    # Nationwide: 000000
    # Keyword: web
    start_urls = [
        'https://search.51job.com/list/000000,000000,0000,00,9,99,web,2,1.html?lang=c&stype=&postchannel=0000&workyear=99&cotype=99&degreefrom=99&jobterm=99&companysize=99&providesalary=99&lonlat=0%2C0&radius=-1&ord_field=0&confirmdate=9&fromType=&dibiaoid=0&address=&line=&specialarea=00&from=&welfare=']
    # Keyword: python
    # start_urls = [
    #     'https://search.51job.com/list/000000,000000,0000,00,9,99,python,2,1.html?lang=c&stype=&postchannel=0000&workyear=99&cotype=99&degreefrom=99&jobterm=99&companysize=99&providesalary=99&lonlat=0%2C0&radius=-1&ord_field=0&confirmdate=9&fromType=&dibiaoid=0&address=&line=&specialarea=00&from=&welfare=']
    # Keyword: data collection (数据采集)
    # start_urls = [
    #     'https://search.51job.com/list/000000,000000,0000,00,9,99,%25E6%2595%25B0%25E6%258D%25AE%25E9%2587%2587%25E9%259B%2586,2,1.html?lang=c&stype=&postchannel=0000&workyear=99&cotype=99&degreefrom=99&jobterm=99&companysize=99&providesalary=99&lonlat=0%2C0&radius=-1&ord_field=0&confirmdate=9&fromType=&dibiaoid=0&address=&line=&specialarea=00&from=&welfare=']
    # Keyword: data analysis (数据分析)
    # start_urls = [
    #     'https://search.51job.com/list/000000,000000,0000,00,9,99,%25E6%2595%25B0%25E6%258D%25AE%25E5%2588%2586%25E6%259E%2590,2,1.html?lang=c&postchannel=0000&workyear=99&cotype=99&degreefrom=99&jobterm=99&companysize=99&ord_field=0&dibiaoid=0&line=&welfare=']
    # Keyword: big data development engineer (大数据开发工程师)
    # start_urls = ['https://search.51job.com/list/000000,000000,0000,00,9,99,%25E5%25A4%25A7%25E6%2595%25B0%
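The Chinese keywords in the commented-out start_urls above appear to be percent-encoded twice (for example, 数据采集 becomes %25E6%2595%25B0...), which may be why the spider imports urllib.parse. The sketch below shows one way such a search URL could be built. The helper name build_search_url and the SEARCH_TEMPLATE constant are hypothetical and not part of the original spider; the sketch also assumes the last number in the path is the page index and leaves out the query string.

```python
from urllib.parse import quote

# Hypothetical template: the path layout is copied from the start_urls above,
# with the query string omitted for brevity.
SEARCH_TEMPLATE = (
    "https://search.51job.com/list/"
    "000000,000000,0000,00,9,99,{keyword},2,{page}.html"
)


def build_search_url(keyword: str, page: int = 1) -> str:
    """Return a search URL with the keyword percent-encoded twice.

    Assumption: the final number in the path is the result page index.
    """
    # First pass turns non-ASCII characters into %XX escapes;
    # second pass escapes each '%' as '%25', matching the URLs above.
    encoded = quote(quote(keyword, safe=""), safe="")
    return SEARCH_TEMPLATE.format(keyword=encoded, page=page)


if __name__ == "__main__":
    print(build_search_url("web"))
    print(build_search_url("数据采集", page=2))
```

ASCII keywords such as web or python pass through both encoding passes unchanged, so the same helper would cover every keyword listed in the spider.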