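AI project practice, search engine: a small search engine built with the Vue front-end framework, the Scrapy crawler framework, and jieba word segmentation (CSDN Library resource)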

46 files in total

js: 26

png: 5

map: 4


Artificial intelligence

Search engine

vue.js

scrapy

Web crawler

105 views, uploaded 2024-02-26 17:03:34, 672KB ZIP


tinySearchEngine-master.zip (46 files)

tinySearchEngine-master/
    DomzSpider.py  2KB
    image/
        img3.png  142KB
        img4.png  130KB
        img2.png  13KB
        img1.png  5KB
    src/
        App.vue  440B
        assets/
            logo.png  7KB
        main.js  498B
        components/
            searchResult.vue  5KB
            searchEngine.vue  1KB
        router/
            index.js  406B
    dist/
        index.html  450B
        static/
            vendor.080f64fd4ca0f1bcf027.js.map  1.07MB
            vendor.080f64fd4ca0f1bcf027.js  142KB
            manifest.42e3257fe0dc6f03a838.js  1KB
            app.2e2f3fd3b6a1a7409115.js  5KB
            manifest.42e3257fe0dc6f03a838.js.map  14KB
            app.2e2f3fd3b6a1a7409115.js.map  39KB
            css/
                app.bc6bcebd8dff1469ed2a180d243d1fb1.css  2KB
                app.bc6bcebd8dff1469ed2a180d243d1fb1.css.map  4KB
    package.json  3KB
    back_end.php  3KB
    build/
        check-versions.js  1KB
        dev-server.js  2KB
        utils.js  2KB
        vue-loader.conf.js  307B
        webpack.prod.conf.js  4KB
        build.js  953B
        dev-client.js  245B
        webpack.base.conf.js  1KB
        webpack.dev.conf.js  1KB
        webpack.test.conf.js  584B
    index.html  201B
    test/
        e2e/
            specs/
                test.js  561B
            nightwatch.conf.js  1KB
            runner.js  1KB
            custom-assertions/
                elementCount.js  777B
        unit/
            specs/
                Hello.spec.js  335B
            karma.conf.js  992B
            index.js  487B
    jsonToMySQL.py  920B
    urlToKeywords.py  1KB
    config/
        test.env.js  132B
        prod.env.js  48B
        index.js  1KB
        dev.env.js  139B

# -*- coding: utf-8 -*-
# Adapted for Python 3 and current Scrapy releases: the scrapy.contrib package
# and SgmlLinkExtractor no longer exist, so the imports below use
# scrapy.spiders and scrapy.linkextractors. The unused imports
# (RFPDupeFilter, HtmlXPathSelector) have been dropped.
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor

from searchEngine.items import SearchengineItem


class DmozSpider(CrawlSpider):
    name = "dmoz"
    allowed_domains = [
        "news.cn",
        "news.xinhuanet.com",
    ]
    start_urls = [
        "http://www.news.cn/",
        "http://www.news.cn/mil/index.htm",
        # The original omitted the comma after the next entry, which fused
        # it with the URL that follows.
        "http://www.news.cn/politics/",
        "http://www.news.cn/world/index.htm",
        "http://www.news.cn/tech/index.htm",
    ]
    rules = (
        # Rule(LinkExtractor(allow=('page/[0-9]+', ))),
        # Rule(LinkExtractor(allow=['/']), 'item_parse')
        # Extract every allowed link and pass each fetched page to item_parse.
        Rule(LinkExtractor(allow=('/', )), callback='item_parse'),
    )

    def item_parse(self, response):
        # self.log("%s" % response.url)
        item = SearchengineItem()
        item['url'] = response.url
        # Each extract() call returns a list of matching strings.
        item['title'] = response.selector.xpath('//title/text()').extract()
        item['keywords'] = response.selector.xpath('//meta[@name="keywords"]/@content').extract()
        item['description'] = response.selector.xpath('//meta[@name="description"]/@content').extract()
        # Echo the scraped fields for debugging.
        for t in item['title']:
            print(t)
        for t in item['keywords']:
            print(t)
        for t in item['description']:
            print(t)
        return item
        # print item['title']

    # Disabled earlier draft, kept from the original file (Python 2 syntax).
    '''
    def parse(self, response):
        #hxs = HtmlXPathSelector(response)
        #sites = hxs.select('//head')
        #res = HtmlXPathSelector(response)
        item = SearchengineItem()
        #for site in sites:
        #    item = SearchengineItem()
        #    item['title'] = site.select('//title/text()').extract()
        #    item['link'] = site.select('meta/@keywords').extract()
        #item['desc'] = site.select('text()').extract()
        #items.append(item)
        item['title'] = response.selector.xpath('//title/text()').extract()
        item['keywords'] = response.selector.xpath('//meta[@name="keywords"]/@content').extract()
        item['description'] = response.selector.xpath('//meta[@name="description"]/@content').extract()
        for t in item['title']:
            print t.encode('utf-8')
        for t in item['keywords']:
            print t.encode('utf-8')
        for t in item['description']:
            print t.encode('utf-8')
        #print item['title']
        return item
    '''
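
One pitfall worth illustrating with a start_urls list like the spider's: Python joins adjacent string literals, so omitting a comma between two URLs produces one fused entry instead of raising an error. The sketch below is a minimal, standalone illustration; the helper find_fused_urls is hypothetical and not part of the project.

```python
# Adjacent string literals are concatenated by Python, so a missing comma
# silently merges two URLs into a single list entry.
buggy_urls = [
    "http://www.news.cn/mil/index.htm",
    "http://www.news.cn/politics/"  # no comma here: fused with the next line
    "http://www.news.cn/world/index.htm",
]

fixed_urls = [
    "http://www.news.cn/mil/index.htm",
    "http://www.news.cn/politics/",
    "http://www.news.cn/world/index.htm",
]


def find_fused_urls(urls):
    """Return entries containing more than one '://' marker.

    Hypothetical helper for illustration: a well-formed entry in this
    list should contain exactly one scheme separator.
    """
    return [u for u in urls if u.count("://") > 1]


if __name__ == "__main__":
    print(len(buggy_urls), len(fixed_urls))
    print(find_fused_urls(buggy_urls))
```

A check like this could run before a crawl starts, since a fused URL would otherwise only surface as a failed request.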
