# CrawlerForReader
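An on-device web novel crawler for Android, built on jsoup and XPath, that extracts content from web pages using per-site templates.
Reader app implementation: [https://github.com/smuyyh/BookReader](https://github.com/smuyyh/BookReader)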
- [Supported Sources](#supported-sources)
- [Template Example](#template-example)
- [Usage](#usage)
- [ScreenShot](#screenshot)
## Supported Sources
```java
/**
 * All supported book sources, keyed by source ID (declared in SourceManager).
 */
public static final SparseArray<Source> SOURCES = new SparseArray<Source>() {
    {
        put(SourceID.LIEWEN, new Source(SourceID.LIEWEN, "猎文网", "https://www.liewen.cc/search.php?keyword=%s"));
        put(SourceID.CHINESE81, new Source(SourceID.CHINESE81, "八一中文网", "https://www.zwdu.com/search.php?keyword=%s"));
        put(SourceID.ZHUISHU, new Source(SourceID.ZHUISHU, "追书网", "https://www.zhuishu.tw/search.aspx?keyword=%s"));
        put(SourceID.BIQUG, new Source(SourceID.BIQUG, "笔趣阁", "http://zhannei.baidu.com/cse/search?s=1393206249994657467&q=%s"));
        put(SourceID.WENXUEMI, new Source(SourceID.WENXUEMI, "文学迷", "http://www.wenxuemi.com/search.php?keyword=%s"));
        put(SourceID.CHINESEXIAOSHUO, new Source(SourceID.CHINESEXIAOSHUO, "小说中文网", "http://www.xszww.com/s.php?ie=gbk&s=10385337132858012269&q=%s"));
        put(SourceID.DINGDIAN, new Source(SourceID.DINGDIAN, "顶点小说", "http://zhannei.baidu.com/cse/search?s=1682272515249779940&q=%s"));
        put(SourceID.BIQUGER, new Source(SourceID.BIQUGER, "笔趣阁2", "http://zhannei.baidu.com/cse/search?s=7928441616248544648&ie=utf-8&q=%s"));
        put(SourceID.CHINESEZHUOBI, new Source(SourceID.CHINESEZHUOBI, "着笔中文网", "http://www.zbzw.com/s.php?ie=utf-8&s=4619765769851182557&q=%s"));
        put(SourceID.DASHUBAO, new Source(SourceID.DASHUBAO, "大书包", "http://zn.dashubao.net/cse/search?s=9410583021346449776&entry=1&ie=utf-8&q=%s"));
        put(SourceID.CHINESEWUZHOU, new Source(SourceID.CHINESEWUZHOU, "梧州中文台", "http://www.gxwztv.com/search.htm?keyword=%s"));
        put(SourceID.UCSHUMENG, new Source(SourceID.UCSHUMENG, "UC书盟", "http://www.uctxt.com/modules/article/search.php?searchkey=%s", 4));
        put(SourceID.QUANXIAOSHUO, new Source(SourceID.QUANXIAOSHUO, "全小说", "http://qxs.la/s_%s"));
        put(SourceID.YANMOXUAN, new Source(SourceID.YANMOXUAN, "衍墨轩", "http://www.ymoxuan.com/search.htm?keyword=%s"));
        put(SourceID.AIQIWENXUE, new Source(SourceID.AIQIWENXUE, "爱奇文学", "http://m.i7wx.com/?m=book/search&keyword=%s"));
        put(SourceID.QIANQIANXIAOSHUO, new Source(SourceID.QIANQIANXIAOSHUO, "千千小说", "http://www.xqqxs.com/modules/article/search.php?searchkey=%s", 4));
        put(SourceID.PIAOTIANWENXUE, new Source(SourceID.PIAOTIANWENXUE, "飘天文学网", "http://www.piaotian.com/modules/article/search.php?searchtype=articlename&searchkey=%s"));
        put(SourceID.SUIMENGXIAOSHUO, new Source(SourceID.SUIMENGXIAOSHUO, "随梦小说网", "http://m.suimeng.la/modules/article/search.php?searchkey=%s", 4));
        put(SourceID.DAJIADUSHUYUAN, new Source(SourceID.DAJIADUSHUYUAN, "大家读书苑", "http://www.dajiadu.net/modules/article/searchab.php?searchkey=%s"));
        put(SourceID.SHUQIBA, new Source(SourceID.SHUQIBA, "书旗吧", "http://www.shuqiba.com/modules/article/search.php?searchkey=%s", 4));
        put(SourceID.XIAOSHUO52, new Source(SourceID.XIAOSHUO52, "小说52", "http://m.xs52.com/search.php?searchkey=%s"));
    }
};
```
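Each entry pairs a `SourceID` with a display name and a search URL template in which `%s` stands for the keyword. How `Crawler` actually fills that placeholder is not shown in this README; the following is a minimal sketch, assuming the keyword is percent-encoded in the charset the site expects and then substituted with `String.format`. `SearchUrlSketch` and `buildSearchUrl` are hypothetical names, and the optional trailing integer some entries pass (e.g. `4`) is not documented here, so the sketch ignores it.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper, not part of CrawlerForReader: one possible way to fill
// the "%s" placeholder of a source's search URL template.
final class SearchUrlSketch {

    // Percent-encodes the keyword in the given charset, then substitutes it for "%s".
    // Assumes the template contains exactly one "%s" and no other '%' characters,
    // which holds for every template in the list above.
    static String buildSearchUrl(String template, String keyword, String charset) {
        try {
            // URLEncoder applies form encoding: letters, digits and . - * _ stay as-is,
            // spaces become '+', and every other byte becomes %XX
            return String.format(template, URLEncoder.encode(keyword, charset));
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        String keyword = "\u4f60\u597d"; // the example keyword used in the usage section
        // prints https://www.liewen.cc/search.php?keyword=%E4%BD%A0%E5%A5%BD
        System.out.println(buildSearchUrl(
                "https://www.liewen.cc/search.php?keyword=%s", keyword, "UTF-8"));
        // The xszww.com template declares ie=gbk, so this sketch encodes the keyword as GBK there
        System.out.println(buildSearchUrl(
                "http://www.xszww.com/s.php?ie=gbk&s=10385337132858012269&q=%s", keyword, "GBK"));
    }
}
```

Passing the charset explicitly matters because a site that expects GBK will not decode a UTF-8 percent-encoded keyword correctly.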
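## Template Example
For example, the template for 猎文网 (liewen.cc) looks like this. The `//` annotations are explanatory only; standard JSON does not allow comments, so strip them from a real template file if your parser is strict: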
```json
{
    "id": 1, // ID of the source this template applies to
    "search": { // Rules for parsing the search results page
        "charset": "UTF-8",
        "xpath": "//div[@class='result-item result-game-item']",
        "coverXpath": "//div[@class='result-game-item-pic']//a//img/@src",
        "titleXpath": "//div[@class='result-game-item-detail']//h3//a/@title",
        "linkXpath": "//div[@class='result-game-item-detail']//h3//a/@href",
        "authorXpath": "//div[@class='result-game-item-detail']//div[@class='result-game-item-info']//p[1]/span[2]/text()",
        "descXpath": "//div[@class='result-game-item-detail']//p[@class='result-game-item-desc']/text()"
    },
    "catalog": { // Rules for parsing the chapter list
        "xpath": "//div[@id='list']//dl//dd",
        "titleXpath": "//a/text()",
        "linkXpath": "//a/@href"
    },
    "content": { // Rules for parsing the chapter text
        "xpath": "//div[@id='content']/text()"
    }
}
```
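To illustrate how the `catalog` and `content` rules above select data, here is a minimal sketch. It swaps in the JDK's built-in `javax.xml.xpath` for the jsoup-based XPath engine the project uses, so it runs without dependencies but only accepts well-formed XHTML. It assumes that field rules such as `titleXpath` are evaluated relative to each node matched by `xpath` (hence the `.` prefix) and that the text nodes selected by the content rule are joined with newlines. `TemplateSketch` and its methods are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical sketch: applies catalog/content template rules with the JDK's XPath engine.
final class TemplateSketch {

    // Assumption: field rules such as "//a/text()" are meant relative to each
    // item matched by "xpath", so a leading "/" gets a "." prefix.
    static String relative(String expr) {
        return expr.startsWith("/") ? "." + expr : expr;
    }

    // Parses well-formed XHTML; real-world HTML needs a lenient parser first.
    static Document parse(String xhtml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));
    }

    // Catalog rule: returns one {title, link} pair per matched item, in document order.
    static List<String[]> catalog(Document doc, String itemXpath,
                                  String titleXpath, String linkXpath) throws Exception {
        XPath xp = XPathFactory.newInstance().newXPath();
        NodeList items = (NodeList) xp.evaluate(itemXpath, doc, XPathConstants.NODESET);
        List<String[]> chapters = new ArrayList<>();
        for (int i = 0; i < items.getLength(); i++) {
            Node item = items.item(i);
            // Evaluating to a string yields the text of the first matching node
            String title = xp.evaluate(relative(titleXpath), item).trim();
            String link = xp.evaluate(relative(linkXpath), item).trim();
            chapters.add(new String[] {title, link});
        }
        return chapters;
    }

    // Content rule: the expression selects text nodes (split by <br/>); join the non-blank ones.
    static String content(Document doc, String textXpath) throws Exception {
        XPath xp = XPathFactory.newInstance().newXPath();
        NodeList texts = (NodeList) xp.evaluate(textXpath, doc, XPathConstants.NODESET);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < texts.getLength(); i++) {
            String line = texts.item(i).getNodeValue().trim();
            if (line.isEmpty()) continue; // skip whitespace-only nodes
            if (sb.length() > 0) sb.append('\n');
            sb.append(line);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<html><body>"
                + "<div id='list'><dl><dt>Latest</dt>"
                + "<dd><a href='/b/24/24934/1.html'>Chapter 1</a></dd>"
                + "<dd><a href='/b/24/24934/2.html'>Chapter 2</a></dd>"
                + "</dl></div>"
                + "<div id='content'>Line one<br/>Line two</div>"
                + "</body></html>");
        for (String[] c : catalog(doc, "//div[@id='list']//dl//dd", "//a/text()", "//a/@href")) {
            System.out.println(c[0] + " -> " + c[1]); // e.g. "Chapter 1 -> /b/24/24934/1.html"
        }
        System.out.println(content(doc, "//div[@id='content']/text()"));
    }
}
```

Without the relative prefix, standard XPath would evaluate `//a/text()` from the document root and return the first chapter's title for every item.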
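## Usage
Results are delivered through callbacks because a search across several sources returns its results in batches. Internally, however, the requests are still synchronous and no thread scheduling is done, so the caller must invoke these methods off the UI thread (Android does not allow network I/O on the main thread).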
```java
// Search every source for a keyword; results arrive in batches
Crawler.search("你好", new SearchCallback() {
    @Override
    public void onResponse(String keyword, List<SearchBook> appendList) {
        // Append this batch of results to what is already displayed
    }

    @Override
    public void onFinish() {
        // Called after the last batch
    }

    @Override
    public void onError(String msg) {
    }
});

// Fetch the chapter list of a book on source 1 (猎文网)
Crawler.catalog(new SearchBook.SL("https://www.liewen.cc/b/24/24934/", SourceManager.SOURCES.get(1)), new ChapterCallback() {
    @Override
    public void onResponse(List<Chapter> chapters) {
    }

    @Override
    public void onError(String msg) {
    }
});

// Fetch the text of one chapter, given the book and the chapter link from its catalog
Crawler.content(new SearchBook.SL("https://www.liewen.cc/b/24/24934/", SourceManager.SOURCES.get(1)), "/b/24/24934/12212511.html", new ContentCallback() {
    @Override
    public void onResponse(String content) {
    }

    @Override
    public void onError(String msg) {
    }
});
```
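Because these calls run synchronously, an Android caller has to move them onto a worker thread itself. The following is a minimal sketch of that pattern with a plain `ExecutorService`: blocking per-source searches run on one background thread, each source's results are reported as a batch, and `onFinish()` fires after the last source. `BatchCallback`, `BackgroundSearchSketch`, and the per-source `Function`s are hypothetical stand-ins rather than the library's classes (real results would be `SearchBook` objects), and continuing after a failing source is an assumption.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

// Hypothetical callback with the same shape as SearchCallback,
// simplified to String results so the sketch has no project dependencies.
interface BatchCallback {
    void onResponse(String keyword, List<String> appendList);
    void onFinish();
    void onError(String msg);
}

final class BackgroundSearchSketch {
    // One worker thread for blocking I/O; daemon so it never keeps the JVM alive.
    private static final ExecutorService IO = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "crawler-io");
        t.setDaemon(true);
        return t;
    });

    // Runs each blocking per-source search on the worker thread, reports each
    // source's results as one batch, and calls onFinish() after the last source.
    static void search(String keyword, List<Function<String, List<String>>> sources,
                       BatchCallback cb) {
        IO.execute(() -> {
            for (Function<String, List<String>> source : sources) {
                try {
                    cb.onResponse(keyword, source.apply(keyword));
                } catch (RuntimeException e) {
                    // Assumption: a failure is reported and the remaining sources continue
                    cb.onError(String.valueOf(e.getMessage()));
                }
            }
            cb.onFinish();
        });
    }

    public static void main(String[] args) throws InterruptedException {
        CountDownLatch done = new CountDownLatch(1);
        Function<String, List<String>> sourceA = k -> Arrays.asList(k + " @ source A");
        Function<String, List<String>> sourceB = k -> Arrays.asList(k + " @ source B");
        search("hello", Arrays.asList(sourceA, sourceB), new BatchCallback() {
            @Override public void onResponse(String keyword, List<String> appendList) {
                System.out.println("batch: " + appendList);
            }
            @Override public void onFinish() { done.countDown(); }
            @Override public void onError(String msg) { System.out.println("error: " + msg); }
        });
        done.await(); // a real app would not block; this only keeps the demo alive
    }
}
```

In this sketch the callbacks run on the worker thread, so on Android any UI update inside them has to be posted back to the main thread, for example with `Activity.runOnUiThread` or a `Handler` created on `Looper.getMainLooper()`.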
## ScreenShot
### Search
<img src="./screenshot/search.png" width="280"/>
### SearchResult
<img src="./screenshot/search_result.png" width="280"/>
### BookDetail
<img src="./screenshot/detail.png" width="280"/>
### ChangeSource
<img src="./screenshot/source.png" width="280"/>
## License
```
Copyright 2016 smuyyh, All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
```