Generating Kindle E-books from Web Page Content

2019-07-02 17:26:44, 935KB, PDF

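Ever since I bought a Kindle, I have been looking for ways to get the most out of it. Duokan sells plenty of books, and there are many free e-books online, but a lot of the content I care about still exists only as web pages. O'Reilly Atlas, for example, offers many e-books but only for free online reading, and plenty of other material and documentation is available only on the web. So I wanted a way to convert such online material to epub or mobi format for reading on the Kindle. This article shows how to do that with calibre and a small amount of code.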
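            articles.append(a)

        ans = [('Git Pocket Guide', articles)]

        return ans

The different parts of the code are explained below.

Overall structure

Overall, a recipe is just a Python class, except that the class must inherit from calibre.web.feeds.recipes.BasicNewsRecipe.

parse_index

The core method of a recipe is parse_index, which is also the only method a recipe must implement. Its job is to analyze the content of the index page and return a somewhat complex data structure (described later) that defines the content of the whole e-book and the order in which that content is organized.

General attribute settings

At the beginning of the class, some global attributes are defined:

    title = 'Git Pocket Guide'
    description = ''
    cover_url = 'http://akamaicovers.oreilly.com/images/0636920024972/lrg.jpg'

    url_prefix = 'http://chimera.labs.oreilly.com/books/1230000000561/'
    no_stylesheets = True
    keep_only_tags = [{ 'class': 'chapter' }]

title: the title of the e-book.
description: the description of the e-book.
cover_url: the cover image of the e-book.
url_prefix: an attribute for my own use. It is the prefix of the content pages and is used later to assemble the full URL of each content page.
no_stylesheets: do not use the pages' CSS styles.
keep_only_tags: this line tells calibre to consider only DOM elements whose class attribute is chapter when analyzing the index page. If you look at the source of the index page, you will see that these correspond to the first-level headings. The reason is that in this example each first-level heading on the index page corresponds to a separate content page, while the second-level headings only link to an anchor within a page, so only the first-level headings need to be considered.

parse_index return value

The following describes the data structure that parse_index must return after analyzing the index page.

    Book      List  = [Tuple, Tuple, ..., Tuple]   one Tuple per volume
    Volume    Tuple = (String, List)               volume title, chapters
    Chapters  List  = [Map, Map, ..., Map]         one Map per chapter
    Chapter   Map   = {String: String}             chapter title, content page URL

The overall return value is a list in which each element is a tuple, and each tuple represents one volume. This example has only one volume, so the list contains a single tuple.

Each tuple has two elements. The first is the volume name and the second is a list. Each element of that list is a map representing one chapter, with two entries, title and url: title is the chapter title, and url is the URL of the content page that holds the chapter.

Calibre fetches and organizes the whole book according to the result of parse_index, and it fetches and processes images linked from the content on its own.

The whole of parse_index uses soup to parse the index page and build the data structure described above.

More

The above is the most basic kind of recipe. To learn about further usage, see the API documentation.

Generating the mobi file

Once the recipe is written, the e-book can be generated from the command line with the following command:

    ebook-convert Git_Pocket_Guide.recipe Git_Pocket_Guide.mobi

This produces an e-book in mobi format. ebook-convert fetches the relevant content and organizes its structure according to the recipe code.

Final result

Below is how the result looks on the Kindle.

Table of contents

All articles
1. Understanding Git
2. Getting Started
3. Making Commits
4. Undoing and Editing Commits
5. Branching
6. Tracking Other Repositories
7. Merging
8. Naming Commits
9. Viewing History
10. Editing History
11. Understanding Patches
12. Remote Access
13. Miscellaneous
14. How Do I…?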
Content: [screenshot of the opening of Chapter 2, "Getting Started", on the Kindle]

A page containing images: [screenshot of a page with Git branch diagrams on the Kindle]

Actual result: [photo of the book displayed on a Kindle]
My recipes repository

I have created a repository on GitHub called kindle-open-books and put some recipes in it. Some were written by me and some were contributed by others. Recipe contributions from anyone are welcome.
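To make the return value of parse_index more concrete, here is a minimal sketch in plain Python that builds the same list-of-tuples structure without calibre. The helper name build_index, the sample links, the ch01.html style file names, and the example.com prefix are all hypothetical and only for illustration. Skipping links that contain '#' is an assumed way to drop the second-level headings that point to anchors.

```python
# Sketch only: build_index, the sample links, and the URL prefix are
# hypothetical; a real recipe would collect the links from the index page.
def build_index(volume_title, links, url_prefix):
    """Return [(volume_title, [{'title': ..., 'url': ...}, ...])]."""
    articles = []
    for title, href in links:
        # Assumption: links containing '#' point to an anchor inside a
        # page (second-level headings), so they are skipped.
        if '#' in href:
            continue
        articles.append({'title': title.strip(), 'url': url_prefix + href})
    # One tuple per volume; this example has a single volume.
    return [(volume_title, articles)]


links = [
    ('Understanding Git', 'ch01.html'),
    ('Overview', 'ch01.html#overview'),
    ('Getting Started ', 'ch02.html'),
]
index = build_index('Git Pocket Guide', links, 'http://example.com/book/')
for volume, chapters in index:
    print(volume, len(chapters))
```

In a recipe, parse_index would return a value shaped like index, and calibre would then fetch each url in order.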
