<h1 align="center">
<img width="100" height="100" src="logo.svg" alt=""><br>
jsdom
</h1>
jsdom is a pure-JavaScript implementation of many web standards, notably the WHATWG [DOM](https://dom.spec.whatwg.org/) and [HTML](https://html.spec.whatwg.org/multipage/) Standards, for use with Node.js. In general, the goal of the project is to emulate enough of a subset of a web browser to be useful for testing and scraping real-world web applications.
The latest versions of jsdom require Node.js v12 or newer. (Versions of jsdom below v17 still work with previous Node.js versions, but are unsupported.)
## Basic usage
```js
const jsdom = require("jsdom");
const { JSDOM } = jsdom;
```
To use jsdom, you will primarily use the `JSDOM` constructor, which is a named export of the jsdom main module. Pass the constructor a string. You will get back a `JSDOM` object, which has a number of useful properties, notably `window`:
```js
const dom = new JSDOM(`<!DOCTYPE html><p>Hello world</p>`);
console.log(dom.window.document.querySelector("p").textContent); // "Hello world"
```
(Note that jsdom will parse the HTML you pass it just like a browser does, including implied `<html>`, `<head>`, and `<body>` tags.)
The resulting object is an instance of the `JSDOM` class, which contains a number of useful properties and methods besides `window`. In general, it can be used to act on the jsdom from the "outside," doing things that are not possible with the normal DOM APIs. For simple cases, where you don't need any of this functionality, we recommend a coding pattern like
```js
const { window } = new JSDOM(`...`);
// or even
const { document } = (new JSDOM(`...`)).window;
```
Full documentation on everything you can do with the `JSDOM` class is below, in the section "`JSDOM` Object API".
## Customizing jsdom
The `JSDOM` constructor accepts a second parameter which can be used to customize your jsdom in the following ways.
### Simple options
```js
const dom = new JSDOM(``, {
  url: "https://example.org/",
  referrer: "https://example.com/",
  contentType: "text/html",
  includeNodeLocations: true,
  storageQuota: 10000000
});
```
- `url` sets the value returned by `window.location`, `document.URL`, and `document.documentURI`, and affects things like resolution of relative URLs within the document and the same-origin restrictions and referrer used while fetching subresources. It defaults to `"about:blank"`.
- `referrer` just affects the value read from `document.referrer`. It defaults to no referrer (which reflects as the empty string).
- `contentType` affects the value read from `document.contentType`, as well as how the document is parsed: as HTML or as XML. Values that are not an [HTML MIME type](https://mimesniff.spec.whatwg.org/#html-mime-type) or an [XML MIME type](https://mimesniff.spec.whatwg.org/#xml-mime-type) will throw. It defaults to `"text/html"`. If a `charset` parameter is present, it can affect [binary data processing](#encoding-sniffing).
- `includeNodeLocations` preserves the location info produced by the HTML parser, allowing you to retrieve it with the `nodeLocation()` method (described below). It also ensures that line numbers reported in exception stack traces for code running inside `<script>` elements are correct. It defaults to `false` to give the best performance, and cannot be used with an XML content type since our XML parser does not support location info.
- `storageQuota` is the maximum size in code units for the separate storage areas used by `localStorage` and `sessionStorage`. Attempts to store data larger than this limit will cause a `DOMException` to be thrown. By default, it is set to 5,000,000 code units per origin, as inspired by the HTML specification.
Note that both `url` and `referrer` are canonicalized before they're used, so e.g. if you pass in `"https:example.com"`, jsdom will interpret that as if you had given `"https://example.com/"`. If you pass an unparseable URL, the call will throw. (URLs are parsed and serialized according to the [URL Standard](https://url.spec.whatwg.org/).)
### Executing scripts
jsdom's most powerful ability is that it can execute scripts inside the jsdom. These scripts can modify the content of the page and access all the web platform APIs jsdom implements.
However, this is also highly dangerous when dealing with untrusted content. The jsdom sandbox is not foolproof, and code running inside the DOM's `<script>`s can, if it tries hard enough, get access to the Node.js environment, and thus to your machine. As such, the ability to execute scripts embedded in the HTML is disabled by default:
```js
const dom = new JSDOM(`<body>
<script>document.body.appendChild(document.createElement("hr"));</script>
</body>`);
// The script will not be executed, by default:
dom.window.document.body.children.length === 1;
```
To enable executing scripts inside the page, you can use the `runScripts: "dangerously"` option:
```js
const dom = new JSDOM(`<body>
<script>document.body.appendChild(document.createElement("hr"));</script>
</body>`, { runScripts: "dangerously" });
// The script will be executed and modify the DOM:
dom.window.document.body.children.length === 2;
```
Again, we emphasize that you should only use this when feeding jsdom code you know is safe. If you use it on arbitrary user-supplied code, or code from the Internet, you are effectively running untrusted Node.js code, and your machine could be compromised.
If you want to execute _external_ scripts, included via `<script src="">`, you'll also need to ensure that jsdom loads them. To do this, add the option `resources: "usable"` [as described below](#loading-subresources). (You'll likely also want to set the `url` option, for the reasons discussed there.)
Event handler attributes, like `<div onclick="">`, are also governed by this setting; they will not function unless `runScripts` is set to `"dangerously"`. (However, event handler _properties_, like `div.onclick = ...`, will function regardless of `runScripts`.)
If you are simply trying to execute script "from the outside", instead of letting `<script>` elements and event handler attributes run "from the inside", you can use the `runScripts: "outside-only"` option, which enables fresh copies of all the JavaScript spec-provided globals to be installed on `window`. This includes things like `window.Array`, `window.Promise`, etc. It also, notably, includes `window.eval`, which allows running scripts, but with the jsdom `window` as the global:
```js
const { window } = new JSDOM(``, { runScripts: "outside-only" });
window.eval(`document.body.innerHTML = "<p>Hello, world!</p>";`);
window.document.body.children.length === 1;
```
This is turned off by default for performance reasons, but is safe to enable.
(Note that in the default configuration, without setting `runScripts`, the values of `window.Array`, `window.eval`, etc. will be the same as those provided by the outer Node.js environment. That is, `window.eval === eval` will hold, so `window.eval` will not run scripts in a useful way.)
We strongly advise against trying to "execute scripts" by mashing together the jsdom and Node global environments (e.g. by doing `global.window = dom.window`), and then executing scripts or test code inside the Node global environment. Instead, you should treat jsdom like you would a browser, and run all scripts and tests that need access to a DOM inside the jsdom environment, using `window.eval` or `runScripts: "dangerously"`. This might require, for example, creating a browserify bundle to execute as a `<script>` element—just like you would in a browser.
Finally, for advanced use cases you can use the `dom.getInternalVMContext()` method, documented below.
### Pretending to be a visual browser
jsdom does not have the capability to render visual content, and will act like a headless browser by default. It provides hints to web pages through APIs such as `document.hidden` that their content is not visible.