PyTorch 1.5.1 Official English Documentation: Python API and Library (with bookmarks)

Uploaded 2020-07-19. The official English documentation for PyTorch 1.5.1, the latest release at the time. Because my work environment only allows offline API access, I spent a few hours downloading all of the API pages and compiled them into a single 747-page PDF, complete with bookmarks for easy navigation.
Table of Contents
PYTORCH DOCUMENTATION
PyTorch is an optimized tensor library for deep learning using GPUs and CPUs.
Notes
Automatic Mixed Precision examples
Autograd mechanics
Broadcasting semantics
CPU threading and TorchScript inference
CUDA semantics
Distributed Data Parallel
Extending PyTorch
Frequently Asked Questions
Features for large-scale deployments
Multiprocessing best practices
Reproducibility
Serialization semantics
Windows FAQ
Language Bindings
C++
Javadoc
Python API
torch
torch.nn
torch.nn.functional
torch.Tensor
Tensor Attributes
Tensor Views
torch.autograd
torch.cuda
torch.cuda.amp
torch.distributed
torch.distributions
torch.hub
torch.jit
torch.nn.init
torch.onnx
torch.optim
Quantization
Distributed RPC Framework
torch.random
torch.sparse
torch.Storage
torch.utils.bottleneck
torch.utils.checkpoint
torch.utils.cpp_extension
torch.utils.data
torch.utils.dlpack
torch.utils.model_zoo
torch.utils.tensorboard
Type Info
Named Tensors
Named Tensors operator coverage
torch.__config__
Libraries
torchaudio
torchtext
torchvision
TorchElastic
TorchServe
PyTorch on XLA Devices
Community
PyTorch Contribution Guide
PyTorch Governance
PyTorch Governance | Persons of Interest
INDICES AND TABLES
Index
Module Index
© Copyright 2019, Torch Contributors.
Built with Sphinx using a theme provided by Read the Docs.
AUTOMATIC MIXED PRECISION EXAMPLES
torch.cuda.amp.GradScaler is not a complete implementation of automatic mixed precision. GradScaler is only useful if you manually run regions of your model in
float16 . If you aren’t sure how to choose op precision manually, the master branch and nightly pip/conda builds include a context manager that chooses op precision
automatically wherever it’s enabled. See the master documentation for details.
Gradient Scaling
    Typical Use
Working with Unscaled Gradients
    Gradient clipping
Working with Scaled Gradients
    Gradient penalty
Working with Multiple Losses and Optimizers
Gradient Scaling
Gradient scaling helps prevent gradient underflow when training with mixed precision, as explained here.
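The underflow problem itself can be illustrated without PyTorch at all. The following sketch uses NumPy's float16 (the same half-precision format used on the GPU) to show a gradient magnitude that flushes to zero in float16 but survives when multiplied by GradScaler's default initial scale of 2**16:

```python
import numpy as np

scale = 65536.0   # GradScaler's default initial scale (2**16)
true_grad = 1e-9  # a gradient magnitude float16 cannot represent

# Without scaling, the value flushes to zero in float16
# (the smallest positive float16 subnormal is ~6e-8).
assert np.float16(true_grad) == 0.0

# backward() on a scaled loss produces scaled gradients, so the
# float16 intermediate stays in representable range (~6.55e-5 here).
scaled = np.float16(true_grad * scale)
assert scaled != 0.0

# Unscaling in float32 recovers the true gradient to within rounding.
recovered = float(np.float32(scaled)) / scale
print(recovered)  # close to 1e-9 instead of 0.0
```

This is only an arithmetic illustration of the mechanism; in real training, the scaling and unscaling are handled by GradScaler as shown below.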
Instances of torch.cuda.amp.GradScaler help perform the steps of gradient scaling conveniently, as shown in the following code snippets.
Typical Use
# Creates a GradScaler once at the beginning of training.
scaler = GradScaler()

for epoch in epochs:
    for input, target in data:
        optimizer.zero_grad()
        output = model(input)
        loss = loss_fn(output, target)

        # Scales the loss, and calls backward() on the scaled loss to create scaled gradients.
        scaler.scale(loss).backward()

        # scaler.step() first unscales the gradients of the optimizer's assigned params.
        # If these gradients do not contain infs or NaNs, optimizer.step() is then called,
        # otherwise, optimizer.step() is skipped.
        scaler.step(optimizer)

        # Updates the scale for next iteration.
        scaler.update()
Working with Unscaled Gradients
All gradients produced by scaler.scale(loss).backward() are scaled. If you wish to modify or inspect the parameters’ .grad attributes between backward() and
scaler.step(optimizer), you should unscale them first. For example, gradient clipping manipulates a set of gradients such that their global norm (see
torch.nn.utils.clip_grad_norm_()) or maximum magnitude (see torch.nn.utils.clip_grad_value_()) is <= some user-imposed threshold. If you attempted to
clip without unscaling, the gradients’ norm/maximum magnitude would also be scaled, so your requested threshold (which was meant to be the threshold for unscaled gradients) would be
invalid.
scaler.unscale_(optimizer) unscales gradients held by optimizer ’s assigned parameters. If your model or models contain other parameters that were assigned to another
optimizer (say optimizer2 ), you may call scaler.unscale_(optimizer2) separately to unscale those parameters’ gradients as well.
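To make the two-optimizer case concrete, here is a minimal sketch of the pattern just described. The models, optimizers, and data are invented for illustration; on a machine without CUDA, GradScaler disables itself and each call becomes a pass-through, so the control flow is the same either way:

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# Hypothetical setup: two sub-models, each with its own optimizer.
model0 = torch.nn.Linear(4, 4).to(device)
model1 = torch.nn.Linear(4, 1).to(device)
optimizer = torch.optim.SGD(model0.parameters(), lr=0.1)
optimizer2 = torch.optim.SGD(model1.parameters(), lr=0.1)

# GradScaler quietly disables itself when CUDA is unavailable.
scaler = torch.cuda.amp.GradScaler(enabled=torch.cuda.is_available())

input = torch.randn(8, 4, device=device)
target = torch.randn(8, 1, device=device)
loss = torch.nn.functional.mse_loss(model1(model0(input)), target)
scaler.scale(loss).backward()

# Each unscale_ call affects only the gradients of the parameters
# assigned to that particular optimizer.
scaler.unscale_(optimizer)
scaler.unscale_(optimizer2)

scaler.step(optimizer)
scaler.step(optimizer2)
scaler.update()
```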
Gradient clipping
Calling scaler.unscale_(optimizer) before clipping enables you to clip unscaled gradients as usual:
scaler = GradScaler()

for epoch in epochs:
    for input, target in data:
        optimizer.zero_grad()
        output = model(input)
        loss = loss_fn(output, target)
        scaler.scale(loss).backward()

        # Unscales the gradients of optimizer's assigned params in-place
        scaler.unscale_(optimizer)

        # Since the gradients of optimizer's assigned params are unscaled, clips as usual:
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)

        # optimizer's gradients are already unscaled, so scaler.step does not unscale them,
        # although it still skips optimizer.step() if the gradients contain infs or NaNs.
        scaler.step(optimizer)

        # Updates the scale for next iteration.
        scaler.update()
scaler records that scaler.unscale_(optimizer) was already called for this optimizer this iteration, so scaler.step(optimizer) knows not to redundantly unscale
gradients before (internally) calling optimizer.step() .
unscale_() should only be called once per optimizer per step() call, and only after all gradients for that optimizer’s assigned parameters have been accumulated. Calling
unscale_() twice for a given optimizer between each step() triggers a RuntimeError.
Working with Scaled Gradients
For some operations, you may need to work with scaled gradients in a setting where scaler.unscale_ is unsuitable.
Gradient penalty
A gradient penalty implementation typically creates gradients out-of-place using torch.autograd.grad() , combines them to create the penalty value, and adds the penalty value to
the loss.
Here’s an ordinary example of an L2 penalty without gradient scaling:
for epoch in epochs:
    for input, target in data:
        optimizer.zero_grad()
        output = model(input)
        loss = loss_fn(output, target)

        # Creates some gradients out-of-place
        grad_params = torch.autograd.grad(loss, model.parameters(), create_graph=True)

        # Computes the penalty term and adds it to the loss
        grad_norm = 0
        for grad in grad_params:
            grad_norm += grad.pow(2).sum()
        grad_norm = grad_norm.sqrt()
        loss = loss + grad_norm

        loss.backward()
        optimizer.step()
To implement a gradient penalty with gradient scaling, the loss passed to torch.autograd.grad() should be scaled. The resulting out-of-place gradients will therefore be scaled, and
should be unscaled before being combined to create the penalty value.
Here’s how that looks for the same L2 penalty:
scaler = GradScaler()

for epoch in epochs:
    for input, target in data:
        optimizer.zero_grad()
        output = model(input)
        loss = loss_fn(output, target)

        # Scales the loss for the out-of-place backward pass, resulting in scaled grad_params
        scaled_grad_params = torch.autograd.grad(scaler.scale(loss), model.parameters(), create_graph=True)

        # Unscales grad_params before computing the penalty. grad_params are not owned
        # by any optimizer, so ordinary division is used instead of scaler.unscale_:
        inv_scale = 1./scaler.get_scale()
        grad_params = [p * inv_scale for p in scaled_grad_params]

        # Computes the penalty term and adds it to the loss
        grad_norm = 0
        for grad in grad_params:
            grad_norm += grad.pow(2).sum()
        grad_norm = grad_norm.sqrt()
        loss = loss + grad_norm

        # Applies scaling to the backward call as usual. Accumulates leaf gradients that are correctly scaled.
        scaler.scale(loss).backward()

        # step() and update() proceed as usual.
        scaler.step(optimizer)
        scaler.update()
Working with Multiple Losses and Optimizers
If your network has multiple losses, you must call scaler.scale on each of them individually. If your network has multiple optimizers, you may call scaler.unscale_ on any of
them individually, and you must call scaler.step on each of them individually.
However, scaler.update() should only be called once, after all optimizers used this iteration have been stepped:
scaler = torch.cuda.amp.GradScaler()

for epoch in epochs:
    for input, target in data:
        optimizer0.zero_grad()
        optimizer1.zero_grad()
        output0 = model0(input)
        output1 = model1(input)
        loss0 = loss_fn(2 * output0 + 3 * output1, target)
        loss1 = loss_fn(3 * output0 - 5 * output1, target)

        scaler.scale(loss0).backward(retain_graph=True)
        scaler.scale(loss1).backward()

        # You can choose which optimizers receive explicit unscaling, if you
        # want to inspect or modify the gradients of the params they own.
        scaler.unscale_(optimizer0)

        scaler.step(optimizer0)
        scaler.step(optimizer1)

        scaler.update()
Each optimizer independently checks its gradients for infs/NaNs, and therefore makes an independent decision whether or not to skip the step. This may result in one optimizer skipping the
step while the other one does not. Since step skipping occurs rarely (every several hundred iterations) this should not impede convergence. If you observe poor convergence after adding
gradient scaling to a multiple-optimizer model, please file an issue.
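If you do want to observe skipped steps (for logging, say), one possible idiom is to compare get_scale() before and after update(). This relies on the documented behavior that update() lowers the scale after a skipped step, not on any dedicated API, so treat it as a sketch. The single-model setup here is invented for illustration:

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Linear(2, 2).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = torch.cuda.amp.GradScaler(enabled=torch.cuda.is_available())

loss = model(torch.randn(4, 2, device=device)).sum()
scaler.scale(loss).backward()
scaler.step(optimizer)

# A skipped step causes update() to lower the scale, so a drop in
# get_scale() across update() signals that this iteration was skipped.
scale_before = scaler.get_scale()
scaler.update()
step_was_skipped = scaler.get_scale() < scale_before
print(step_was_skipped)  # False here, since the gradients are finite
```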