Autograd mechanics
This note will present an overview of how autograd works and records the operations. It’s not
strictly necessary to understand all this, but we recommend getting familiar with it, as it will help
you write more efficient, cleaner programs, and can aid you in debugging.
Excluding subgraphs from backward
Every Tensor has a flag: requires_grad that allows for fine-grained exclusion of subgraphs from
gradient computation and can increase efficiency.
requires_grad
If even a single input to an operation requires gradient, its output will also require gradient.
Conversely, the output won't require gradient only if none of the inputs require it. Backward
computation is never performed in subgraphs where no Tensor required gradients.
>>> import torch
>>> x = torch.randn(5, 5) # requires_grad=False by default
>>> y = torch.randn(5, 5) # requires_grad=False by default
>>> z = torch.randn((5, 5), requires_grad=True)
>>> a = x + y
>>> a.requires_grad
False
>>> b = a + z
>>> b.requires_grad
True
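Backward then propagates gradients only into the tensors that required them. As a quick check, reusing the names from the snippet above:
>>> b.sum().backward()  # reduce to a scalar so backward() needs no explicit gradient argument
>>> z.grad is None      # z required grad, so a gradient was accumulated for it
False
>>> x.grad is None      # x was excluded, so no gradient was computed for it
True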
This is especially useful when you want to freeze part of your model, or you know in advance that
you're not going to use gradients w.r.t. some parameters. For example, if you want to fine-tune a
pretrained CNN, it's enough to switch the requires_grad flags in the frozen base, and no
intermediate buffers will be saved, until the computation gets to the last layer, where the affine
transform will use weights that require gradient, and the output of the network will also require
them.
import torch.nn as nn
import torch.optim as optim
import torchvision

model = torchvision.models.resnet18(pretrained=True)
# Freeze all the parameters in the pretrained base
for param in model.parameters():
    param.requires_grad = False

# Replace the last fully-connected layer
# Parameters of newly constructed modules have requires_grad=True by default
model.fc = nn.Linear(512, 100)

# Optimize only the classifier
optimizer = optim.SGD(model.fc.parameters(), lr=1e-2, momentum=0.9)
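If more than one submodule is left unfrozen, an equivalent pattern (a sketch, not part of the snippet above) is to hand the optimizer only the parameters that still require gradient:
# Optimize every parameter that still requires gradient
optimizer = optim.SGD(
    (p for p in model.parameters() if p.requires_grad),
    lr=1e-2,
    momentum=0.9,
)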
How autograd encodes the history