3. Autograd
This matters because training only works when gradients are correct. If you do not understand how PyTorch builds and traverses computation graphs, debugging exploding losses, detached tensors, or broken custom operations becomes guesswork.
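Before the worked example, here is a minimal sketch (not one of the original cells; the names a, b, c are illustrative) of the mechanism that paragraph refers to: every tensor produced by an operation on a requires_grad tensor records its creating operation in grad_fn, and detach() cuts that record, which is the usual culprit when gradients silently stop flowing.

import torch
a = torch.ones(3, requires_grad=True)
b = a * 2          # b.grad_fn is <MulBackward0>: b is attached to the graph
c = b.detach()     # c shares b's data but has no grad_fn: the graph is cut here
print(b.grad_fn)   # <MulBackward0 object at ...>
print(c.grad_fn)   # None -- a backward() through c would never reach a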
[1]:
import torch
x = torch.ones(2, 2, requires_grad=True)
y = x + 2
z = y * y * 3
o = z.mean()
[2]:
x
[2]:
tensor([[1., 1.],
        [1., 1.]], requires_grad=True)
[3]:
y
[3]:
tensor([[3., 3.],
        [3., 3.]], grad_fn=<AddBackward0>)
[4]:
z
[4]:
tensor([[27., 27.],
        [27., 27.]], grad_fn=<MulBackward0>)
[5]:
o
[5]:
tensor(27., grad_fn=<MeanBackward0>)
Note that the partial derivative of \(o\) with respect to \(x_i\) can be derived analytically. Since \(o = \frac{1}{4}\sum_i z_i\) and \(z_i = 3(x_i + 2)^2\),
\(\frac{\partial o}{\partial x_i} = \frac{1}{4} \cdot 6(x_i + 2) = \frac{3}{2} (x_i + 2)\)
Evaluated at \(x_i = 1\), this partial derivative becomes 4.5.
\(\frac{\partial o}{\partial x_i} \big|_{x_i=1} = \frac{3}{2}(1 + 2) = \frac{3}{2}(3) = \frac{9}{2} = 4.5\)
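As a sanity check (a sketch, not part of the original notebook; f and x64 are illustrative names), torch.autograd.gradcheck compares the analytic gradient against a finite-difference estimate. It expects double-precision inputs:

def f(t):
    # same computation as the cells above: o = mean(3 * (t + 2)^2)
    return (3 * (t + 2) ** 2).mean()

x64 = torch.ones(2, 2, dtype=torch.double, requires_grad=True)
torch.autograd.gradcheck(f, (x64,))  # returns True when the gradients agree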
[6]:
o.backward()
[7]:
x.grad
[7]:
tensor([[4.5000, 4.5000],
        [4.5000, 4.5000]])
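One caveat worth remembering (a sketch assuming the cells above have just run; o2 is an illustrative name): .grad accumulates across backward calls rather than being overwritten, which is why training loops zero gradients each step.

o2 = (3 * (x + 2) ** 2).mean()  # rebuild the graph; the first one was freed by backward()
o2.backward()
print(x.grad)      # 9.0 everywhere: 4.5 from before plus 4.5 from this call
x.grad.zero_()     # optimizers do the equivalent via optimizer.zero_grad()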