efficiently compute gradients (for gradient descent) using chain rule break computation into a graph flow compute forward pass to calculate loss propagate gradients backward compute all gradients for each pair of nodes backwards Example