• efficiently compute gradients (for gradient descent) using the chain rule
    • break the computation into a computational graph
    • compute forward pass to calculate loss
    • propagate gradients backward
      • for each pair of connected nodes, compute the local gradient and multiply it by the gradient arriving from the output side
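
The steps above can be sketched in a few lines of Python. This is a minimal illustration rather than a general implementation: the one-neuron model, the squared-error loss, the learning rate, and the names `forward`, `backward`, `w`, `b`, `x`, `y` are all assumptions chosen for the example.

```python
def forward(w, b, x, y):
    """Forward pass: evaluate each node of the graph in order."""
    z = w * x        # node 1: multiply weight and input
    p = z + b        # node 2: add bias (prediction)
    d = p - y        # node 3: error against the target
    loss = d * d     # node 4: squared error
    # Cache the intermediates that the backward pass needs.
    return loss, (x, d)


def backward(cache):
    """Backward pass: apply the chain rule from the loss back to w and b."""
    x, d = cache
    dloss_dd = 2.0 * d          # local gradient of loss = d * d
    dloss_dp = dloss_dd * 1.0   # d = p - y, so dd/dp = 1
    dloss_db = dloss_dp * 1.0   # p = z + b, so dp/db = 1
    dloss_dz = dloss_dp * 1.0   # p = z + b, so dp/dz = 1
    dloss_dw = dloss_dz * x     # z = w * x, so dz/dw = x
    return dloss_dw, dloss_db


# Hypothetical values, chosen only for illustration.
w, b, x, y = 0.5, 0.1, 2.0, 3.0
loss, cache = forward(w, b, x, y)
grad_w, grad_b = backward(cache)

# One gradient descent step using the computed gradients.
lr = 0.1  # assumed learning rate
w_new = w - lr * grad_w
b_new = b - lr * grad_b
new_loss, _ = forward(w_new, b_new, x, y)
```

Each line of `backward` multiplies the gradient already computed on the output side by a single local derivative, so every intermediate gradient is computed once and reused.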

Example