to minimize loss

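  • compute the gradient of the loss with respect to the weights, $\nabla_W L$
  • update the weights: $W \leftarrow W - \eta \nabla_W L$, where $\eta$ is the learning rate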
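
The two steps above can be sketched as a short loop. This is only an illustration, assuming a toy quadratic loss chosen because its gradient is easy to write by hand. The names `loss`, `grad`, `gradient_descent`, `lr`, and `target` are hypothetical and do not come from these notes.

```python
import numpy as np

# Toy loss (an assumption for illustration): L(w) = ||w - target||^2
target = np.array([3.0, -2.0])

def loss(w):
    return float(np.sum((w - target) ** 2))

def grad(w):
    # Analytic gradient of the toy loss: dL/dw = 2 (w - target)
    return 2.0 * (w - target)

def gradient_descent(w0, lr=0.1, steps=100):
    w = np.array(w0, dtype=float)
    for _ in range(steps):
        g = grad(w)       # step 1: compute the gradient
        w = w - lr * g    # step 2: update the weights; lr is the learning rate
    return w

w_final = gradient_descent([0.0, 0.0])
```

With a learning rate this small the iterates move steadily toward `target`; a rate that is too large would make the updates overshoot and diverge.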

walkthrough

given a training data point (x, y) and the linear classifier formula, assume the correct label is k, so

using cross-entropy loss function
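
One common way to write this setup is sketched below. It assumes the classifier produces scores $s = Wx$ with no bias term and that the scores are turned into probabilities with a softmax. Both are assumptions, and the symbols $s$, $p$, and $C$ (the number of classes) are introduced here only for illustration.

```latex
\begin{align*}
s   &= Wx, \\
p_j &= \frac{e^{s_j}}{\sum_{c=1}^{C} e^{s_c}}, \qquad j = 1, \dots, C, \\
L   &= -\log p_k = -s_k + \log \sum_{c=1}^{C} e^{s_c}.
\end{align*}
```

Because the correct label is k, only the probability assigned to class k enters the loss directly; the other classes affect it through the softmax denominator.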

we want to update W, so we need the gradient of the loss with respect to W; its negative is the direction in which to change the weights to reduce the loss:

we know that (basic derivative)
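
The basic derivative meant here is most likely that of the logarithm; that reading is an assumption. Assuming scores $s = Wx$, softmax probabilities $p_j = e^{s_j} / \sum_c e^{s_c}$, and loss $L = -\log p_k$ (a common formulation, though not necessarily the one intended here), the chain rule gives the following, where $\delta_{jk}$ is 1 when $j = k$ and 0 otherwise:

```latex
\begin{align*}
\frac{d}{du} \log u &= \frac{1}{u}, \\
\frac{\partial L}{\partial s_j}
  &= -\frac{1}{p_k} \frac{\partial p_k}{\partial s_j}
   = -\frac{1}{p_k}\, p_k \left( \delta_{jk} - p_j \right)
   = p_j - \delta_{jk}, \\
\frac{\partial L}{\partial W_{ji}}
  &= \frac{\partial L}{\partial s_j} \frac{\partial s_j}{\partial W_{ji}}
   = \left( p_j - \delta_{jk} \right) x_i .
\end{align*}
```

In matrix form this is the outer product $(p - e_k)\, x^\top$, where $e_k$ is the one-hot vector for class k.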

to compute
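
A minimal NumPy sketch of this computation follows, assuming softmax probabilities over scores `W @ x` with no bias term. The function names, array sizes, random inputs, and the learning rate of 0.1 are all hypothetical choices for illustration. The finite-difference routine is there only to check the analytic gradient.

```python
import numpy as np

def softmax(s):
    # Subtract the max score for numerical stability; the result is unchanged.
    e = np.exp(s - np.max(s))
    return e / e.sum()

def loss_and_grad(W, x, k):
    """Cross-entropy loss for one example and its gradient with respect to W."""
    p = softmax(W @ x)        # class probabilities from scores s = Wx
    loss = -np.log(p[k])      # L = -log p_k
    ds = p.copy()
    ds[k] -= 1.0              # dL/ds = p - onehot(k)
    dW = np.outer(ds, x)      # dL/dW = (p - onehot(k)) x^T
    return loss, dW

def numerical_grad(W, x, k, eps=1e-6):
    # Central finite differences, used only to check the analytic gradient.
    dW = np.zeros_like(W)
    for idx in np.ndindex(*W.shape):
        W_plus, W_minus = W.copy(), W.copy()
        W_plus[idx] += eps
        W_minus[idx] -= eps
        dW[idx] = (loss_and_grad(W_plus, x, k)[0]
                   - loss_and_grad(W_minus, x, k)[0]) / (2 * eps)
    return dW

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))   # 3 classes, 4 features (arbitrary sizes)
x = rng.normal(size=4)
k = 1                         # index of the correct class

loss, dW = loss_and_grad(W, x, k)
max_error = np.max(np.abs(dW - numerical_grad(W, x, k)))

# One gradient descent step with a hypothetical learning rate of 0.1
W_new = W - 0.1 * dW
new_loss, _ = loss_and_grad(W_new, x, k)
```

Comparing `loss` with `new_loss` shows the effect of a single update, and `max_error` indicates how closely the analytic gradient matches the numerical one.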

backpropagation for computing gradients