to minimize the loss $L$:
- compute the gradient $\nabla_W L$
- update the weights: $W \leftarrow W - \eta \nabla_W L$, where $\eta$ is the learning rate
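A minimal sketch of the two steps above, repeated in a loop. The function names, the toy loss $L(w) = (w - 3)^2$, and the learning rate of 0.1 are all assumptions chosen for illustration; they are not from the notes.

```python
def gradient_descent(grad_fn, w0, learning_rate=0.1, steps=100):
    """Repeat w <- w - learning_rate * grad_fn(w) for a fixed number of steps."""
    w = w0
    for _ in range(steps):
        g = grad_fn(w)              # step 1: compute the gradient
        w = w - learning_rate * g   # step 2: move against the gradient
    return w


def toy_grad(w):
    """Gradient of the hypothetical toy loss L(w) = (w - 3)^2."""
    return 2.0 * (w - 3.0)


w_final = gradient_descent(toy_grad, w0=0.0)
print(w_final)  # approaches the minimizer w = 3
```

With this loss, each step shrinks the distance to the minimizer by a constant factor, so a smaller learning rate converges more slowly and a learning rate that is too large diverges.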
walkthrough
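given a training data point $(x, y)$ and the linear classifier $\hat{y} = \mathrm{softmax}(z)$ with scores $z = Wx$, assume the correct label is $k$, so $y_k = 1$ and $y_i = 0$ for $i \neq k$
using the cross-entropy loss function: $L = -\sum_i y_i \log \hat{y}_i = -\log \hat{y}_k$
we want to update $W$ by calculating the direction to change the weights to reduce the loss: the negative gradient $-\frac{\partial L}{\partial W}$
we know that $\frac{d}{du} \log u = \frac{1}{u}$ (basic derivative)
to compute $\frac{\partial L}{\partial W}$, apply the chain rule through $z$: $\frac{\partial L}{\partial z_i} = \hat{y}_i - y_i$, so $\frac{\partial L}{\partial W} = (\hat{y} - y)\, x^\top$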
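The walkthrough can be checked numerically on one example. This sketch assumes a softmax over the scores $z = Wx$ with no bias term, for which the standard result is $\partial L / \partial W_{ij} = (p_i - y_i)\, x_j$ with $p = \mathrm{softmax}(z)$. The helper names, the weight values, and the input are hypothetical.

```python
import math


def softmax(z):
    """Softmax of a list of scores, shifted by the max for numerical stability."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def scores(W, x):
    """Linear scores z = W x, with W given as a list of rows."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]


def loss(W, x, k):
    """Cross-entropy loss when the correct label is k: L = -log p_k."""
    return -math.log(softmax(scores(W, x))[k])


def grad_W(W, x, k):
    """Gradient dL/dW_ij = (p_i - y_i) * x_j, where y is one-hot at index k."""
    p = softmax(scores(W, x))
    return [[(p[i] - (1.0 if i == k else 0.0)) * x_j for x_j in x]
            for i in range(len(W))]


def sgd_step(W, x, k, learning_rate):
    """One update W <- W - learning_rate * dL/dW."""
    g = grad_W(W, x, k)
    return [[w - learning_rate * gw for w, gw in zip(row, g_row)]
            for row, g_row in zip(W, g)]


# Hypothetical example: 3 classes, 2 input features, correct label k = 2.
W = [[0.1, -0.2], [0.0, 0.3], [-0.1, 0.2]]
x = [1.0, 2.0]
k = 2

loss_before = loss(W, x, k)
W_new = sgd_step(W, x, k, learning_rate=0.1)
loss_after = loss(W_new, x, k)
print(loss_before, loss_after)  # the loss should decrease after the step
```

Only the row of $W$ for the correct class is pushed toward $x$ (its factor $p_k - 1$ is negative); every other row is pushed away in proportion to the probability the classifier currently assigns to that class.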
backpropagation for computing gradients