to minimize loss
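- compute gradient $\nabla_W L$ of the loss $L$ with respect to the weights $W$
- update weights $W \leftarrow W - \eta \nabla_W L$, where $\eta$ is the learning rate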
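
The two steps above can be sketched as a loop. This is a minimal illustration, not a definitive implementation: the names `gradient_descent`, `grad_fn`, `lr`, and `steps`, and the toy quadratic loss, are assumptions made for the example and do not come from the notes.

```python
import numpy as np

def gradient_descent(grad_fn, w0, lr=0.1, steps=100):
    """Repeat the two steps: compute the gradient, then move against it."""
    w = np.asarray(w0, dtype=float)
    for _ in range(steps):
        g = grad_fn(w)    # compute gradient of the loss at the current weights
        w = w - lr * g    # update weights; lr is the learning rate
    return w

# Hypothetical toy loss L(w) = ||w - target||^2, whose gradient is 2 * (w - target).
target = np.array([3.0, -1.0])
w_final = gradient_descent(lambda w: 2.0 * (w - target), w0=np.zeros(2))
```

With a learning rate of 0.1 each step shrinks the distance to `target` by a constant factor, so `w_final` ends up close to `target`. A learning rate that is too large would make the same loop diverge.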

walkthrough
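given training data point $(x, y)$ and linear classifier formula $s = Wx$ (one score $s_j$ per class), assume correct label is $k$, so $y = k$
using cross-entropy loss function on the softmax probabilities $p_j = e^{s_j} / \sum_i e^{s_i}$: $L = -\log p_k$
we want to update $W$ by calculating the direction to change the weights to reduce the loss: $-\nabla_W L$
we know that (basic derivative) $\partial s_j / \partial W_{ji} = x_i$, and for softmax with cross-entropy $\partial L / \partial s_j = p_j - \mathbb{1}[j = k]$
to compute $\nabla_W L$, chain the two: $\partial L / \partial W_{ji} = (p_j - \mathbb{1}[j = k])\, x_i$, i.e. $\nabla_W L = (p - e_k)\, x^\top$, where $e_k$ is the one-hot vector for class $k$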
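
A small numeric sketch of this walkthrough, under assumptions the notes do not state explicitly: scores are `s = W @ x` with no bias term, probabilities come from a softmax, and the loss is `-log p[k]`. The function names, the example values of `W`, `x`, `k`, and the learning rate are all hypothetical.

```python
import numpy as np

def softmax(s):
    """Softmax with the max subtracted first for numerical stability."""
    e = np.exp(s - np.max(s))
    return e / e.sum()

def loss_and_grad(W, x, k):
    """Cross-entropy loss of a linear classifier and its gradient w.r.t. W."""
    s = W @ x                 # class scores, one per row of W
    p = softmax(s)            # predicted class probabilities
    loss = -np.log(p[k])      # cross-entropy when the correct label is k
    ds = p.copy()
    ds[k] -= 1.0              # dL/ds = p - one_hot(k)
    dW = np.outer(ds, x)      # chain rule through s = W @ x
    return loss, dW

# Assumed example: 3 classes, 4 features, correct label k = 1.
W = np.zeros((3, 4))
x = np.array([1.0, 2.0, -1.0, 0.5])
k = 1

loss_before, dW = loss_and_grad(W, x, k)
lr = 0.1                      # assumed learning rate
W_new = W - lr * dW           # one gradient descent update
loss_after, _ = loss_and_grad(W_new, x, k)
```

Starting from all-zero weights the classifier predicts a uniform distribution, so `loss_before` equals `log(3)`. After one update the score of class `k` rises relative to the others and `loss_after` is smaller.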

backpropagation for computing gradients