gradient descent

to minimize the loss:

compute the gradient of the loss with respect to the weights, $\nabla_W L$
update the weights: $W \leftarrow W - \alpha \nabla_W L$, where $\alpha$ is the learning rate
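The two steps above can be sketched as a loop. This is a minimal illustration, assuming a single scalar weight and a hypothetical toy loss $L(w) = (w - 3)^2$ (gradient $2(w - 3)$) instead of a real model. The names `gradient_descent` and `toy_grad`, the learning rate, and the step count are all illustrative choices.

```python
def gradient_descent(grad, w0, alpha, steps):
    """Repeatedly apply w <- w - alpha * grad(w), starting from w0."""
    w = w0
    for _ in range(steps):
        # move against the gradient, scaled by the learning rate alpha
        w = w - alpha * grad(w)
    return w


# Hypothetical toy loss L(w) = (w - 3)^2, minimized at w = 3.
def toy_grad(w):
    return 2.0 * (w - 3.0)


w_final = gradient_descent(toy_grad, w0=0.0, alpha=0.1, steps=100)
# w_final should approach 3.0
```

With $\alpha = 0.1$ each step shrinks the distance to the minimum by a factor of $0.8$; a much larger $\alpha$ (above $1$ for this loss) would overshoot and diverge.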

walkthrough

given a training data point $(x, y)$ and the linear classifier $\hat{y} = Wx$, assume the correct label is $k$, so $y = k$
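As a concrete sketch of $\hat{y} = Wx$, assuming plain Python lists where `W` holds one row of weights per class and `x` is the feature vector, each class score is the dot product of that class's row with `x`. The function name `linear_scores` and the numbers are hypothetical.

```python
def linear_scores(W, x):
    """Return y_hat = W x: one score per class (row of W)."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]


# Hypothetical example: 3 classes, 2 features.
W = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, -1.0]]
x = [2.0, 1.0]
y_hat = linear_scores(W, x)  # [2.0, 1.0, 1.0]
```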

using the cross-entropy loss function:

$L = -\hat{y}_k + \log \sum_j e^{\hat{y}_j}$

we want to update $W$ in the direction that reduces the loss, so we compute the gradient with the chain rule: $\frac{\partial L}{\partial W} = \frac{\partial L}{\partial \hat{y}} \frac{\partial \hat{y}}{\partial W}$
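The loss can be computed directly from the scores. This sketch assumes the scores are a plain list and `k` is the index of the correct class; it uses the common max-subtraction trick, $\log \sum_j e^{\hat{y}_j} = m + \log \sum_j e^{\hat{y}_j - m}$ with $m = \max_j \hat{y}_j$, so `math.exp` does not overflow for large scores. The name `cross_entropy` is hypothetical.

```python
import math


def cross_entropy(y_hat, k):
    """Loss = -y_hat[k] + log(sum_j exp(y_hat[j])), computed stably."""
    # subtract the max score before exponentiating to avoid overflow
    m = max(y_hat)
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in y_hat))
    return -y_hat[k] + log_sum_exp


# Hypothetical scores for 3 classes, correct label k = 0.
loss = cross_entropy([2.0, 1.0, 1.0], k=0)
```

When all scores are equal the loss is $\log n$ for $n$ classes; it shrinks as the correct class's score pulls ahead of the others.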

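we know that $\frac{\partial \hat{y}}{\partial W} = x$ (basic derivative): each score is $\hat{y}_i = \sum_j W_{ij} x_j$, so $\frac{\partial \hat{y}_i}{\partial W_{ij}} = x_j$, and $W_{ij}$ does not affect any other score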
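One way to sanity-check this derivative is a finite-difference estimate: nudge a single weight $W_{ij}$ by a small $\epsilon$ and see how much $\hat{y}_i$ moves. This sketch assumes plain Python lists; `numeric_dyhat_dW` and the example values are hypothetical.

```python
def linear_scores(W, x):
    """Return y_hat = W x: one score per class (row of W)."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]


def numeric_dyhat_dW(W, x, i, j, eps=1e-6):
    """Forward-difference estimate of d y_hat[i] / d W[i][j]."""
    W_plus = [row[:] for row in W]  # copy so the original W is untouched
    W_plus[i][j] += eps
    return (linear_scores(W_plus, x)[i] - linear_scores(W, x)[i]) / eps


W = [[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]]
x = [2.0, 1.0]
# every estimate should be close to x[j], regardless of i or the values in W
estimates = [[numeric_dyhat_dW(W, x, i, j) for j in range(len(x))]
             for i in range(len(W))]
```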

to compute $\frac{\partial L}{\partial \hat{y}}$:
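A worked sketch of this step, differentiating the cross-entropy loss term by term. The notation $p = \mathrm{softmax}(\hat{y})$, i.e. $p_i = e^{\hat{y}_i} / \sum_j e^{\hat{y}_j}$, and the one-hot vector $e_k$ are introduced here for convenience.

$$
\frac{\partial L}{\partial \hat{y}_i} = -\mathbb{1}[i = k] + \frac{e^{\hat{y}_i}}{\sum_j e^{\hat{y}_j}} = p_i - \mathbb{1}[i = k]
$$

$$
\frac{\partial L}{\partial W_{ij}} = \frac{\partial L}{\partial \hat{y}_i}\, x_j = \left(p_i - \mathbb{1}[i = k]\right) x_j
\quad\Longrightarrow\quad
\frac{\partial L}{\partial W} = (p - e_k)\, x^\top
$$

Plugged into the update rule this gives $W \leftarrow W - \alpha\,(p - e_k)\,x^\top$: the correct class's row gains $\alpha (1 - p_k)\,x$, raising its score, while every other row $i$ loses $\alpha\, p_i\, x$, lowering its score in proportion to its predicted probability.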

backpropagation for computing gradients

jennypng
