• loss over entire dataset = avg loss over examples

examples of loss functions

  • L2 loss: squared error
    • not robust to outliers
  • L1 loss: absolute error
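
A minimal sketch of the two losses, each averaged over the dataset. The function names `l2_loss` and `l1_loss` and the toy arrays are assumptions made for illustration, not part of the notes. The last prediction is deliberately far off, to show how much more a single outlier inflates the squared error.

```python
import numpy as np

def l2_loss(y_pred, y_true):
    """Average squared error over all examples."""
    return np.mean((y_pred - y_true) ** 2)

def l1_loss(y_pred, y_true):
    """Average absolute error over all examples."""
    return np.mean(np.abs(y_pred - y_true))

# Toy data (made up): three exact predictions, one off by 10.
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.0, 2.0, 3.0, 14.0])

print(l2_loss(y_pred, y_true))  # 25.0 (the outlier contributes 10**2 = 100)
print(l1_loss(y_pred, y_true))  # 2.5  (the outlier contributes only 10)
```

Squaring makes the one bad example dominate the average, which is the sense in which L2 is not robust to outliers.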

to minimize the loss: gradient descent
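
A rough sketch of gradient descent on the average squared error, assuming a one-dimensional linear model y = w*x + b. The model, the function name `gradient_descent`, the learning rate, the step count, and the toy data are all assumptions chosen for illustration; the notes do not specify any of them.

```python
import numpy as np

def gradient_descent(x, y, lr=0.1, steps=500):
    """Fit y = w*x + b by repeatedly stepping against the loss gradient."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        err = w * x + b - y              # per-example residuals
        grad_w = 2.0 * np.mean(err * x)  # d/dw of mean(err**2)
        grad_b = 2.0 * np.mean(err)      # d/db of mean(err**2)
        w -= lr * grad_w                 # move downhill in w
        b -= lr * grad_b                 # move downhill in b
    return w, b

# Toy data (made up) generated from y = 2x + 1 with no noise.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0

w, b = gradient_descent(x, y)
print(w, b)  # close to 2 and 1
```

The learning rate has to be small enough for the updates to converge; the value 0.1 works for this toy data but is not a general recommendation.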