Softmax classifier (multinomial logistic regression)
Goal: maximize the probability of the correct class.
For training, minimize the negative log-likelihood L = −log P(y_true).
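As a rough illustration of the loss above, here is a minimal sketch in plain Python. It assumes the classifier produces one raw score per class and that P is obtained by applying softmax to those scores. The function names (`softmax`, `cross_entropy_loss`) and the example scores are hypothetical and chosen only for this sketch.

```python
import math

def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    # Subtracting the max score leaves the result unchanged
    # but avoids overflow in exp() for large scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy_loss(scores, true_class):
    """Return L = -log P(y_true) for a single example."""
    probs = softmax(scores)
    return -math.log(probs[true_class])

# Hypothetical scores for three classes (illustrative values only).
scores = [2.0, 1.0, 0.1]
probs = softmax(scores)
loss = cross_entropy_loss(scores, true_class=0)
print(probs, loss)
```

The loss shrinks toward 0 as the probability of the correct class approaches 1, so minimizing L and maximizing P(y_true) are the same objective.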