Better Generalization

Penalize large weights with weight regularization
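
A minimal sketch of weight regularization, assuming Keras as the framework; the layer sizes and the L2 coefficient 1e-4 are illustrative values, not from the source. The penalty is added to the training loss, so the optimizer trades data fit against weight magnitude.

    from tensorflow import keras
    from tensorflow.keras import layers, regularizers

    # The L2 penalty on this layer's weights is added to the training
    # loss, nudging the optimizer toward smaller weight values.
    model = keras.Sequential([
        keras.Input(shape=(20,)),
        layers.Dense(64, activation="relu",
                     kernel_regularizer=regularizers.l2(1e-4)),
        layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy")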

Encourage sparse representations with activity regularization; see the sketch after the list below

  1. large activations may indicate an overfit model

  2. there is a tension between the expressiveness and the generalization of the learned features

  3. encourage small activations with an additional penalty on the layer outputs

  4. track the mean activation value to confirm the representation is growing sparser
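
A sketch of activity regularization under the same assumptions (Keras; illustrative sizes and an L1 coefficient of 1e-5). Note the penalty targets the layer's outputs, not its weights; the probe model at the end is one hypothetical way to track the mean activation from item 4.

    import numpy as np
    from tensorflow import keras
    from tensorflow.keras import layers, regularizers

    inputs = keras.Input(shape=(20,))
    # L1 on the *activations* drives many of them to exactly zero,
    # yielding a sparse learned representation.
    hidden = layers.Dense(64, activation="relu",
                          activity_regularizer=regularizers.l1(1e-5))(inputs)
    outputs = layers.Dense(1, activation="sigmoid")(hidden)
    model = keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="binary_crossentropy")

    # Intermediate model exposing the penalized layer's outputs, so the
    # mean activation can be monitored as training progresses.
    probe = keras.Model(inputs, hidden)
    x = np.random.rand(32, 20).astype("float32")
    print("mean activation:", float(probe(x).numpy().mean()))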

Force small weights with weight constraints
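
A sketch of a weight constraint, again assuming Keras; the max-norm value of 3.0 is a common illustrative setting. Unlike a regularizer, a constraint is enforced directly rather than through the loss: weight vectors that exceed the limit are rescaled after each update.

    from tensorflow import keras
    from tensorflow.keras import layers
    from tensorflow.keras.constraints import MaxNorm

    # MaxNorm rescales each unit's incoming weight vector after every
    # gradient step so its L2 norm never exceeds 3.0.
    model = keras.Sequential([
        keras.Input(shape=(20,)),
        layers.Dense(64, activation="relu", kernel_constraint=MaxNorm(3.0)),
        layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy")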

Decouple layers with dropout
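
A dropout sketch under the same assumptions; the rate of 0.5 is illustrative. Because half of the hidden activations are zeroed at random on each training step, no downstream unit can co-adapt to any single upstream unit.

    from tensorflow import keras
    from tensorflow.keras import layers

    # Dropout zeroes 50% of the previous layer's activations during
    # training only; at inference it passes values through unchanged.
    model = keras.Sequential([
        keras.Input(shape=(20,)),
        layers.Dense(64, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy")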

Promote robustness with noise
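
One way to inject noise, assuming Keras' GaussianNoise layer with an illustrative stddev of 0.1; noise at the input acts as a form of data augmentation and discourages the network from memorizing exact training samples.

    from tensorflow import keras
    from tensorflow.keras import layers

    # GaussianNoise adds zero-mean noise (stddev 0.1) to its inputs during
    # training only; at inference it behaves as an identity layer.
    model = keras.Sequential([
        keras.Input(shape=(20,)),
        layers.GaussianNoise(0.1),
        layers.Dense(64, activation="relu"),
        layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy")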

Halt training at the right time with early stopping
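
An early-stopping sketch, same assumptions; the patience of 10 epochs and the synthetic data are illustrative placeholders. Training halts once validation loss stops improving, and the weights roll back to the best epoch seen.

    import numpy as np
    from tensorflow import keras
    from tensorflow.keras import layers

    model = keras.Sequential([
        keras.Input(shape=(20,)),
        layers.Dense(64, activation="relu"),
        layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy")

    # Stop when val_loss has not improved for 10 consecutive epochs and
    # restore the weights from the best epoch.
    stopper = keras.callbacks.EarlyStopping(
        monitor="val_loss", patience=10, restore_best_weights=True)

    x = np.random.rand(200, 20).astype("float32")   # placeholder data
    y = np.random.randint(0, 2, (200, 1)).astype("float32")
    model.fit(x, y, validation_split=0.2, epochs=200,
              callbacks=[stopper], verbose=0)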

