Better Generalization
Penalize large weights with weight regularization
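A minimal Keras sketch of weight regularization via an L2 penalty on a layer's weights; the 20-feature input, layer sizes, and the 1e-4 coefficient are illustrative assumptions, not values from these notes:

```python
from tensorflow.keras import layers, models, regularizers

# L2 penalty on the hidden layer's weights, added to the training loss.
# Input width, layer sizes, and the 1e-4 coefficient are assumed.
model = models.Sequential([
    layers.Dense(64, activation="relu", input_shape=(20,),
                 kernel_regularizer=regularizers.l2(1e-4)),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
```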
Encourage sparse representations with activity regularization (see the sketch after these notes)
large activations may indicate an over-fit model
there is a tension between the expressiveness and the generalization of the learned features
encourage small activations by adding a penalty on the layer's outputs to the loss
track the mean activation value to monitor the effect of the penalty
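A hedged Keras sketch of activity regularization, which penalizes the layer's outputs rather than its weights, plus a small probe model to track the mean activation; the sizes, the 1e-5 coefficient, and the random probe batch are assumptions for illustration:

```python
import numpy as np
from tensorflow.keras import layers, models, regularizers

# L1 penalty on the hidden layer's *activations* (not its weights),
# pushing outputs toward zero; the 1e-5 coefficient is assumed.
model = models.Sequential([
    layers.Dense(64, activation="relu", input_shape=(20,),
                 activity_regularizer=regularizers.l1(1e-5)),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

# Track the mean activation value with a probe model on a random batch.
probe = models.Model(model.inputs, model.layers[0].output)
print("mean activation:", probe.predict(np.random.rand(32, 20)).mean())
```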
Force small weights with weight constraints
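One possible Keras form of a weight constraint, using the built-in max_norm constraint, which rescales each unit's incoming weight vector whenever its L2 norm exceeds the limit; the limit of 3.0 is an assumed example value:

```python
from tensorflow.keras import layers, models
from tensorflow.keras.constraints import max_norm

# Rescale each unit's incoming weight vector whenever its L2 norm
# exceeds the limit; the limit of 3.0 is an assumed example value.
model = models.Sequential([
    layers.Dense(64, activation="relu", input_shape=(20,),
                 kernel_constraint=max_norm(3.0)),
    layers.Dense(1, activation="sigmoid"),
])
```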
Decouple layers with dropout
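A minimal dropout sketch in Keras; the 0.5 rate is a common default assumed here, not a value taken from these notes:

```python
from tensorflow.keras import layers, models

# Dropout randomly zeroes a fraction of the previous layer's outputs
# during training only, so no layer can rely on any single upstream
# unit; the 0.5 rate is a common default, assumed here.
model = models.Sequential([
    layers.Dense(64, activation="relu", input_shape=(20,)),
    layers.Dropout(0.5),
    layers.Dense(1, activation="sigmoid"),
])
```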
Promote robustness with noise
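One way to inject noise in Keras is a GaussianNoise layer on the inputs, which is active during training only and acts as a cheap form of data augmentation; the stddev of 0.1 is an assumed example value:

```python
from tensorflow.keras import layers, models

# GaussianNoise adds zero-mean noise to its inputs during training
# only; the stddev of 0.1 is an assumed example value.
model = models.Sequential([
    layers.GaussianNoise(0.1, input_shape=(20,)),
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])
```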
Halt training at the right time with early stopping
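A small end-to-end sketch of early stopping with Keras's EarlyStopping callback; the synthetic data, the patience of 10 epochs, and the 20% validation split are all assumptions for the demo:

```python
import numpy as np
from tensorflow.keras import layers, models
from tensorflow.keras.callbacks import EarlyStopping

# Synthetic toy data; shapes and labels are assumptions for the demo.
X = np.random.rand(200, 20).astype("float32")
y = (X.sum(axis=1) > 10).astype("float32")

model = models.Sequential([
    layers.Dense(64, activation="relu", input_shape=(20,)),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

# Stop once validation loss has not improved for 10 epochs and roll
# back to the best weights seen; patience and the split are assumed.
stopper = EarlyStopping(monitor="val_loss", patience=10,
                        restore_best_weights=True)
model.fit(X, y, validation_split=0.2, epochs=200,
          callbacks=[stopper], verbose=0)
```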