Regularisation — taming overfitting by penalising large coefficients
Your linear regression model fits the training data perfectly — R² = 0.99 — but performs terribly on new data. The model has learned the noise, not the signal. This is overfitting.
Add a penalty term to the loss function that punishes large coefficients. The model now balances fitting the data and keeping the coefficients small.
Ridge (L2 penalty): J = MSE + λ Σⱼ βⱼ²
Lasso (L1 penalty): J = MSE + λ Σⱼ |βⱼ|
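The two loss functions can be sketched directly in NumPy. This is a minimal illustration on synthetic data; the array names and the toy dataset are assumptions, not part of the original.

```python
import numpy as np

# Synthetic data for illustration: y depends on two of three features
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
beta = np.array([2.0, 0.0, -1.0])
y = X @ beta + 0.1 * rng.normal(size=50)

def ridge_loss(b, lam):
    """J = MSE + λ Σ βⱼ² (squares the coefficients)."""
    mse = np.mean((y - X @ b) ** 2)
    return mse + lam * np.sum(b ** 2)

def lasso_loss(b, lam):
    """J = MSE + λ Σ |βⱼ| (absolute values of the coefficients)."""
    mse = np.mean((y - X @ b) ** 2)
    return mse + lam * np.sum(np.abs(b))
```

With λ = 0 both reduce to plain MSE; any λ > 0 makes large coefficients strictly more expensive.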
λ is the regularisation strength (exposed as `alpha` in scikit-learn). λ=0 → plain regression. λ→∞ → all coefficients → 0. Choose via cross-validation.
Accepting slightly worse training performance in exchange for much better test performance is the whole point.
Lasso's L1 penalty can zero out irrelevant features entirely — useful when you have hundreds of features and want interpretability.
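The sparsity effect is easy to see on synthetic data where most features are irrelevant. A sketch (the 20-feature setup and `alpha=0.1` are assumptions chosen for illustration):

```python
import numpy as np
from sklearn.linear_model import Lasso

# 20 features, but only the first 3 actually drive y
rng = np.random.default_rng(2)
X = rng.normal(size=(100, 20))
y = 5 * X[:, 0] - 3 * X[:, 1] + 2 * X[:, 2] + 0.1 * rng.normal(size=100)

lasso = Lasso(alpha=0.1).fit(X, y)
# L1's soft-thresholding sets many irrelevant coefficients to exactly zero
n_zeroed = int(np.sum(lasso.coef_ == 0))
```

The surviving nonzero coefficients are the model's feature selection, for free.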
Find optimal λ with cross-validation. Plot a regularisation path — coefficient values vs log(λ) — to understand what's being penalised.
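scikit-learn's `lasso_path` computes the coefficients along a whole grid of λ values in one call, which is the raw material for that plot. A sketch on synthetic data (dataset and `n_alphas` are illustrative):

```python
import numpy as np
from sklearn.linear_model import lasso_path

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 5))
y = 4 * X[:, 0] + rng.normal(size=100)

# alphas come back in decreasing order; coefs has shape (n_features, n_alphas).
# At the largest λ every coefficient is zero; features "enter" as λ shrinks.
alphas, coefs, _ = lasso_path(X, y, n_alphas=50)
```

To draw the path, plot each row of `coefs` against `np.log(alphas)` with matplotlib.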