Boosting — sequential learners that each correct the errors of the previous
Random Forest builds its trees independently, in parallel. Can we do better by building trees sequentially, each one focused on fixing the mistakes of the previous ones?
Each new tree is trained to predict the residuals (errors) of the current ensemble, i.e. of all previous trees combined. The final prediction is the sum of all the trees' outputs, each scaled by the learning rate.
F_m(x) = F_{m-1}(x) + η · h_m(x)
h_m fits the negative gradient of the loss (= residuals for MSE loss)
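A minimal from-scratch sketch of this loop for squared-error loss, using shallow sklearn regression trees as the base learners h_m; the function names and default hyperparameter values are illustrative, not taken from any particular library:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_boost_fit(X, y, n_estimators=100, learning_rate=0.1, max_depth=3):
    """Fit F_m(x) = F_{m-1}(x) + eta * h_m(x) for squared-error loss."""
    f0 = float(np.mean(y))          # F_0: best constant prediction
    pred = np.full(len(y), f0)
    trees = []
    for _ in range(n_estimators):
        residuals = y - pred        # negative gradient of 1/2*(y - F)^2
        h = DecisionTreeRegressor(max_depth=max_depth).fit(X, residuals)
        pred = pred + learning_rate * h.predict(X)   # F_m = F_{m-1} + eta*h_m
        trees.append(h)
    return f0, trees

def gradient_boost_predict(f0, trees, X, learning_rate=0.1):
    pred = np.full(X.shape[0], f0)
    for h in trees:
        pred = pred + learning_rate * h.predict(X)
    return pred
```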
L = Σᵢ l(yᵢ, ŷᵢ) + Σₖ Ω(fₖ), where Ω(f) = γT + ½λ‖w‖² (XGBoost's regularised objective: T is the number of leaves, w the vector of leaf weights)
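In XGBoost these penalties map directly onto constructor arguments: gamma corresponds to γ (the per-leaf penalty) and reg_lambda to λ (the L2 penalty on leaf weights). The numeric values below are placeholders, not recommendations:

```python
import xgboost as xgb

# gamma      -> γ : per-leaf penalty (γT)
# reg_lambda -> λ : L2 penalty on leaf weights w
# reg_alpha  ->     optional L1 penalty on leaf weights
model = xgb.XGBRegressor(
    n_estimators=500,
    learning_rate=0.05,
    max_depth=4,
    gamma=1.0,        # illustrative value
    reg_lambda=1.0,   # illustrative value
    reg_alpha=0.0,
)
```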
n_estimators — number of trees | learning_rate (η) — step size | max_depth — tree complexity
New trees focus on the hardest examples — the ones previous trees got wrong. Errors shrink with each round.
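A quick way to watch the errors shrink round by round, using sklearn's GradientBoostingRegressor (with the three hyperparameters listed above) and its staged_predict; the synthetic dataset and settings are only illustrative:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error

X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)

gbm = GradientBoostingRegressor(n_estimators=200, learning_rate=0.1, max_depth=3)
gbm.fit(X, y)

# staged_predict yields F_1, F_2, ..., so we can watch the training error fall
for m, pred in enumerate(gbm.staged_predict(X), start=1):
    if m % 50 == 0:
        print(f"round {m:3d}  train MSE = {mean_squared_error(y, pred):.1f}")
```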
Lower η = more trees needed but better generalisation. Common practice: 0.01–0.1 with early stopping.
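A sketch of that recipe using sklearn's built-in early stopping: n_iter_no_change holds out validation_fraction of the training data and stops once the validation score stops improving. The data and values are illustrative:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)

gbm = GradientBoostingRegressor(
    n_estimators=2000,        # generous upper bound; early stopping picks the real number
    learning_rate=0.05,       # low eta, in the 0.01-0.1 range
    max_depth=3,
    validation_fraction=0.1,  # held-out fraction used to monitor the score
    n_iter_no_change=20,      # stop after 20 rounds without improvement
    random_state=0,
)
gbm.fit(X, y)
print("trees actually fitted:", gbm.n_estimators_)
```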
Modern boosting implementations such as XGBoost add L1/L2 regularisation, histogram-based split finding, and GPU support, and in practice are usually preferred over a vanilla GBM.
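For example, XGBoost's histogram-based tree method is switched on via a constructor argument; the exact GPU flag depends on the installed version, so treat the comment below as a hedge rather than a fixed API:

```python
import xgboost as xgb
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)

model = xgb.XGBRegressor(
    n_estimators=1000,
    learning_rate=0.05,
    tree_method="hist",   # histogram-based split finding
    # GPU training is version-dependent: recent XGBoost releases use device="cuda",
    # older ones used tree_method="gpu_hist" instead.
)
model.fit(X, y)
```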