Topic 08 · Phase 2 · Animated

Learn from Your Mistakes

Boosting — sequential learners that each correct the errors of the ones before them

1

The Problem

Random Forest builds its trees independently and in parallel. Can we do better by building trees sequentially, each one focused on fixing the mistakes of the ones before it?

Random Forest: parallel, independent trees
Boosting: sequential, error-correcting trees
"XGBoost wins more Kaggle competitions than any other algorithm — boosting done right is nearly unbeatable on structured data."
2

The Intuition

Each new tree is trained to predict the residuals (errors) of all previous trees combined. The final prediction sums up all the trees, each weighted by a learning rate.

1. Tree 1 fits the data → residuals are large
2. Tree 2 fits the residuals of Tree 1 → residuals shrink
N. Tree N fits tiny residuals → near-perfect fit
Final: ŷ = F₀ + η·f₁(x) + η·f₂(x) + … + η·fₙ(x), where η is the learning rate and F₀ is the initial prediction (e.g. the mean of y)
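The whole loop fits in a few lines. Below is a minimal sketch, assuming scikit-learn's DecisionTreeRegressor as the weak learner; the function names and the toy data are illustrative only, not taken from any library.

import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_boosted_trees(X, y, n_trees=100, eta=0.1, max_depth=2):
    """Return the initial prediction F0 and the list of fitted trees."""
    F0 = y.mean()                          # start from a constant prediction
    pred = np.full(len(y), F0, dtype=float)
    trees = []
    for _ in range(n_trees):
        residuals = y - pred               # errors of the current ensemble
        tree = DecisionTreeRegressor(max_depth=max_depth)
        tree.fit(X, residuals)             # each new tree fits the residuals
        pred += eta * tree.predict(X)      # shrink the update by the learning rate
        trees.append(tree)
    return F0, trees

def predict_boosted(X, F0, trees, eta=0.1):
    """ŷ = F0 + η·f1(x) + … + η·fN(x)"""
    return F0 + eta * sum(tree.predict(X) for tree in trees)

# Toy usage: fit y = x² with noise
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = X[:, 0] ** 2 + rng.normal(0, 0.3, size=200)
F0, trees = fit_boosted_trees(X, y)
print(predict_boosted(X[:5], F0, trees))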
3

The Math

Gradient Boosting Update
F_m(x) = F_{m-1}(x) + η · h_m(x)

h_m fits the negative gradient of the loss (= residuals for MSE loss)
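For example, with squared-error loss l(y, F) = ½·(y − F)², the negative gradient −∂l/∂F = y − F is exactly the residual, so "fit the gradient" and "fit the previous errors" are the same step.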

XGBoost Adds Regularisation
L = Σ l(yᵢ, ŷᵢ) + Σ Ω(fₖ)  where Ω(f) = γT + ½λ‖w‖² (T = number of leaves, w = leaf weights)
Key Hyperparameters

n_estimators — number of trees  |  learning_rate (η) — step size  |  max_depth — tree complexity
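As a point of reference, here is how those three knobs (plus the regularisation terms from the objective above) appear in xgboost's scikit-learn-style estimator. This is a sketch under the assumption that the xgboost package is installed; the values are illustrative, not recommendations.

from sklearn.datasets import make_regression
from xgboost import XGBRegressor

X, y = make_regression(n_samples=1000, n_features=10, noise=10.0, random_state=0)

model = XGBRegressor(
    n_estimators=500,    # number of boosting rounds (trees)
    learning_rate=0.05,  # eta: shrinkage applied to each tree's contribution
    max_depth=4,         # depth of each individual tree
    reg_lambda=1.0,      # lambda: L2 penalty on leaf weights (the Ω term above)
    gamma=0.0,           # gamma: minimum loss reduction required to make a split
)
model.fit(X, y)

A lower learning_rate usually needs a larger n_estimators to reach the same training loss, which is why the two are normally tuned together.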

4

Assumptions & Pitfalls

Overfitting with too many trees. Unlike Random Forest, boosting can overfit if n_estimators is too large. Use early stopping (see the sketch after this list).
Sensitive to outliers. Boosting focuses on hard examples, so outliers get disproportionate weight. Use a robust loss such as Huber (also shown below).
Slow training. Sequential by nature. XGBoost and LightGBM use optimisations (histogram-based, GPU) to speed this up.
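Both mitigations above (early stopping and a robust loss) are available out of the box in scikit-learn's GradientBoostingRegressor. A minimal sketch on synthetic placeholder data:

from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_regression(n_samples=2000, n_features=10, noise=15.0, random_state=0)

model = GradientBoostingRegressor(
    loss="huber",             # robust loss: outliers no longer dominate the residuals
    n_estimators=2000,        # deliberately large; early stopping decides when to stop
    learning_rate=0.05,
    max_depth=3,
    validation_fraction=0.1,  # hold out 10% of the training data for the stopping check
    n_iter_no_change=20,      # stop after 20 rounds without validation improvement
    random_state=0,
)
model.fit(X, y)
print("trees actually used:", model.n_estimators_)

With n_iter_no_change set, training stops once the held-out validation score stops improving, so a deliberately large n_estimators is safe.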
5

When to Use

Strengths

  • State-of-the-art on structured/tabular data
  • Handles mixed feature types
  • Built-in regularisation (XGBoost)
  • Flexible loss functions

Limitations

  • Can overfit (needs early stopping)
  • More hyperparameters than Random Forest
  • Not interpretable
  • Slow on very large datasets (vs LightGBM)
Typical use cases: Kaggle competitions · click-through rate prediction · credit risk · ranking problems

Key Takeaways

Each Tree Corrects Errors

New trees focus on the hardest examples — the ones previous trees got wrong. Errors shrink with each round.

Learning Rate Matters

Lower η = more trees needed but better generalisation. Common practice: 0.01–0.1 with early stopping.

XGBoost / LightGBM

Modern boosting adds L1/L2 regularisation, histogram tricks, and GPU support. In practice, prefer XGBoost or LightGBM over a vanilla GBM implementation.