Random Forest — bagging and random feature subsets make weak learners powerful
A single decision tree overfits easily and changes dramatically with small data variations. Can we combine many imperfect trees to get a stable, accurate model?
Two sources of randomness decorrelate the trees so that their errors partially cancel when averaged: (1) bagging, where each tree is fit on a bootstrap sample of the rows, and (2) a random subset of features considered at each split. The ensemble prediction is the average over all trees:
ŷ = (1/B) Σᵦ fᵦ(x), where fᵦ is the prediction of tree b and B is the number of trees (classification uses a majority vote instead of the mean)
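A minimal sketch of this bagged averaging using scikit-learn decision trees; the synthetic dataset, the number of trees B, and the max_features="sqrt" choice are illustrative assumptions, not part of the original.

```python
# Sketch of bagged trees implementing ŷ = (1/B) Σᵦ fᵦ(x).
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)
B = 50                                   # number of trees (illustrative choice)
rng = np.random.default_rng(0)
trees = []

for b in range(B):
    # Bootstrap: sample rows with replacement (first source of randomness).
    idx = rng.integers(0, len(X), size=len(X))
    # max_features limits the candidate features per split (second source).
    tree = DecisionTreeRegressor(max_features="sqrt", random_state=b)
    tree.fit(X[idx], y[idx])
    trees.append(tree)

# Ensemble prediction: average the B individual tree predictions.
y_hat = np.mean([t.predict(X) for t in trees], axis=0)
```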
On average, about 37% of the rows (≈ e⁻¹) are left out of each bootstrap sample and act as a free validation set for that tree. The resulting out-of-bag (OOB) error is an unbiased estimate of generalisation error, so no separate validation split is needed during training.
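A sketch of reading the OOB score in practice, assuming scikit-learn's RandomForestClassifier and a synthetic dataset:

```python
# Out-of-bag error without a separate validation split.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
forest = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=0)
forest.fit(X, y)

# Each tree is scored only on the rows it never saw during bootstrapping.
print("OOB accuracy:", forest.oob_score_)
```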
FI(j) = (1/B) Σᵦ Σₙ∈Sᵦ(j) wₙ · ΔI(n), where Sᵦ(j) is the set of nodes in tree b that split on feature j, wₙ is the fraction of samples reaching node n, and ΔI(n) is the impurity decrease at that split (importances are usually normalised to sum to 1)
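A sketch of inspecting these impurity-based importances via scikit-learn, which already averages over trees and normalises to sum to 1; the dataset here is a synthetic assumption:

```python
# Rank features by impurity-based importance.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=8,
                           n_informative=3, random_state=0)
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# feature_importances_ sums to 1 across features.
for j in np.argsort(forest.feature_importances_)[::-1]:
    print(f"feature {j}: {forest.feature_importances_[j]:.3f}")
```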
Bootstrap sampling and random feature subsets push the trees toward making different errors, which partially cancel when averaged.
No need for a dedicated test split during training — out-of-bag samples provide an unbiased estimate of generalisation error.
Random Forest with default hyperparameters often outperforms carefully tuned linear models, which makes it a strong first non-linear model to try (see the sketch below).
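One way that baseline comparison might look, assuming scikit-learn and a synthetic dataset; which model actually wins depends entirely on the data, so this is just the recipe, not a result.

```python
# Compare a default random forest against a linear baseline with cross-validation.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=2000, n_features=20,
                           n_informative=10, random_state=0)

for name, model in [("logistic regression", LogisticRegression(max_iter=1000)),
                    ("random forest (defaults)", RandomForestClassifier(random_state=0))]:
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean accuracy {scores.mean():.3f} (std {scores.std():.3f})")
```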