Topic 07 · Phase 2 · Animated

Wisdom of Many Trees

Random Forest — bagging and random feature subsets make weak learners powerful

1 · The Problem

A single decision tree overfits easily and changes dramatically with small data variations. Can we combine many imperfect trees to get a stable, accurate model?

One Tree: high variance · overfits easily · unstable
100 Trees → Vote: low variance · robust · accurate
"A forest of weak learners, each seeing a different random slice of the data, collectively becomes a strong learner."
2 · The Intuition

Two sources of randomness reduce the correlation between trees, making their errors more nearly independent and therefore likelier to cancel when aggregated (a from-scratch sketch follows this list):

1. Bootstrap Sampling (Bagging)
Each tree is trained on a different bootstrap sample — sample n rows with replacement from the original data. ~63% unique rows per tree.
2. Random Feature Subsets
At each split, only consider √p (classification) or p/3 (regression) randomly chosen features. Prevents all trees from relying on the same dominant feature.
3. Majority Vote / Average
Classification: take the mode of all tree predictions. Regression: take the mean. Individual errors cancel out.
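A minimal from-scratch sketch of these three steps, assuming scikit-learn is available and that X and y are a preloaded feature matrix and integer label vector (fit_forest / predict_forest are illustrative names, not a standard API):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def fit_forest(X, y, n_trees=100, seed=0):
    """Illustrative random forest: bootstrapped rows + random feature subsets per split."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    trees = []
    for _ in range(n_trees):
        idx = rng.integers(0, n, size=n)            # 1. bootstrap: n rows drawn with replacement
        tree = DecisionTreeClassifier(
            max_features="sqrt",                    # 2. only √p candidate features at each split
            random_state=int(rng.integers(1_000_000)),
        )
        trees.append(tree.fit(X[idx], y[idx]))
    return trees

def predict_forest(trees, X):
    votes = np.stack([t.predict(X) for t in trees]).astype(int)   # shape (B, n_samples)
    # 3. majority vote per sample (mode of the B predictions)
    return np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, votes)
```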
3 · The Math

Bagging (Bootstrap Aggregating)
ŷ(x) = (1/B) Σᵦ fᵦ(x), where fᵦ is tree b and B is the number of trees (regression; classification replaces the average with a majority vote)
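As a made-up worked instance of the formula with B = 3 trees:

```python
import numpy as np

tree_preds = np.array([4.2, 3.8, 4.6])   # hypothetical f₁(x), f₂(x), f₃(x)
y_hat = tree_preds.mean()                # (4.2 + 3.8 + 4.6) / 3 = 4.2
```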
OOB Error (Out-of-Bag)

The ~37% of rows left out of each bootstrap sample act as a free validation set. OOB error is a nearly unbiased estimate of generalisation error, so no separate test split is needed during training.
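A sketch of reading this estimate straight from scikit-learn, assuming X and y are already loaded:

```python
from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier(n_estimators=500, oob_score=True, random_state=0)
rf.fit(X, y)
print(f"OOB accuracy estimate: {rf.oob_score_:.3f}")   # generalisation estimate, no held-out split used
```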

Feature Importance
FI(j) = (1/B) Σᵦ Σ_{splits on j in tree b} (weighted impurity decrease), typically normalised so that all importances sum to 1
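Continuing the OOB sketch above, scikit-learn exposes these impurity-based importances already normalised to sum to 1; feature_names is a hypothetical list of column names:

```python
import pandas as pd

importances = pd.Series(rf.feature_importances_, index=feature_names)
print(importances.sort_values(ascending=False).head(10))   # top features by total impurity decrease
```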
4 · Assumptions & Pitfalls

Not interpretable. 500 trees can't be written as human-readable rules. Feature importance is a proxy, not a full explanation.
Slow prediction. Every prediction runs through all B trees. For real-time, latency-sensitive systems, consider a smaller B or a gradient-boosted alternative such as XGBoost.
Still needs informative features. If no feature carries signal, even 1,000 trees won't help. Feature engineering still matters.
5 · When to Use

Strengths

  • High accuracy with minimal tuning
  • Free OOB validation
  • Built-in feature importance
  • Robust to outliers; missing-value handling depends on the implementation

Limitations

  • Not interpretable
  • Slow prediction at inference
  • Memory-intensive for large forests
  • Often outperformed by gradient boosting on structured/tabular data
Typical use cases: telecom churn · fraud detection · genomics feature selection · general tabular classification

Key Takeaways

Randomness Reduces Correlation

Bootstrap sampling and random feature subsets ensure trees make different errors — errors that cancel when averaged.

OOB = Free Validation

No need for a dedicated test split during training — out-of-bag samples provide an unbiased estimate of generalisation error.

Strong Baseline

Random Forest with default hyperparameters often outperforms carefully tuned linear models. It's the first non-linear model to try.
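A minimal baseline sketch of this habit, assuming a preloaded feature matrix X and labels y: fit a default-parameter forest next to a plain logistic regression and compare cross-validated accuracy.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rf_acc = cross_val_score(RandomForestClassifier(random_state=0), X, y, cv=5).mean()
lr_acc = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5).mean()
print(f"Random Forest (defaults): {rf_acc:.3f} | Logistic Regression: {lr_acc:.3f}")
```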