Decision Trees — recursive splitting that mirrors human decision-making
A hospital wants to predict heart disease risk from patient data (age, chest pain, cholesterol). Doctors need an explainable model — not a black box. Decision trees produce human-readable rules.
if age > 55 AND chest_pain = typical AND chol > 240 → HIGH RISK
At each node, the tree asks: "Which feature and threshold, if I split on it right now, creates the purest possible child groups?" It greedily picks the best split, then recurses on each child.
Gini = 1 − Σ pᵢ²
0 = perfectly pure, 0.5 = maximally impure (2 classes)
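The Gini formula above can be sketched in a few lines of plain Python (function and label names are illustrative, not from any particular library):

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

# A perfectly pure node scores 0; an even 50/50 two-class node scores 0.5.
print(gini(["sick", "sick", "sick", "sick"]))  # 0.0
print(gini(["sick", "sick", "well", "well"]))  # 0.5
```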
IG = H(parent) − Σ (|child|/|parent|) · H(child), where H is the Shannon entropy
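A minimal sketch of the information-gain formula, using Shannon entropy as H (the toy labels are made up for illustration):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy: H = -sum of p_i * log2(p_i) over classes."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, children):
    """IG = H(parent) - weighted sum of child entropies."""
    n = len(parent)
    return entropy(parent) - sum(len(ch) / n * entropy(ch) for ch in children)

# A split that separates the classes perfectly recovers the full parent
# entropy: 1 bit for a balanced two-class node.
parent = ["high", "high", "low", "low"]
print(information_gain(parent, [["high", "high"], ["low", "low"]]))  # 1.0
```

A useless split (each child still 50/50) yields an information gain of 0.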
At each node, try every feature and every candidate threshold, then pick the split that maximises information gain or minimises Gini impurity (both measure how mixed the classes are in the resulting children). The search is greedy: once a split is chosen it is never revisited (no backtracking).
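The greedy split search can be sketched for a single numeric feature, scoring each candidate threshold by the weighted Gini of its two children (the cholesterol values and labels below are hypothetical):

```python
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(values, labels):
    """Try every observed value as a threshold; return the one with the
    lowest weighted Gini of the two children (the greedy criterion)."""
    n = len(labels)
    best = (None, float("inf"))
    for t in sorted(set(values)):
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        if not left or not right:
            continue  # skip splits that leave one side empty
        score = len(left) / n * gini(left) + len(right) / n * gini(right)
        if score < best[1]:
            best = (t, score)
    return best

chol = [180, 200, 250, 300]
risk = ["low", "low", "high", "high"]
print(best_split(chol, risk))  # (200, 0.0): a perfectly pure split
```

A real tree runs this search over every feature at every node, then recurses on each child until a stopping condition is hit.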
max_depth and min_samples_leaf are the most important regularisation hyperparameters: they cap how far the greedy recursion can go and therefore control overfitting. Start shallow and add depth only if validation performance improves.
A single tree is interpretable but high-variance: small changes in the training data can produce a very different tree. Combining on the order of 100 trees via Random Forests or Boosting trades some interpretability for substantially higher accuracy.
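The ensemble idea can be sketched with the simplest possible "trees": one-split stumps fit on bootstrap resamples, combined by majority vote. This is a toy illustration of bagging, not a full Random Forest, and the data and threshold rule are entirely hypothetical:

```python
import random
from collections import Counter

random.seed(42)

# Hypothetical training set: (age, risk label).
data = [(35, "low"), (42, "low"), (58, "high"), (66, "high")]

def fit_stump(sample):
    """A one-split 'tree': threshold at the mean age of the resample,
    predicting 'high' above it and 'low' at or below it."""
    return sum(x for x, _ in sample) / len(sample)

def ensemble_predict(x, stumps):
    """Majority vote across all stumps."""
    votes = ["high" if x > t else "low" for t in stumps]
    return Counter(votes).most_common(1)[0][0]

# Bagging: each stump sees an independent bootstrap resample.
stumps = [fit_stump(random.choices(data, k=len(data))) for _ in range(100)]
print(ensemble_predict(62, stumps))
```

Individual stumps disagree because their resamples differ, but the vote averages that variance away, which is the core reason ensembles outperform a single tree.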