Support Vector Machines — find the hyperplane with the widest possible margin
An email spam classifier needs to separate spam from ham in a high-dimensional feature space (word frequencies). We want a boundary that generalises well — not just one that fits training data.
Imagine the two classes as two groups of points. Draw parallel lanes (margins) around the dividing hyperplane. Widen the lanes as much as possible while keeping them empty. The points on the lane edges are the support vectors.
Maximise 2/‖w‖ ⟺ Minimise ½‖w‖²
Subject to: yᵢ(w·xᵢ + b) ≥ 1 for all i
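A minimal sketch of this formulation in practice (assumes scikit-learn and NumPy; the toy data is illustrative). A very large C approximates the hard margin, so the fitted w and b should satisfy the constraint everywhere, with equality at the support vectors.

```python
# Fit a near-hard-margin linear SVM on toy separable data, then check
# that every point satisfies y_i (w·x_i + b) >= 1 and report 2/||w||.
import numpy as np
from sklearn.svm import SVC

X = np.array([[1.0, 1.0], [2.0, 1.5], [1.5, 2.0],   # class -1
              [4.0, 4.0], [5.0, 4.5], [4.5, 5.0]])  # class +1
y = np.array([-1, -1, -1, 1, 1, 1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)  # huge C ≈ hard margin
w, b = clf.coef_[0], clf.intercept_[0]

print("y_i (w·x_i + b):", np.round(y * (X @ w + b), 3))  # all >= 1
print("margin width 2/‖w‖:", 2 / np.linalg.norm(w))
print("support vectors:\n", clf.support_vectors_)  # constraint tight here
```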
Minimise ½‖w‖² + C Σ ξᵢ, subject to yᵢ(w·xᵢ + b) ≥ 1 − ξᵢ and ξᵢ ≥ 0 (soft margin).
C controls margin width vs. misclassification penalty. High C = narrow margin, fewer training errors; low C = wide margin, more slack tolerated.
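The soft-margin objective can be minimised directly, since the optimal ξᵢ collapses to the hinge loss max(0, 1 − yᵢ(w·xᵢ + b)). A hedged sub-gradient sketch (NumPy only; learning rate, epoch count, and data are illustrative assumptions, not tuned values):

```python
# Minimise 0.5·||w||^2 + C·Σ hinge_i by sub-gradient descent, where
# hinge_i = max(0, 1 - y_i (w·x_i + b)) plays the role of ξ_i.
import numpy as np

def svm_subgradient(X, y, C=1.0, lr=0.001, epochs=5000):
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        violated = y * (X @ w + b) < 1          # points with ξ_i > 0
        grad_w = w - C * (y[violated][:, None] * X[violated]).sum(axis=0)
        grad_b = -C * y[violated].sum()
        w, b = w - lr * grad_w, b - lr * grad_b
    return w, b

X = np.array([[1.0, 1.0], [2.0, 1.5], [4.0, 4.0], [5.0, 4.5]])
y = np.array([-1, -1, 1, 1])
for C in (0.1, 10.0):                           # small C → wider margin
    w, b = svm_subgradient(X, y, C=C)
    print(f"C={C}: width 2/‖w‖ ≈ {2 / np.linalg.norm(w):.2f}")
```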
Linear: K(x, z) = xᵀz
RBF: K(x, z) = exp(−γ‖x − z‖²)
Polynomial: K(x, z) = (xᵀz + c)ᵈ
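The same three kernels as plain NumPy functions over data matrices (the γ, c, d defaults below are illustrative, not prescriptions):

```python
import numpy as np

def linear_kernel(X, Z):
    return X @ Z.T                                   # K(x, z) = x^T z

def rbf_kernel(X, Z, gamma=0.5):
    # ‖x − z‖² = ‖x‖² + ‖z‖² − 2 x^T z, computed for all pairs at once
    sq = (X**2).sum(1)[:, None] + (Z**2).sum(1)[None, :] - 2 * X @ Z.T
    return np.exp(-gamma * sq)

def poly_kernel(X, Z, c=1.0, d=3):
    return (X @ Z.T + c) ** d                        # (x^T z + c)^d

X = np.random.default_rng(0).normal(size=(4, 2))
print(rbf_kernel(X, X).round(2))   # 4×4 Gram matrix, 1s on the diagonal
```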
SVM's entire objective is to find the boundary with the widest gap between classes; a wider margin leaves more room for unseen points to land on the correct side, which is why margin width tracks generalisation.
The kernel trick computes dot products in a high-dimensional feature space without ever materialising that space, so the extra expressive power comes computationally cheap.
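A small check of that claim (toy 2-D vectors; φ is the standard degree-2 expansion, written out only for illustration): the kernel (xᵀz)² equals φ(x)·φ(z) in a 3-D feature space that the kernel side never builds.

```python
import numpy as np

def phi(v):                        # explicit degree-2 feature map
    x1, x2 = v
    return np.array([x1**2, np.sqrt(2) * x1 * x2, x2**2])

x = np.array([1.0, 2.0])
z = np.array([3.0, 0.5])

implicit = (x @ z) ** 2            # kernel: one dot product, then square
explicit = phi(x) @ phi(z)         # materialise feature space, then dot
print(implicit, explicit)          # both 16.0
```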
Once trained, predictions depend only on the support vectors, the points on or inside the margin. Every other training example can be discarded.
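A sketch of this (assumes scikit-learn; data and hyperparameters are illustrative): rebuild the decision function from the support vectors alone and confirm it matches the library's own output. In scikit-learn, dual_coef_ stores αᵢyᵢ for each support vector.

```python
# f(x) = Σ α_i y_i K(sv_i, x) + b, summing over support vectors only.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, (20, 2)), rng.normal(2, 1, (20, 2))])
y = np.array([-1] * 20 + [1] * 20)

clf = SVC(kernel="rbf", gamma=0.5, C=1.0).fit(X, y)
sv, coef, b = clf.support_vectors_, clf.dual_coef_[0], clf.intercept_[0]

x_new = np.array([0.5, 0.5])
K = np.exp(-0.5 * ((sv - x_new) ** 2).sum(axis=1))   # RBF with γ = 0.5
print(coef @ K + b)                                   # manual
print(clf.decision_function([x_new])[0])              # library: identical
```

This is also why trained SVMs store compactly: only the support vectors, their coefficients, and b are needed at prediction time.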