PCA — find the directions of maximum variance and project your data there
MNIST handwritten digits have 784 features (28×28 pixels). Most pixels are correlated — adjacent pixels tend to have similar values. Can we represent each image with 50 numbers instead of 784, while keeping most of the useful information?
Imagine a long, thin cloud of points tilted at 45°. The longest direction through the cloud captures most of the spread — that's PC1. The perpendicular direction captures the remaining spread — that's PC2. Project all points onto PC1 and you've compressed from 2D to 1D with minimal information loss.
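The tilted-cloud picture can be checked numerically. The sketch below is only an illustration: the sample size, the spreads (3.0 and 0.3) and the random seed are assumptions chosen for the demo, not values from the text. It builds a thin cloud, rotates it by 45°, and recovers PC1 from the covariance matrix.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

# Build a long, thin cloud along the x-axis, then rotate it by 45 degrees.
n = 1000
cloud = np.column_stack([rng.normal(0.0, 3.0, n),   # long axis (assumed spread)
                         rng.normal(0.0, 0.3, n)])  # thin axis (assumed spread)
theta = np.deg2rad(45)
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])
X = cloud @ rotation.T
X = X - X.mean(axis=0)                   # PCA assumes zero-mean data

# Eigendecomposition of the 2x2 covariance matrix.
cov = X.T @ X / n
eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues come back in ascending order
pc1 = eigvecs[:, -1]                     # direction of largest variance

# The sign of an eigenvector is arbitrary, so fold the angle into [0, 180).
angle_deg = np.degrees(np.arctan2(pc1[1], pc1[0])) % 180

# Compress 2D -> 1D by projecting every point onto PC1.
z = X @ pc1
share_kept = eigvals[-1] / eigvals.sum() # fraction of total variance kept
```

With a cloud this thin, `angle_deg` should land close to 45 and `share_kept` close to 1, which is what "minimal information loss" means here.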
Σ = (1/n) Xᵀ X    (X is zero-mean)
Σ v = λ v    →    v₁ = PC1 direction, λ₁ = variance explained
VER_k = λₖ / Σᵢ λᵢ
Choose k such that cumulative VER ≥ 90–95%.
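The formulas above translate fairly directly into NumPy. This is a minimal sketch rather than a reference implementation: the function name `pca_eig` and the toy data are hypothetical, and `np.linalg.eigh` is used because the covariance matrix is symmetric.

```python
import numpy as np

def pca_eig(X, k):
    """Return (components, eigenvalues, VER) for the top-k principal components."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                 # make the data zero-mean
    cov = Xc.T @ Xc / Xc.shape[0]           # Σ = (1/n) Xᵀ X
    eigvals, eigvecs = np.linalg.eigh(cov)  # solves Σ v = λ v, ascending λ
    order = np.argsort(eigvals)[::-1]       # re-sort: largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    ver = eigvals / eigvals.sum()           # VER_k = λ_k / Σ_i λ_i
    return eigvecs[:, :k], eigvals[:k], ver[:k]

rng = np.random.default_rng(0)
# Hypothetical toy data: 3 features with very different spreads.
X = rng.normal(size=(500, 3)) * np.array([5.0, 1.0, 0.2])
components, eigvals, ver = pca_eig(X, k=2)
Z = (X - X.mean(axis=0)) @ components       # projected data, shape (500, 2)
```

Each column of `components` is one principal direction, and the variance of each column of `Z` equals the corresponding eigenvalue.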
PCA rotates the coordinate system so PC1 points along the direction of maximum spread in the data.
Plot cumulative variance explained vs k. Retain enough components to explain 90–95% of variance.
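Picking k from the cumulative curve can also be done without a plot. The helper below is a sketch under stated assumptions: `choose_k` is a hypothetical name, and the eigenvalue spectrum is made up for illustration.

```python
import numpy as np

def choose_k(eigvals, threshold=0.95):
    """Smallest k whose cumulative variance-explained ratio reaches `threshold`."""
    eigvals = np.sort(np.asarray(eigvals, dtype=float))[::-1]  # largest first
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    # Index of the first entry >= threshold, converted to a 1-based count.
    k = int(np.searchsorted(cumulative, threshold) + 1)
    return min(k, len(eigvals)), cumulative  # guard against float round-off at 1.0

# Hypothetical eigenvalue spectrum, for illustration only.
k, cumulative = choose_k([6.0, 2.5, 1.0, 0.3, 0.2], threshold=0.90)
```

For this made-up spectrum the cumulative ratios are roughly 0.60, 0.85, 0.95, 0.98, 1.00, so three components are enough to reach 90%.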
Project any dataset to 2D with PCA for a quick visual inspection of cluster structure before modelling.
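A quick 2D projection might look like the sketch below. It uses the SVD of the centred data instead of an explicit covariance matrix, which yields the same principal directions. The function name `project_2d` and the two-blob dataset are hypothetical, and plotting is left out so the example has no dependency beyond NumPy.

```python
import numpy as np

def project_2d(X):
    """Project the rows of X onto the first two principal components (via SVD)."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:2].T

rng = np.random.default_rng(0)
# Hypothetical data: two well-separated blobs in 10 dimensions.
a = rng.normal(loc=0.0, size=(100, 10))
b = rng.normal(loc=4.0, size=(100, 10))
Z = project_2d(np.vstack([a, b]))   # shape (200, 2), ready for a scatter plot
```

Plotting `Z[:, 0]` against `Z[:, 1]` with any plotting library should show the two blobs as separate groups along the first axis.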