Topic 12 · Phase 3 · Interactive

Compress Without Losing What Matters

PCA — find the directions of maximum variance and project your data there

1 · The Problem

MNIST handwritten digits have 784 features (28×28 pixels). Most pixels are correlated — adjacent pixels tend to have similar values. Can we represent each image with 50 numbers instead of 784, while keeping most of the useful information?

Before: 784 features, high redundancy — most are correlated
After: 50 components, 94% variance retained · 15× compression
"PCA doesn't select features — it invents new ones that are linear combinations of the originals, maximally spread out."
2 · The Intuition

Imagine a long, thin cloud of points tilted at 45°. The longest direction through the cloud captures most of the spread — that's PC1. The perpendicular direction captures the remaining spread — that's PC2. Project all points onto PC1 and you've compressed from 2D to 1D with minimal information loss.
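A quick numerical check of that picture, on a made-up cloud built with NumPy (the 45° direction should fall out as PC1, up to sign):

```python
import numpy as np

# A long, thin cloud tilted at 45 degrees.
rng = np.random.default_rng(42)
along = rng.normal(scale=3.0, size=1000)    # big spread along the long axis
across = rng.normal(scale=0.3, size=1000)   # small spread across it
X = np.column_stack([along + across, along - across])
X -= X.mean(axis=0)

eigvals, eigvecs = np.linalg.eigh(X.T @ X / len(X))  # ascending eigenvalues
print(eigvecs[:, -1])               # PC1 ≈ [0.707, 0.707], the 45° direction
print(eigvals[-1] / eigvals.sum())  # ≈0.99 of the variance lies along PC1
```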

1. Standardise features (zero mean, unit variance)
2. Compute the covariance matrix XᵀX/n
3. Find eigenvectors (principal components) and eigenvalues (variance explained)
4. Project the data onto the top k eigenvectors
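The four steps translate almost line-for-line into NumPy. A pedagogical sketch (the function name and synthetic test data are invented for illustration):

```python
import numpy as np

def pca_fit_transform(X, k):
    """From-scratch PCA following the four steps above."""
    # 1. Standardise: zero mean, unit variance per feature
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    # 2. Covariance matrix Σ = XᵀX / n (valid because X is now zero-mean)
    cov = X.T @ X / len(X)
    # 3. Eigendecomposition; eigh suits symmetric matrices, ascending order
    eigvals, eigvecs = np.linalg.eigh(cov)
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]  # sort descending
    # 4. Project onto the top k eigenvectors
    return X @ eigvecs[:, :k], eigvals

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10)) @ rng.normal(size=(10, 10))  # correlated features
Z, eigvals = pca_fit_transform(X, k=3)
print(Z.shape, eigvals[:3].sum() / eigvals.sum())  # variance kept by 3 PCs
```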
3 · The Math

Covariance Matrix
Σ = (1/n) Xᵀ X (X is zero-mean)
Eigendecomposition
Σ v = λ v  →  v₁ = PC1 direction, λ₁ = variance along PC1
Variance Explained Ratio
VERₖ = λₖ / Σᵢ λᵢ

Choose k such that cumulative VER ≥ 90–95%.
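In code the rule is just a cumulative sum plus a threshold search; the eigenvalue spectrum here is invented for illustration:

```python
import numpy as np

eigvals = np.array([5.0, 2.5, 1.0, 0.3, 0.15, 0.05])  # hypothetical, descending
ver = eigvals / eigvals.sum()           # variance explained ratio per component
cumulative = np.cumsum(ver)             # the curve a scree plot shows
k = int(np.searchsorted(cumulative, 0.95)) + 1  # smallest k with ≥95% retained
print(k, round(cumulative[k - 1], 3))   # 4, 0.978
```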

4 · Assumptions & Pitfalls

PCA is linear. Only captures linear correlations. For non-linear structure, use t-SNE or UMAP for visualisation.
Loses interpretability. PC1 is "some combination of all original features". Original feature names are lost.
Sensitive to scaling. Always standardise before PCA. A feature with larger variance will dominate without scaling.
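A small demonstration of the scaling pitfall, on a made-up age/income dataset where the two features are correlated but live on very different scales:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
age = rng.normal(40, 10, 300)                     # years: variance ~100
income = 1_000 * age + rng.normal(0, 5_000, 300)  # dollars: variance ~1e8
X = np.column_stack([age, income])

raw = PCA(n_components=1).fit(X).components_[0]
std = PCA(n_components=1).fit(StandardScaler().fit_transform(X)).components_[0]
print(raw)  # ≈ [0.001, 1.0]: income's sheer scale hijacks PC1
print(std)  # ≈ [0.707, 0.707]: both features contribute after standardising
```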
5 · When to Use

Strengths

  • Reduces overfitting in high-dimensional data
  • Speeds up downstream models
  • Removes multicollinearity
  • Enables 2D/3D visualisation of any dataset

Limitations

  • Components are uninterpretable
  • Only captures linear variance
  • Discards some information
  • Requires standardisation first
Applications: image compression · NLP dimensionality reduction · genomics · noise reduction

Key Takeaways

Eigenvectors = Directions of Variance

PCA rotates the coordinate system so PC1 points along the direction of maximum spread in the data.

Choose k from Scree Plot

Plot cumulative variance explained vs k. Retain enough components to explain 90–95% of variance.

Visualise with 2 Components

Project any dataset to 2D with PCA for a quick visual inspection of cluster structure before modelling.
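A closing sketch of that habit, projecting scikit-learn's bundled iris dataset (4 features, 3 species) to 2D; matplotlib is an assumed dependency:

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
Z = PCA(n_components=2).fit_transform(StandardScaler().fit_transform(X))

plt.scatter(Z[:, 0], Z[:, 1], c=y, s=15)  # colour by species label
plt.xlabel("PC1")
plt.ylabel("PC2")
plt.show()                                # cluster structure, before modelling
```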