Topic 12 · Phase 3 · Interactive

Compress Without Losing What Matters

PCA — find the directions of maximum variance and project your data there

1 · The Problem

MNIST handwritten digits have 784 features (28×28 pixels). Most pixels are correlated — adjacent pixels tend to have similar values. Can we represent each image with 50 numbers instead of 784, while keeping most of the useful information?

Before: 784 features, high redundancy — most are correlated
After: 50 components, 94% variance retained · 15× compression
"PCA doesn't select features — it invents new ones that are linear combinations of the originals, maximally spread out."
2 · The Intuition

Imagine a long, thin cloud of points tilted at 45°. The longest direction through the cloud captures most of the spread — that's PC1. The perpendicular direction captures the remaining spread — that's PC2. Project all points onto PC1 and you've compressed from 2D to 1D with minimal information loss.
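A quick numerical check of that picture, on a made-up cloud built with NumPy (the 45° direction should fall out as PC1, up to sign):

```python
import numpy as np

# A long, thin cloud tilted at 45 degrees.
rng = np.random.default_rng(42)
along = rng.normal(scale=3.0, size=1000)    # big spread along the long axis
across = rng.normal(scale=0.3, size=1000)   # small spread across it
X = np.column_stack([along + across, along - across])
X -= X.mean(axis=0)

eigvals, eigvecs = np.linalg.eigh(X.T @ X / len(X))  # ascending eigenvalues
print(eigvecs[:, -1])               # PC1 ≈ [0.707, 0.707], the 45° direction
print(eigvals[-1] / eigvals.sum())  # ≈0.99 of the variance lies along PC1
```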

1. Standardise features (zero mean, unit variance)
2. Compute the covariance matrix XᵀX/n
3. Find eigenvectors (principal components) and eigenvalues (variance explained)
4. Project the data onto the top k eigenvectors
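The four steps translate almost line-for-line into NumPy. A pedagogical sketch (the function name and synthetic test data are invented for illustration):

```python
import numpy as np

def pca_fit_transform(X, k):
    """From-scratch PCA following the four steps above."""
    # 1. Standardise: zero mean, unit variance per feature
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    # 2. Covariance matrix Σ = XᵀX / n (valid because X is now zero-mean)
    cov = X.T @ X / len(X)
    # 3. Eigendecomposition; eigh suits symmetric matrices, ascending order
    eigvals, eigvecs = np.linalg.eigh(cov)
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]  # sort descending
    # 4. Project onto the top k eigenvectors
    return X @ eigvecs[:, :k], eigvals

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10)) @ rng.normal(size=(10, 10))  # correlated features
Z, eigvals = pca_fit_transform(X, k=3)
print(Z.shape, eigvals[:3].sum() / eigvals.sum())  # variance kept by 3 PCs
```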
3 · The Math

Covariance Matrix
Σ = (1/n) Xᵀ X (X is zero-mean)
Eigendecomposition
Σ v = λ v  →  v₁ = PC1 direction, λ₁ = variance along PC1
Variance Explained Ratio
VERₖ = λₖ / Σᵢ λᵢ

Choose k such that cumulative VER ≥ 90–95%.
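In code the rule is just a cumulative sum plus a threshold search; the eigenvalue spectrum here is invented for illustration:

```python
import numpy as np

eigvals = np.array([5.0, 2.5, 1.0, 0.3, 0.15, 0.05])  # hypothetical, descending
ver = eigvals / eigvals.sum()           # variance explained ratio per component
cumulative = np.cumsum(ver)             # the curve a scree plot shows
k = int(np.searchsorted(cumulative, 0.95)) + 1  # smallest k with ≥95% retained
print(k, round(cumulative[k - 1], 3))   # 4, 0.978
```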

4 · Assumptions & Pitfalls

PCA is linear. Only captures linear correlations. For non-linear structure, use t-SNE or UMAP for visualisation.
Loses interpretability. PC1 is "some combination of all original features". Original feature names are lost.
Sensitive to scaling. Always standardise before PCA. A feature with larger variance will dominate without scaling.
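A small demonstration of the scaling pitfall, on a made-up age/income dataset where the two features are correlated but live on very different scales:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
age = rng.normal(40, 10, 300)                     # years: variance ~100
income = 1_000 * age + rng.normal(0, 5_000, 300)  # dollars: variance ~1e8
X = np.column_stack([age, income])

raw = PCA(n_components=1).fit(X).components_[0]
std = PCA(n_components=1).fit(StandardScaler().fit_transform(X)).components_[0]
print(raw)  # ≈ [0.001, 1.0]: income's sheer scale hijacks PC1
print(std)  # ≈ [0.707, 0.707]: both features contribute after standardising
```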
5 · When to Use

Strengths

  • Reduces overfitting in high-dimensional data
  • Speeds up downstream models
  • Removes multicollinearity
  • Enables 2D/3D visualisation of any dataset

Limitations

  • Components are uninterpretable
  • Only captures linear variance
  • Discards some information
  • Requires standardisation first
Applications: image compression · NLP dimensionality reduction · genomics · noise reduction

Key Takeaways

Eigenvectors = Directions of Variance

PCA rotates the coordinate system so PC1 points along the direction of maximum spread in the data.

Choose k from Scree Plot

Plot cumulative variance explained vs k. Retain enough components to explain 90–95% of variance.

Visualise with 2 Components

Project any dataset to 2D with PCA for a quick visual inspection of cluster structure before modelling.
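A closing sketch of that habit, projecting scikit-learn's bundled iris dataset (4 features, 3 species) to 2D; matplotlib is an assumed dependency:

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
Z = PCA(n_components=2).fit_transform(StandardScaler().fit_transform(X))

plt.scatter(Z[:, 0], Z[:, 1], c=y, s=15)  # colour by species label
plt.xlabel("PC1")
plt.ylabel("PC2")
plt.show()                                # cluster structure, before modelling
```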