Topic 05 · Phase 1

When Lines Aren't Enough

Advanced Regression — polynomial features, interactions, and non-linear fits

1 · The Problem

A dataset of salary vs years of experience doesn't follow a straight line — early-career growth is steep, senior growth plateaus. A linear model underfits; we need a fit that can curve.

Linear fit: misses the curve, underfitting at both ends.
Polynomial fit: captures the curvature while staying interpretable.
"Adding x² to your features doesn't make the model non-linear — it still uses OLS. Only the feature engineering is non-linear."
2 · The Intuition

Polynomial regression is still linear regression — we just engineer new features from existing ones. Add x² and x³ as features, then fit a straight line in that higher-dimensional space.

Degree 1 → straight line
ŷ = β₀ + β₁x
Degree 2 → parabola
ŷ = β₀ + β₁x + β₂x²
Degree 10 → likely overfit
ŷ = β₀ + β₁x + … + β₁₀x¹⁰ ← danger zone
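The degree-2 case can be sketched with scikit-learn: `PolynomialFeatures` does the feature engineering, and `LinearRegression` is plain OLS on top. The salary-like data below is synthetic, invented purely for illustration:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

# Hypothetical salary-vs-experience data: steep early growth, then a plateau
x = np.linspace(0, 20, 50).reshape(-1, 1)
y = 40 + 30 * np.sqrt(x.ravel()) + np.random.default_rng(0).normal(0, 3, 50)

# Degree 2: engineer [x, x^2], then fit ordinary least squares as usual
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(x, y)
print(model.score(x, y))  # training R^2 of the curved fit
```

Note that the learning step is unchanged; only the input columns differ.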
3 · The Math

Feature Transformation
[x] → [1, x, x², x³, …, xᵈ] then standard OLS
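As a concrete sketch, the transformed design matrix can be built by hand with NumPy; the sample values are arbitrary:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
d = 3
# Design matrix [1, x, x^2, x^3], one column per power
X = np.column_stack([x**p for p in range(d + 1)])
print(X)  # the row for x=2 is [1, 2, 4, 8]
```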
Interaction Terms
ŷ = β₀ + β₁x₁ + β₂x₂ + β₃x₁x₂

The x₁x₂ term captures how the effect of x₁ depends on x₂.

Bias-Variance Tradeoff

As degree ↑: bias ↓, variance ↑. The optimal degree balances both — find it with cross-validation.
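One way to sketch that search, assuming scikit-learn and synthetic data whose true relationship is quadratic:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(42)
x = rng.uniform(0, 10, 80).reshape(-1, 1)
y = 2 + 1.5 * x.ravel() - 0.1 * x.ravel() ** 2 + rng.normal(0, 1, 80)

# Score each candidate degree with 5-fold CV; keep the best mean R^2
scores = {
    d: cross_val_score(
        make_pipeline(PolynomialFeatures(degree=d), LinearRegression()),
        x, y, cv=5,
    ).mean()
    for d in range(1, 8)
}
best_degree = max(scores, key=scores.get)
print(best_degree)
```

Cross-validated scores rise until the degree matches the underlying curvature, then flatten or fall as variance takes over.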

4 · Assumptions & Pitfalls

Overfitting explodes with high degree. Degree ≥ 5 usually overfits unless you regularise.
Extrapolation is dangerous. Polynomial curves go wild outside the training data range.
Multicollinearity of polynomial features. x, x², x³ are highly correlated. Standardise the features and pair them with Ridge regularisation.
5 · When to Use

Strengths

  • Captures curved relationships
  • Still interpretable (linear in parameters)
  • No new algorithm — reuse OLS
  • Interaction terms model dependencies

Limitations

  • Overfits at high degree
  • Poor extrapolation
  • Feature space explodes with multivariate data
  • Requires regularisation

Typical applications: salary vs experience, drug dose-response, physics-inspired features.

Key Takeaways

Features, Not Algorithm

Polynomial regression is linear regression with engineered features. The learning algorithm stays the same.

Degree is a Hyperparameter

Tune degree via cross-validation. Pair with Ridge regularisation to avoid overfitting at degrees 3+.

Interactions Capture Dependencies

Sometimes x₁ only matters when x₂ is high. Interaction terms like x₁×x₂ encode this dependency explicitly.