Topic 05 · Phase 1

When Lines Aren't Enough

Advanced Regression — polynomial features, interactions, and non-linear fits

1 · The Problem

A dataset of salary vs years of experience doesn't follow a straight line — early-career growth is steep, senior growth plateaus. A linear model underfits; we need a fit that can curve.

Linear fit: misses the curve, underfitting at both ends.
Polynomial fit: captures the curvature while staying interpretable.
"Adding x² to your features doesn't make the model non-linear — it still uses OLS. Only the feature engineering is non-linear."
2 · The Intuition

Polynomial regression is still linear regression — we just engineer new features from existing ones. Add x² and x³ as features, then fit a straight line in that higher-dimensional space.

Degree 1 → straight line
ŷ = β₀ + β₁x
Degree 2 → parabola
ŷ = β₀ + β₁x + β₂x²
Degree 10 → likely overfit
ŷ = β₀ + β₁x + … + β₁₀x¹⁰ ← danger zone
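The degree-2 case can be sketched with scikit-learn: `PolynomialFeatures` does the feature engineering, and `LinearRegression` is plain OLS on top. The salary-like data below is synthetic, invented purely for illustration:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

# Hypothetical salary-vs-experience data: steep early growth, then a plateau
x = np.linspace(0, 20, 50).reshape(-1, 1)
y = 40 + 30 * np.sqrt(x.ravel()) + np.random.default_rng(0).normal(0, 3, 50)

# Degree 2: engineer [x, x^2], then fit ordinary least squares as usual
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(x, y)
print(model.score(x, y))  # training R^2 of the curved fit
```

Note that the learning step is unchanged; only the input columns differ.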
3 · The Math

Feature Transformation
[x] → [1, x, x², x³, …, xᵈ] then standard OLS
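As a concrete sketch, the transformed design matrix can be built by hand with NumPy; the sample values are arbitrary:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
d = 3
# Design matrix [1, x, x^2, x^3], one column per power
X = np.column_stack([x**p for p in range(d + 1)])
print(X)  # the row for x=2 is [1, 2, 4, 8]
```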
Interaction Terms
ŷ = β₀ + β₁x₁ + β₂x₂ + β₃x₁x₂

The x₁x₂ term captures how the effect of x₁ depends on x₂.

Bias-Variance Tradeoff

As degree ↑: bias ↓, variance ↑. The optimal degree balances both — find it with cross-validation.
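One way to sketch that search, assuming scikit-learn and synthetic data whose true relationship is quadratic:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(42)
x = rng.uniform(0, 10, 80).reshape(-1, 1)
y = 2 + 1.5 * x.ravel() - 0.1 * x.ravel() ** 2 + rng.normal(0, 1, 80)

# Score each candidate degree with 5-fold CV; keep the best mean R^2
scores = {
    d: cross_val_score(
        make_pipeline(PolynomialFeatures(degree=d), LinearRegression()),
        x, y, cv=5,
    ).mean()
    for d in range(1, 8)
}
best_degree = max(scores, key=scores.get)
print(best_degree)
```

Cross-validated scores rise until the degree matches the underlying curvature, then flatten or fall as variance takes over.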

4 · Assumptions & Pitfalls

Overfitting explodes with high degree. Degree ≥ 5 usually overfits unless you regularise.
Extrapolation is dangerous. Polynomial curves go wild outside the training data range.
Multicollinearity of polynomial features. x, x², x³ are highly correlated. Standardise the features and pair them with Ridge regularisation.
5 · When to Use

Strengths

  • Captures curved relationships
  • Still interpretable (linear in parameters)
  • No new algorithm — reuse OLS
  • Interaction terms model dependencies

Limitations

  • Overfits at high degree
  • Poor extrapolation
  • Feature space explodes with multivariate data
  • Requires regularisation

Typical applications: salary vs experience, drug dose-response, physics-inspired features.

Key Takeaways

Features, Not Algorithm

Polynomial regression is linear regression with engineered features. The learning algorithm stays the same.

Degree is a Hyperparameter

Tune degree via cross-validation. Pair with Ridge regularisation to avoid overfitting at degrees 3+.

Interactions Capture Dependencies

Sometimes x₁ only matters when x₂ is high. Interaction terms like x₁×x₂ encode this dependency explicitly.