Topic 02 · Phase 1 · Interactive

Find the Line of Best Fit

Linear Regression — the bedrock of supervised learning

1 · The Problem

A real-estate firm wants to predict house prices from size, location, and age. The relationship looks roughly linear — can we find the best straight line through the data?

What we're trying to solve

  • Predict a continuous target variable (price)
  • Quantify the relationship between features and target
  • Minimise prediction errors across all training examples
  • Explain how much each feature contributes
[Chart: House Size vs Price; price rises with size]
"Give me a dataset and I'll tell you what straight line explains it best." — The OLS promise
2 · The Intuition

Imagine stretching a rubber band through a cloud of points. The band settles where it's least "pulled" by all the points simultaneously. That settling position is the regression line — it minimises the sum of squared vertical distances to every point.

Residual: the vertical gap between an actual point and the line's prediction, ε = y − ŷ.
OLS goal: find the β₀ and β₁ that minimise Σ(yᵢ − ŷᵢ)².
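A minimal NumPy sketch of that goal, on a tiny invented dataset (every number below is made up for illustration):

  import numpy as np

  # Toy data: house size (m²) vs price (k€), invented for illustration
  x = np.array([50.0, 70.0, 80.0, 100.0, 120.0])
  y = np.array([150.0, 200.0, 230.0, 285.0, 335.0])

  # Closed-form simple OLS: β₁ = Σ(xᵢ−x̄)(yᵢ−ȳ) / Σ(xᵢ−x̄)², β₀ = ȳ − β₁x̄
  beta1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
  beta0 = y.mean() - beta1 * x.mean()

  # Residuals ε = y − ŷ; OLS makes the sum of their squares as small as possible
  residuals = y - (beta0 + beta1 * x)
  print(beta0, beta1, np.sum(residuals ** 2))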

Interactive Demo

[Live demo readouts: point count, slope (β₁), intercept (β₀)]
Click canvas to add points. Drag any point to update the regression line and R² in real time.

3 · The Math

Model
ŷ = β₀ + β₁x₁ + β₂x₂ + … + βₙxₙ
Cost Function (half-MSE)
J(β) = (1/2m) Σᵢ (ŷᵢ − yᵢ)² (the ½ cancels when taking the gradient; minimising this is equivalent to minimising the MSE)
OLS Closed-form Solution
β = (XᵀX)⁻¹Xᵀy
R² (Coefficient of Determination)
R² = 1 − SSres/SStot, where SSres = Σ(yᵢ − ŷᵢ)² and SStot = Σ(yᵢ − ȳ)²
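The closed form translates directly into NumPy. A hedged sketch on invented data (np.linalg.solve is used instead of an explicit inverse, which is the numerically safer route):

  import numpy as np

  # Two invented features (size, age) and prices; all values are illustrative
  X_raw = np.array([[50., 5.], [70., 12.], [80., 3.], [100., 20.], [120., 8.]])
  y = np.array([150., 190., 240., 260., 340.])

  # Prepend a column of ones so the first coefficient acts as the intercept β₀
  X = np.column_stack([np.ones(len(X_raw)), X_raw])

  # β = (XᵀX)⁻¹Xᵀy, solved as a linear system rather than by inverting XᵀX
  beta = np.linalg.solve(X.T @ X, X.T @ y)

  # R² = 1 − SSres/SStot
  y_hat = X @ beta
  ss_res = np.sum((y - y_hat) ** 2)
  ss_tot = np.sum((y - y.mean()) ** 2)
  print(beta, 1.0 - ss_res / ss_tot)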
4 · Assumptions & Pitfalls

Linearity. The relationship must be linear. Curved patterns call for polynomial features or a different model.
Multicollinearity. Highly correlated predictors make coefficient estimates unstable. Check the VIF (a quick sketch follows this list).
Homoscedasticity. Residual variance should be constant across fitted values. Fan-shaped residual plots signal trouble.
Outliers inflate MSE. Because errors are squared, a single extreme point can drag the line far off course. Consider robust regression or a Huber loss.
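A NumPy-only sketch of that VIF check (the function name vif is my own; values above roughly 5–10 are a common rule of thumb, not a hard law):

  import numpy as np

  def vif(X):
      """VIFⱼ = 1 / (1 − R²ⱼ), where R²ⱼ regresses column j on all the others."""
      n, p = X.shape
      scores = []
      for j in range(p):
          target = X[:, j]
          others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
          coef, *_ = np.linalg.lstsq(others, target, rcond=None)
          pred = others @ coef
          r2 = 1 - np.sum((target - pred) ** 2) / np.sum((target - target.mean()) ** 2)
          scores.append(1.0 / (1.0 - r2))  # blows up as R²ⱼ → 1 (perfect collinearity)
      return np.array(scores)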
5 · When to Use

Strengths

  • Highly interpretable coefficients
  • Fast to train (closed-form solution)
  • Strong baseline for regression tasks
  • Works well with many features if regularised

Limitations

  • Assumes linearity — bad on curved data
  • Sensitive to outliers (MSE loss)
  • Needs feature scaling for gradient descent
  • Struggles with high-dimensional interactions
Typical applications: house price prediction, sales forecasting, risk scoring, demand estimation.
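If scikit-learn is available, a baseline fit takes a few lines; the data here is synthetic, and the three "features" are stand-ins for size, location score, and age:

  import numpy as np
  from sklearn.linear_model import LinearRegression

  rng = np.random.default_rng(0)
  X = rng.uniform(0, 100, size=(200, 3))   # synthetic size / location / age
  y = 2.5 * X[:, 0] + 1.0 * X[:, 1] - 0.8 * X[:, 2] + rng.normal(0, 10, 200)

  model = LinearRegression().fit(X, y)
  print(model.intercept_, model.coef_)     # interpretable coefficients
  print(model.score(X, y))                 # .score returns R²

  # With many correlated features, Ridge(alpha=...) from the same module
  # adds the regularisation mentioned under "Strengths".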

Key Takeaways

Minimise the Residuals

OLS finds the unique line that minimises the sum of squared errors; the closed form needs no iteration at all, provided XᵀX is invertible and the feature count stays modest.

R² Explains Variance

R² = 0.85 means 85% of variance in the target is explained by your features. The remaining 15% is noise or missing features.

Check Your Residuals

Always plot residuals vs fitted values. Random scatter = good. Patterns = your linear assumption is violated.
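A matplotlib sketch of that diagnostic on synthetic data (if the scatter shows a curve or a fan instead of noise, revisit the model):

  import numpy as np
  import matplotlib.pyplot as plt

  rng = np.random.default_rng(1)
  x = rng.uniform(0, 10, 100)
  y = 3 * x + 5 + rng.normal(0, 2, 100)    # synthetic, genuinely linear data

  beta1, beta0 = np.polyfit(x, y, 1)       # least-squares slope and intercept
  y_hat = beta0 + beta1 * x

  # Residuals vs fitted: random scatter around zero is the healthy pattern
  plt.scatter(y_hat, y - y_hat, alpha=0.7)
  plt.axhline(0, linestyle="--")
  plt.xlabel("Fitted values (ŷ)")
  plt.ylabel("Residuals (y − ŷ)")
  plt.show()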