Topic 02 · Phase 1 · Interactive

Find the Line of Best Fit

Linear Regression — the bedrock of supervised learning

1 · The Problem

A real-estate firm wants to predict house prices from size, location, and age. The relationship looks roughly linear — can we find the best straight line through the data?

What we're trying to solve

  • Predict a continuous target variable (price)
  • Quantify the relationship between features and target
  • Minimise prediction errors across all training examples
  • Explain how much each feature contributes
[Chart: House Size vs Price; price rises with size]
"Give me a dataset and I'll tell you what straight line explains it best." — The OLS promise
2 · The Intuition

Imagine stretching a rubber band through a cloud of points. The band settles where it's least "pulled" by all the points simultaneously. That settling position is the regression line — it minimises the sum of squared vertical distances to every point.

Residual: the vertical gap between an actual point and the line's prediction, ε = y − ŷ.
OLS goal: find the β₀ and β₁ that minimise Σ(yᵢ − ŷᵢ)².
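A minimal NumPy sketch of that goal, on a tiny invented dataset (every number below is made up for illustration):

  import numpy as np

  # Toy data: house size (m²) vs price (k€), invented for illustration
  x = np.array([50.0, 70.0, 80.0, 100.0, 120.0])
  y = np.array([150.0, 200.0, 230.0, 285.0, 335.0])

  # Closed-form simple OLS: β₁ = Σ(xᵢ−x̄)(yᵢ−ȳ) / Σ(xᵢ−x̄)², β₀ = ȳ − β₁x̄
  beta1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
  beta0 = y.mean() - beta1 * x.mean()

  # Residuals ε = y − ŷ; OLS makes the sum of their squares as small as possible
  residuals = y - (beta0 + beta1 * x)
  print(beta0, beta1, np.sum(residuals ** 2))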

Interactive Demo

[Live demo readouts: point count, slope (β₁), intercept (β₀)]
Click canvas to add points. Drag any point to update the regression line and R² in real time.

3 · The Math

Model
ŷ = β₀ + β₁x₁ + β₂x₂ + … + βₙxₙ
Cost Function (half-MSE)
J(β) = (1/2m) Σᵢ (ŷᵢ − yᵢ)² (the ½ cancels when taking the gradient; minimising this is equivalent to minimising the MSE)
OLS Closed-form Solution
β = (XᵀX)⁻¹Xᵀy
R² (Coefficient of Determination)
R² = 1 − SSres/SStot, where SSres = Σ(yᵢ − ŷᵢ)² and SStot = Σ(yᵢ − ȳ)²
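The closed form translates directly into NumPy. A hedged sketch on invented data (np.linalg.solve is used instead of an explicit inverse, which is the numerically safer route):

  import numpy as np

  # Two invented features (size, age) and prices; all values are illustrative
  X_raw = np.array([[50., 5.], [70., 12.], [80., 3.], [100., 20.], [120., 8.]])
  y = np.array([150., 190., 240., 260., 340.])

  # Prepend a column of ones so the first coefficient acts as the intercept β₀
  X = np.column_stack([np.ones(len(X_raw)), X_raw])

  # β = (XᵀX)⁻¹Xᵀy, solved as a linear system rather than by inverting XᵀX
  beta = np.linalg.solve(X.T @ X, X.T @ y)

  # R² = 1 − SSres/SStot
  y_hat = X @ beta
  ss_res = np.sum((y - y_hat) ** 2)
  ss_tot = np.sum((y - y.mean()) ** 2)
  print(beta, 1.0 - ss_res / ss_tot)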
4 · Assumptions & Pitfalls

Linearity. The relationship must be linear. Curved patterns call for polynomial features or a different model.
Multicollinearity. Highly correlated predictors make coefficient estimates unstable. Check the VIF (a quick sketch follows this list).
Homoscedasticity. Residual variance should be constant across fitted values. Fan-shaped residual plots signal trouble.
Outliers inflate MSE. Because errors are squared, a single extreme point can drag the line far off course. Consider robust regression or a Huber loss.
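A NumPy-only sketch of that VIF check (the function name vif is my own; values above roughly 5–10 are a common rule of thumb, not a hard law):

  import numpy as np

  def vif(X):
      """VIFⱼ = 1 / (1 − R²ⱼ), where R²ⱼ regresses column j on all the others."""
      n, p = X.shape
      scores = []
      for j in range(p):
          target = X[:, j]
          others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
          coef, *_ = np.linalg.lstsq(others, target, rcond=None)
          pred = others @ coef
          r2 = 1 - np.sum((target - pred) ** 2) / np.sum((target - target.mean()) ** 2)
          scores.append(1.0 / (1.0 - r2))  # blows up as R²ⱼ → 1 (perfect collinearity)
      return np.array(scores)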
5 · When to Use

Strengths

  • Highly interpretable coefficients
  • Fast to train (closed-form solution)
  • Strong baseline for regression tasks
  • Works well with many features if regularised

Limitations

  • Assumes linearity — bad on curved data
  • Sensitive to outliers (MSE loss)
  • Needs feature scaling for gradient descent
  • Struggles with high-dimensional interactions
Typical applications: house price prediction, sales forecasting, risk scoring, demand estimation.
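If scikit-learn is available, a baseline fit takes a few lines; the data here is synthetic, and the three "features" are stand-ins for size, location score, and age:

  import numpy as np
  from sklearn.linear_model import LinearRegression

  rng = np.random.default_rng(0)
  X = rng.uniform(0, 100, size=(200, 3))   # synthetic size / location / age
  y = 2.5 * X[:, 0] + 1.0 * X[:, 1] - 0.8 * X[:, 2] + rng.normal(0, 10, 200)

  model = LinearRegression().fit(X, y)
  print(model.intercept_, model.coef_)     # interpretable coefficients
  print(model.score(X, y))                 # .score returns R²

  # With many correlated features, Ridge(alpha=...) from the same module
  # adds the regularisation mentioned under "Strengths".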

Key Takeaways

Minimise the Residuals

OLS finds the unique line that minimises the sum of squared errors; the closed form needs no iteration at all, provided XᵀX is invertible and the feature count stays modest.

R² Explains Variance

R² = 0.85 means 85% of variance in the target is explained by your features. The remaining 15% is noise or missing features.

Check Your Residuals

Always plot residuals vs fitted values. Random scatter = good. Patterns = your linear assumption is violated.
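A matplotlib sketch of that diagnostic on synthetic data (if the scatter shows a curve or a fan instead of noise, revisit the model):

  import numpy as np
  import matplotlib.pyplot as plt

  rng = np.random.default_rng(1)
  x = rng.uniform(0, 10, 100)
  y = 3 * x + 5 + rng.normal(0, 2, 100)    # synthetic, genuinely linear data

  beta1, beta0 = np.polyfit(x, y, 1)       # least-squares slope and intercept
  y_hat = beta0 + beta1 * x

  # Residuals vs fitted: random scatter around zero is the healthy pattern
  plt.scatter(y_hat, y - y_hat, alpha=0.7)
  plt.axhline(0, linestyle="--")
  plt.xlabel("Fitted values (ŷ)")
  plt.ylabel("Residuals (y − ŷ)")
  plt.show()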