Topic 03 · Phase 1 · Interactive

Will the Customer Churn?

Logistic Regression — turning a linear model into a probability machine

1 · The Problem

A telecom company wants to predict which customers will cancel their subscription next month. The target is binary: churn (1) or stay (0). Linear regression can produce values outside [0,1], which cannot be read as probabilities — we need a bounded output.

What we're trying to solve

  • Output a probability between 0 and 1
  • Draw a decision boundary in feature space
  • Handle binary (or multi-class) targets
  • Measure performance with log-loss, AUC, F1
Class distribution: Churned — 15% of customers · Stayed — 85% of customers. Class imbalance is a key challenge.
"We don't predict churn — we estimate the probability of churn, then decide where to draw the line."
2 · The Intuition

Logistic regression squashes the output of a linear model through the sigmoid function, which maps any real number to (0,1). Think of it as linear regression with a probability wrapper.

Sigmoid σ(z) = 1/(1+e⁻ᶻ) — always outputs between 0 and 1
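A minimal sketch of the sigmoid wrapper in pure Python (the scores passed in are illustrative, not fitted values):

```python
import math

def sigmoid(z: float) -> float:
    """Map any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# A linear score z = β₀ + β₁x₁ + … becomes a probability:
print(sigmoid(0.0))   # 0.5 — the midpoint
print(sigmoid(4.0))   # ≈ 0.982 — strongly positive score
print(sigmoid(-4.0))  # ≈ 0.018 — strongly negative score
```

Note the symmetry σ(−z) = 1 − σ(z): flipping the sign of the score flips the predicted class probability.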

Interactive Demo

Adjust the classification threshold and watch the confusion matrix update. Default 0.5 isn't always optimal — lower the threshold to catch more churners (at the cost of more false alarms).

Confusion Matrix (threshold = 0.5)

                   Predicted churn   Predicted stay
    Actual churn       TP = 72          FN = 28
    Actual stay        FP = 18          TN = 382

Precision 0.800 · Recall 0.720 · F1 Score 0.758
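The precision, recall, and F1 values follow directly from the four matrix cells — a quick check in Python:

```python
tp, fp, fn, tn = 72, 18, 28, 382  # counts from the confusion matrix above

precision = tp / (tp + fp)  # of predicted churners, how many actually churned
recall    = tp / (tp + fn)  # of actual churners, how many we caught
f1        = 2 * precision * recall / (precision + recall)

print(f"Precision {precision:.3f}")  # 0.800
print(f"Recall    {recall:.3f}")     # 0.720
print(f"F1 Score  {f1:.3f}")         # 0.758
```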
3 · The Math

Sigmoid
σ(z) = 1 / (1 + e⁻ᶻ)  where z = β₀ + β₁x₁ + … + βₙxₙ
Log-Loss (Binary Cross-Entropy)
L = −(1/m) Σ [yᵢ log(ŷᵢ) + (1−yᵢ) log(1−ŷᵢ)]
Decision Rule
Predict 1 if σ(z) ≥ threshold (default 0.5)
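A sketch of the log-loss on a handful of made-up predictions (`y` and `y_hat` here are illustrative, not real model output):

```python
import math

def log_loss(y, y_hat, eps=1e-15):
    """Binary cross-entropy, clipped to avoid log(0)."""
    total = 0.0
    for yi, pi in zip(y, y_hat):
        pi = min(max(pi, eps), 1 - eps)  # clip predictions away from 0 and 1
        total += yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
    return -total / len(y)

y     = [1, 0, 1, 0]          # true labels
y_hat = [0.9, 0.1, 0.8, 0.3]  # predicted churn probabilities
print(log_loss(y, y_hat))     # ≈ 0.198
```

Confident, correct predictions contribute almost nothing to the loss; confident, wrong ones are penalised heavily — that asymmetry is what drives the fitted probabilities toward calibration.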
4 · Assumptions &amp; Pitfalls

Class imbalance. High accuracy on imbalanced data is misleading — a model predicting "never churn" gets 85% accuracy trivially.
Threshold ≠ 0.5. Optimise threshold on validation data using the F1 score or business cost matrix.
Non-linear boundaries. Logistic regression draws a straight decision boundary. Add polynomial features for curves.
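The threshold-tuning pitfall above can be sketched as a simple sweep over candidate thresholds, picking the one that maximises F1 on held-out scores (the scores and labels below are a toy validation set, not real data):

```python
def f1_at(threshold, probs, labels):
    """F1 score when predicting 1 for probabilities >= threshold."""
    preds = [1 if p >= threshold else 0 for p in probs]
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Toy validation scores (illustrative only)
probs  = [0.9, 0.7, 0.6, 0.4, 0.35, 0.2, 0.1, 0.05]
labels = [1,   1,   0,   1,   0,    0,   0,   0]

best = max((t / 100 for t in range(1, 100)),
           key=lambda t: f1_at(t, probs, labels))
print(best, f1_at(best, probs, labels))
```

On this toy set the best threshold sits below 0.5, exactly the "catch more churners" trade-off the interactive demo illustrates. In practice the sweep would be run on a proper validation split, or replaced by a business cost matrix.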
5 · When to Use

Strengths

  • Outputs calibrated probabilities
  • Interpretable coefficients (log-odds)
  • Fast to train even at scale
  • Strong baseline for binary classification

Limitations

  • Linear decision boundary only
  • Struggles with high class imbalance
  • Assumes no severe multicollinearity
  • Not suited for complex feature interactions
Typical use cases: churn prediction · fraud detection · medical diagnosis · credit scoring

Key Takeaways

Sigmoid Squashes to Probability

Any linear combination of features gets mapped to (0,1) — interpretable as the probability of the positive class.

Threshold is a Business Decision

Default 0.5 isn't always right. A healthcare screening model should lower its threshold to recall most positives, even at the cost of false alarms.

AUC-ROC Summarises All Thresholds

AUC = 1 is perfect; AUC = 0.5 is random. Use it to compare models independent of threshold.
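AUC also has a handy probabilistic reading: it is the chance that a randomly chosen positive scores higher than a randomly chosen negative. A brute-force sketch of that pairwise definition (ties count half; the scores are illustrative):

```python
def auc(probs, labels):
    """AUC as the fraction of positive/negative pairs ranked correctly."""
    pos = [p for p, y in zip(probs, labels) if y == 1]
    neg = [p for p, y in zip(probs, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

probs  = [0.9, 0.7, 0.6, 0.4, 0.2, 0.1]
labels = [1,   1,   0,   1,   0,   0]
print(auc(probs, labels))  # 8/9 ≈ 0.889
```

Because only the ranking of scores matters, AUC is unchanged by any monotone rescaling of the probabilities — which is exactly why it is threshold-independent.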