Topic 03 · Phase 1 · Interactive

Will the Customer Churn?

Logistic Regression — turning a linear model into a probability machine

1 · The Problem

A telecom company wants to predict which customers will cancel their subscription next month. The target is binary: churn (1) or stay (0). Linear regression can produce values outside [0,1], which cannot be read as probabilities — we need a bounded output.

What we're trying to solve

  • Output a probability between 0 and 1
  • Draw a decision boundary in feature space
  • Handle binary (or multi-class) targets
  • Measure performance with log-loss, AUC, F1
Class distribution: Churned — 15% of customers · Stayed — 85% of customers. Class imbalance is a key challenge.
"We don't predict churn — we estimate the probability of churn, then decide where to draw the line."
2 · The Intuition

Logistic regression squashes the output of a linear model through the sigmoid function, which maps any real number to (0,1). Think of it as linear regression with a probability wrapper.

Sigmoid σ(z) = 1/(1+e⁻ᶻ) — always outputs between 0 and 1
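A minimal sketch of the sigmoid wrapper in pure Python (the scores passed in are illustrative, not fitted values):

```python
import math

def sigmoid(z: float) -> float:
    """Map any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# A linear score z = β₀ + β₁x₁ + … becomes a probability:
print(sigmoid(0.0))   # 0.5 — the midpoint
print(sigmoid(4.0))   # ≈ 0.982 — strongly positive score
print(sigmoid(-4.0))  # ≈ 0.018 — strongly negative score
```

Note the symmetry σ(−z) = 1 − σ(z): flipping the sign of the score flips the predicted class probability.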

Interactive Demo

Adjust the classification threshold and watch the confusion matrix update. Default 0.5 isn't always optimal — lower the threshold to catch more churners (at the cost of more false alarms).

Confusion Matrix (threshold = 0.5)

                   Predicted churn   Predicted stay
    Actual churn       TP = 72          FN = 28
    Actual stay        FP = 18          TN = 382

Precision 0.800 · Recall 0.720 · F1 Score 0.758
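The precision, recall, and F1 values follow directly from the four matrix cells — a quick check in Python:

```python
tp, fp, fn, tn = 72, 18, 28, 382  # counts from the confusion matrix above

precision = tp / (tp + fp)  # of predicted churners, how many actually churned
recall    = tp / (tp + fn)  # of actual churners, how many we caught
f1        = 2 * precision * recall / (precision + recall)

print(f"Precision {precision:.3f}")  # 0.800
print(f"Recall    {recall:.3f}")     # 0.720
print(f"F1 Score  {f1:.3f}")         # 0.758
```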
3 · The Math

Sigmoid
σ(z) = 1 / (1 + e⁻ᶻ)  where z = β₀ + β₁x₁ + … + βₙxₙ
Log-Loss (Binary Cross-Entropy)
L = −(1/m) Σ [yᵢ log(ŷᵢ) + (1−yᵢ) log(1−ŷᵢ)]
Decision Rule
Predict 1 if σ(z) ≥ threshold (default 0.5)
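A sketch of the log-loss on a handful of made-up predictions (`y` and `y_hat` here are illustrative, not real model output):

```python
import math

def log_loss(y, y_hat, eps=1e-15):
    """Binary cross-entropy, clipped to avoid log(0)."""
    total = 0.0
    for yi, pi in zip(y, y_hat):
        pi = min(max(pi, eps), 1 - eps)  # clip predictions away from 0 and 1
        total += yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
    return -total / len(y)

y     = [1, 0, 1, 0]          # true labels
y_hat = [0.9, 0.1, 0.8, 0.3]  # predicted churn probabilities
print(log_loss(y, y_hat))     # ≈ 0.198
```

Confident, correct predictions contribute almost nothing to the loss; confident, wrong ones are penalised heavily — that asymmetry is what drives the fitted probabilities toward calibration.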
4 · Assumptions &amp; Pitfalls

Class imbalance. High accuracy on imbalanced data is misleading — a model predicting "never churn" gets 85% accuracy trivially.
Threshold ≠ 0.5. Optimise threshold on validation data using the F1 score or business cost matrix.
Non-linear boundaries. Logistic regression draws a straight decision boundary. Add polynomial features for curves.
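The threshold-tuning pitfall above can be sketched as a simple sweep over candidate thresholds, picking the one that maximises F1 on held-out scores (the scores and labels below are a toy validation set, not real data):

```python
def f1_at(threshold, probs, labels):
    """F1 score when predicting 1 for probabilities >= threshold."""
    preds = [1 if p >= threshold else 0 for p in probs]
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Toy validation scores (illustrative only)
probs  = [0.9, 0.7, 0.6, 0.4, 0.35, 0.2, 0.1, 0.05]
labels = [1,   1,   0,   1,   0,    0,   0,   0]

best = max((t / 100 for t in range(1, 100)),
           key=lambda t: f1_at(t, probs, labels))
print(best, f1_at(best, probs, labels))
```

On this toy set the best threshold sits below 0.5, exactly the "catch more churners" trade-off the interactive demo illustrates. In practice the sweep would be run on a proper validation split, or replaced by a business cost matrix.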
5 · When to Use

Strengths

  • Outputs calibrated probabilities
  • Interpretable coefficients (log-odds)
  • Fast to train even at scale
  • Strong baseline for binary classification

Limitations

  • Linear decision boundary only
  • Struggles with high class imbalance
  • Assumes no severe multicollinearity
  • Not suited for complex feature interactions
Typical use cases: churn prediction · fraud detection · medical diagnosis · credit scoring

Key Takeaways

Sigmoid Squashes to Probability

Any linear combination of features gets mapped to (0,1) — interpretable as the probability of the positive class.

Threshold is a Business Decision

Default 0.5 isn't always right. A healthcare screening model should lower its threshold to recall most positives, even at the cost of false alarms.

AUC-ROC Summarises All Thresholds

AUC = 1 is perfect; AUC = 0.5 is random. Use it to compare models independent of threshold.
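AUC also has a handy probabilistic reading: it is the chance that a randomly chosen positive scores higher than a randomly chosen negative. A brute-force sketch of that pairwise definition (ties count half; the scores are illustrative):

```python
def auc(probs, labels):
    """AUC as the fraction of positive/negative pairs ranked correctly."""
    pos = [p for p, y in zip(probs, labels) if y == 1]
    neg = [p for p, y in zip(probs, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

probs  = [0.9, 0.7, 0.6, 0.4, 0.2, 0.1]
labels = [1,   1,   0,   1,   0,   0]
print(auc(probs, labels))  # 8/9 ≈ 0.889
```

Because only the ranking of scores matters, AUC is unchanged by any monotone rescaling of the probabilities — which is exactly why it is threshold-independent.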