Logistic Regression — turning a linear model into a probability machine
A telecom company wants to predict which customers will cancel their subscription next month. The target is binary: churn (1) or stay (0). Linear regression can output values outside [0,1], which can't be read as probabilities — we need a bounded output.
Logistic regression squashes the output of a linear model through the sigmoid function, which maps any real number to (0,1). Think of it as linear regression with a probability wrapper.
Sigmoid σ(z) = 1/(1+e⁻ᶻ) — always outputs between 0 and 1
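A minimal sketch of the sigmoid in NumPy (the function name is ours):

```python
import numpy as np

def sigmoid(z):
    """Map any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(0))    # 0.5 — the decision midpoint
print(sigmoid(4))    # ~0.982 — large positive z pushes toward 1
print(sigmoid(-4))   # ~0.018 — large negative z pushes toward 0
```

No matter how extreme z gets, the output never leaves (0, 1) — exactly the bound a probability needs.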
Adjusting the classification threshold changes the confusion matrix. The default of 0.5 isn't always optimal — lowering the threshold catches more churners, at the cost of more false alarms.
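The trade-off can be seen directly by sweeping the threshold over some hypothetical predicted probabilities and true labels (pure NumPy, made-up numbers):

```python
import numpy as np

# Hypothetical churn probabilities from a fitted model, plus true labels.
probs  = np.array([0.9, 0.7, 0.6, 0.4, 0.35, 0.2, 0.1, 0.05])
labels = np.array([1,   1,   0,   1,   0,    0,   1,   0])

def confusion(probs, labels, threshold):
    """Return (TP, FP, FN, TN) at a given classification threshold."""
    pred = (probs >= threshold).astype(int)
    tp = int(np.sum((pred == 1) & (labels == 1)))
    fp = int(np.sum((pred == 1) & (labels == 0)))
    fn = int(np.sum((pred == 0) & (labels == 1)))
    tn = int(np.sum((pred == 0) & (labels == 0)))
    return tp, fp, fn, tn

print(confusion(probs, labels, 0.5))  # (2, 1, 2, 3)
print(confusion(probs, labels, 0.3))  # (3, 2, 1, 2) — more churners caught, more false alarms
```

Lowering the threshold from 0.5 to 0.3 converts a missed churner (FN) into a caught one (TP), but also turns a correct "stay" into a false alarm (FP).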
σ(z) = 1 / (1 + e⁻ᶻ) where z = β₀ + β₁x₁ + … + βₙxₙ
Binary cross-entropy loss, averaged over m examples: L = −(1/m) Σ [yᵢ log(ŷᵢ) + (1−yᵢ) log(1−ŷᵢ)]
Predict 1 if σ(z) ≥ threshold (default 0.5)
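These three pieces — the linear score z, the sigmoid, the cross-entropy loss, and the thresholded prediction — fit together as follows. A minimal NumPy sketch on toy data, trained with plain gradient descent (variable names and hyperparameters are ours, not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data: one feature; class 1 tends to have larger x.
X = np.concatenate([rng.normal(-1, 1, 50), rng.normal(1, 1, 50)]).reshape(-1, 1)
y = np.array([0] * 50 + [1] * 50)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# z = b0 + b1*x — prepend a column of 1s so beta[0] plays the role of b0.
Xb = np.hstack([np.ones((len(X), 1)), X])
beta = np.zeros(2)

for _ in range(1000):
    y_hat = sigmoid(Xb @ beta)
    # Gradient of the averaged cross-entropy loss w.r.t. beta.
    grad = Xb.T @ (y_hat - y) / len(y)
    beta -= 0.5 * grad

y_hat = sigmoid(Xb @ beta)
loss = -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))
pred = (y_hat >= 0.5).astype(int)   # predict 1 if σ(z) ≥ threshold
print("loss:", loss, "accuracy:", (pred == y).mean())
```

The gradient (1/m) Xᵀ(ŷ − y) follows directly from differentiating the cross-entropy loss through the sigmoid, which is one reason this pairing of loss and activation is standard.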
Any linear combination of features gets mapped to (0,1) — interpretable as the probability of the positive class.
The default of 0.5 isn't always right. A healthcare screening model should catch most true positives, even at the cost of more false alarms.
AUC (area under the ROC curve) = 1 is perfect; AUC = 0.5 is random guessing. Use it to compare models independently of any particular threshold.
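AUC has a useful interpretation: it equals the probability that a randomly chosen positive example outscores a randomly chosen negative one (ties count half). A small NumPy sketch with hypothetical scores illustrates both extremes:

```python
import numpy as np

def auc(scores, labels):
    """Fraction of (positive, negative) pairs where the positive scores
    higher, ties counting half — equal to the area under the ROC curve."""
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    wins = (pos[:, None] > neg[None, :]).sum() \
         + 0.5 * (pos[:, None] == neg[None, :]).sum()
    return wins / (len(pos) * len(neg))

labels  = np.array([1, 1, 0, 1, 0, 0])
perfect = np.array([0.9, 0.8, 0.3, 0.7, 0.2, 0.1])  # every positive above every negative
uniform = np.array([0.5, 0.5, 0.5, 0.5, 0.5, 0.5])  # no separation at all

print(auc(perfect, labels))  # 1.0
print(auc(uniform, labels))  # 0.5
```

Because AUC depends only on the ranking of scores, not on where the 0.5 cutoff falls, it compares models without committing to a threshold.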