Naive Bayes — probabilistic classification using Bayes' theorem
A news aggregator wants to classify articles into categories (politics, sports, tech) based on the words they contain. The feature space is huge (a vocabulary of 50,000 words), and most more complex models are too slow to train and score at that dimensionality.
Start with a prior belief (the base rate of each class). Then update that belief as you observe each word in the document: each word shifts the probability up or down. Once every word has been seen, pick the most probable class.
P(C|X) = P(X|C) · P(C) / P(X)
P(X|C) = Πᵢ P(xᵢ|C)   (features are conditionally independent)
ŷ = argmax_c [ log P(C) + Σᵢ log P(xᵢ|C) ]
Bayes' theorem turns base rates and observed evidence into an updated belief. Each word is evidence that shifts the probability.
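A minimal sketch of the decision rule in code, assuming word-presence features and precomputed dictionaries of log-priors and per-class log-likelihoods (the names predict, log_prior, and log_likelihood are illustrative, not from the text):

```python
import math

def predict(doc_words, log_prior, log_likelihood):
    """Return argmax over classes of log P(C) + sum of log P(word | C)."""
    best_class, best_score = None, -math.inf
    for c, prior in log_prior.items():
        # Sum log-probabilities instead of multiplying probabilities,
        # so a 50,000-word product does not underflow to zero.
        score = prior + sum(
            log_likelihood[c][w] for w in doc_words if w in log_likelihood[c]
        )
        if score > best_score:
            best_class, best_score = c, score
    return best_class
```

Out-of-vocabulary words are simply skipped here; the smoothing step below handles words that are in the vocabulary but were never seen with a given class.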
The independence assumption is rarely true of real text, yet for many text tasks Naive Bayes rivals much more complex models: the probability estimates may be miscalibrated, but the argmax often still picks the right class.
Add-1 (Laplace) smoothing prevents zero-probability collapse for unseen words: without it, a single word never observed with a class would zero out the entire product. One extra pseudo-count per word is enough to make the model robust.
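A sketch of how the smoothed estimates might be computed at training time, assuming documents are given as lists of tokens and vocab is a fixed vocabulary set (train, docs, labels, and vocab are hypothetical names):

```python
import math
from collections import Counter

def train(docs, labels, vocab):
    """Estimate log P(C) and log P(word | C) with add-1 (Laplace) smoothing."""
    class_counts = Counter(labels)
    word_counts = {c: Counter() for c in class_counts}
    for words, c in zip(docs, labels):
        word_counts[c].update(w for w in words if w in vocab)

    log_prior = {c: math.log(n / len(labels)) for c, n in class_counts.items()}
    log_likelihood = {}
    for c, counts in word_counts.items():
        total = sum(counts.values())
        # Add 1 to every word count and |vocab| to the denominator, so a word
        # never observed with class c still gets a small nonzero probability.
        log_likelihood[c] = {
            w: math.log((counts[w] + 1) / (total + len(vocab))) for w in vocab
        }
    return log_prior, log_likelihood
```

Paired with the predict sketch above, this gives a complete (if unoptimized) classifier; without the +1, one unseen word would force P(X|C) to zero and erase all other evidence in the document.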