Calcady

Bayes' Theorem Calculator

Calculate the posterior probability of an event using Bayes' Theorem. Enter prior probability, sensitivity (true positive rate), and false positive rate to find the true probability given a positive result.

Bayes' Theorem

Bayes' Theorem provides a way to revise existing predictions or theories (update probabilities) given new or additional evidence. It is foundational in statistics, machine learning, and medical testing.

The Formula

P(A|B) = [P(B|A) × P(A)] / P(B)

Where P(B) can be expanded using the law of total probability:
P(B) = P(B|A)P(A) + P(B|not A)P(not A)

Variables Explained

  • P(A) (Prior Probability): The initial probability of event A occurring before new evidence is considered.
  • P(B|A) (True Positive Rate): The probability of observing evidence B given that A is true.
  • P(B) (Marginal Probability): The total probability of observing evidence B under all possible scenarios.
  • P(B|not A) (False Positive Rate): The probability of observing evidence B even when A is false.
  • P(A|B) (Posterior Probability): The updated probability of event A given that evidence B has been observed.

Quick Answer: What does Bayes' Theorem actually calculate?

Bayes' Theorem calculates the posterior probability: the updated probability that a condition is true after accounting for new evidence. The formula is P(A|B) = P(B|A) × P(A) ÷ P(B). The counterintuitive result that surprises most people: a positive result on a highly accurate test does not mean you probably have the disease if the disease is rare. Example: 1% disease prevalence, 90% sensitivity, 95% specificity → P(Disease | Positive Test) = 15.4%. Even though the test is 90% sensitive, 84.6% of positive results in this low-prevalence population are false positives. This is why public health agencies do not recommend mass screening for rare diseases with imperfect tests: the avalanche of false positives causes more harm (unnecessary treatment, anxiety, follow-up procedures) than the true positives prevent.
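
The worked example above can be reproduced in a few lines. This is an illustrative sketch (the `posterior` helper is not part of the calculator):

```python
def posterior(prior, sensitivity, false_positive_rate):
    """P(A|B) via Bayes' theorem, with P(B) from the law of total probability."""
    p_b = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_b

# The example above: 1% prevalence, 90% sensitivity, 95% specificity (FPR = 5%)
p = posterior(prior=0.01, sensitivity=0.90, false_positive_rate=0.05)
print(f"P(Disease | Positive) = {p:.1%}")  # 15.4%
```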

Base Rate Effect on Positive Predictive Value (PPV)

All rows use a test with 90% sensitivity and 95% specificity. Only the disease prevalence changes:

| Disease Prevalence | True Positives per 100k | False Positives per 100k | PPV (Bayesian Posterior) | Verdict |
|---|---|---|---|---|
| 0.01% (1 in 10,000) | 9 | 4,999 | 0.18% | Mass screening harmful: overwhelming false positives |
| 0.1% (1 in 1,000) | 90 | 4,995 | 1.77% | Not suitable for general screening |
| 1% (high-risk group) | 900 | 4,950 | 15.4% | Confirmatory test recommended on positives |
| 5% (symptomatic patients) | 4,500 | 4,750 | 48.6% | Near coin-flip: needs clinical correlation |
| 20% (symptomatic + risk factors) | 18,000 | 4,000 | 81.8% | Strong positive evidence; treatment-level confidence |
| 50% (confirmed exposure) | 45,000 | 2,500 | 94.7% | High confidence; false positive unlikely |
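
Every row of this table can be regenerated from the same two test parameters. A minimal sketch (the `screening_counts` function is illustrative):

```python
def screening_counts(prevalence, sensitivity=0.90, specificity=0.95, population=100_000):
    """True/false positive counts per `population`, and the resulting PPV."""
    sick = prevalence * population
    healthy = population - sick
    true_pos = sensitivity * sick
    false_pos = (1 - specificity) * healthy
    ppv = true_pos / (true_pos + false_pos)
    return int(true_pos), int(false_pos), ppv

# Reproduce the table: only prevalence changes, the test stays the same
for prev in (0.0001, 0.001, 0.01, 0.05, 0.20, 0.50):
    tp, fp, ppv = screening_counts(prev)
    print(f"prevalence {prev:>6.2%}: TP={tp:>6,} FP={fp:>6,} PPV={ppv:.2%}")
```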

Pro Tips & Bayesian Reasoning Errors

Do This

  • Always identify your prior (P(A)) before interpreting any test result: the prior is the most impactful input and the one most often ignored. In clinical medicine, the prior is the pre-test probability: the likelihood of disease given the patient's presentation, risk factors, and population. A prior of 5% (symptomatic patient, moderate risk) and a prior of 0.01% (random population screening) produce drastically different positive predictive values from identical tests. The posterior is only as valid as the prior is accurate. Sources for priors include epidemiological prevalence data, clinical decision rules (Wells Score, HEART Score), base rates from published studies, or prior test results (sequential updating).
  • Use sequential Bayesian updating when multiple independent tests are available: the posterior from test 1 becomes the prior for test 2. If a patient tests positive on a rapid antigen test (posterior = 15.4%), and then tests positive on a PCR confirmatory test (sensitivity 99%, specificity 99%), re-run Bayes with prior = 0.154: the posterior jumps to ~95%. A third independent positive test would push it above 99.9%. This is mathematically equivalent to running Bayes once with all tests combined (if independent), but the sequential form is more intuitive and allows for test-specific sensitivity/specificity values.
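
The chained updates described above can be sketched as follows (the `update` helper and the 90%/95% antigen-test figures are illustrative):

```python
def update(prior, sensitivity, false_positive_rate):
    """One Bayesian update: P(Disease | one more positive result)."""
    evidence = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / evidence

# Rapid antigen test first (1% baseline prevalence, 90% sens / 95% spec)
p = update(0.01, 0.90, 0.05)   # ≈ 0.154
# PCR confirmation: yesterday's posterior is today's prior
p = update(p, 0.99, 0.01)      # ≈ 0.95
print(f"After two independent positives: {p:.1%}")
```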

Avoid This

  • Don't confuse P(B|A) with P(A|B) — this is called the “Prosecutor's Fallacy” and has sent innocent people to prison. P(B|A) is the probability of observing the evidence B given hypothesis A is true (e.g., probability of matching DNA if the suspect is the perpetrator). P(A|B) is the probability the suspect is the perpetrator given the DNA match. These are not the same. Claiming “there's a 1-in-a-million chance of a random DNA match, therefore the probability of innocence is 1-in-a-million” confuses these two quantities. Without accounting for the prior (how many people could plausibly be the perpetrator), the posterior probability of guilt is not deducible from the likelihood alone. This fallacy has figured in several high-profile trials and contributed to multiple wrongful convictions, most famously the Sally Clark case in the UK.
  • Don't use Bayes' theorem with non-independent tests as if they were independent sequential updates — the posterior will be falsely inflated. Sequential Bayesian updating is valid only when test 2 provides independent new evidence beyond test 1. If two tests use the same biological mechanism (e.g., two ELISA tests measuring the same antibody from the same blood draw), they are not independent — their results are correlated by the same underlying measurement error. Treating them as independent updates overcounts the evidence. True independence requires tests that measure different signals: e.g., antigen test + antibody test + PCR all measure different aspects of infection and can be treated as approximately independent for sequential updating.
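
A minimal sketch of how double-counting correlated evidence inflates the posterior, using the 90%-sensitive / 95%-specific test from the table above (the `update` helper is illustrative):

```python
def update(prior, sens, fpr):
    """One Bayesian update for a positive result."""
    evidence = sens * prior + fpr * (1 - prior)
    return sens * prior / evidence

prior, sens, fpr = 0.01, 0.90, 0.05

# WRONG: rerunning the same assay and pretending it is independent evidence
p1 = update(prior, sens, fpr)      # ≈ 15.4% after the first positive
p_wrong = update(p1, sens, fpr)    # ≈ 76.6%: falsely inflated
# RIGHT: a perfectly correlated duplicate adds nothing; the honest posterior stays ≈ 15.4%
print(f"naive double-count: {p_wrong:.1%}, honest posterior: {p1:.1%}")
```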

Frequently Asked Questions

What is the difference between sensitivity, specificity, PPV, and NPV?

  • Sensitivity (True Positive Rate) = P(Test+ | Disease+): the probability a sick person tests positive. A 90% sensitive test misses 10% of true cases. Sensitivity is a fixed property of the test, independent of prevalence.
  • Specificity (True Negative Rate) = P(Test− | Disease−): the probability a healthy person tests negative. A 95% specific test generates a 5% false positive rate. Also fixed.
  • Positive Predictive Value (PPV) = P(Disease+ | Test+): the probability that a positive result is a true positive. This is Bayes' posterior. PPV depends on prevalence; it changes with the population screened.
  • Negative Predictive Value (NPV) = P(Disease− | Test−): the probability that a negative result is a true negative. Also prevalence-dependent.

The key insight: sensitivity and specificity are test properties (fixed); PPV and NPV are population-specific outcomes (change with prevalence). This is why a test with excellent sensitivity/specificity can still have poor PPV in a low-prevalence population.
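
The four quantities relate through one calculation. A sketch, assuming a 2×2 breakdown of a population by disease and test result (the `predictive_values` function is illustrative):

```python
def predictive_values(prevalence, sensitivity, specificity):
    """PPV and NPV for a given population prevalence, via Bayes' theorem."""
    tp = sensitivity * prevalence              # sick and test positive
    fp = (1 - specificity) * (1 - prevalence)  # healthy but test positive
    tn = specificity * (1 - prevalence)        # healthy and test negative
    fn = (1 - sensitivity) * prevalence        # sick but test negative
    return tp / (tp + fp), tn / (tn + fn)

# Same test, low prevalence: poor PPV, excellent NPV
ppv, npv = predictive_values(0.01, 0.90, 0.95)
print(f"PPV = {ppv:.1%}, NPV = {npv:.2%}")
```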

Why does a 99% accurate test only tell you 50% probability when the disease is rare?

This is the base rate fallacy in its most striking form. Imagine a disease affecting 1 in 1,000 people (0.1% prevalence) and a test that is 99% sensitive and 99% specific. Testing 100,000 people: 100 truly have the disease; 99 test positive (true positives from 99% sensitivity, 1 missed). Of the 99,900 without the disease, 999 test positive (false positives from 1% false positive rate). Total positives: 99 + 999 = 1,098. Of those, only 99 are true cases. PPV = 99/1,098 ≈ 9% — a positive result is 91% likely to be wrong! The 1% false positive rate, applied to 99,900 healthy people, generates far more false positives than the 0.1% prevalence rate generates true cases. The only way to improve PPV without changing the test is to test higher-prevalence populations (symptomatic patients, exposed contacts) where the prior is higher.
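
The 100,000-person count from this answer, written out as arithmetic:

```python
population = 100_000
prevalence, sens, spec = 0.001, 0.99, 0.99       # 1 in 1,000; 99%/99% test

sick = int(population * prevalence)              # 100 people truly have the disease
true_pos = sens * sick                           # 99 of them test positive
false_pos = (1 - spec) * (population - sick)     # ~999 healthy people also test positive
ppv = true_pos / (true_pos + false_pos)
print(f"{true_pos:.0f} true / {true_pos + false_pos:.0f} total positives -> PPV {ppv:.0%}")
```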

How is Bayes' Theorem used in AI and machine learning?

Bayes' theorem is embedded in multiple foundational ML techniques: (1) Naïve Bayes classifiers — the earliest spam filters calculated P(Spam | word “Viagra”) using Bayes' theorem with word frequencies as likelihoods and the prior probability of spam. Still used in text classification for its speed and interpretability. (2) Bayesian neural networks — instead of point estimates for weights, maintain probability distributions, enabling uncertainty quantification in predictions. (3) Bayesian hyperparameter optimization (Gaussian processes) — treats the loss function as an unknown function and uses Bayes to update beliefs about which hyperparameters are optimal, efficiently exploring the search space. (4) Probabilistic graphical models (Bayesian networks / belief networks) — encode conditional dependencies between variables using directed acyclic graphs where each node's CPT is a Bayesian conditional probability. Used in medical diagnosis, risk assessment, and causal inference.
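
Point (1) in its simplest single-word form. All frequencies below are made-up illustrative numbers, not real corpus statistics:

```python
# Toy naive Bayes spam score for one word, using Bayes' theorem directly.
p_spam = 0.4               # prior: assumed fraction of mail that is spam
p_word_given_spam = 0.10   # assumed: the word appears in 10% of spam
p_word_given_ham = 0.001   # assumed: ...and in 0.1% of legitimate mail

p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)
p_spam_given_word = p_word_given_spam * p_spam / p_word
print(f"P(Spam | word) = {p_spam_given_word:.1%}")
```

A real classifier multiplies such per-word likelihoods across the whole message (the "naive" independence assumption), but each factor is exactly this update.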

What is the difference between Bayesian and frequentist probability?

Frequentist probability defines probability as the long-run frequency of outcomes in infinitely repeated experiments. It produces p-values, confidence intervals, and null hypothesis significance testing (NHST). A 95% CI means: “If we repeated this experiment 100 times, 95 of the resulting intervals would contain the true value.” The parameter itself is treated as fixed (not a random variable).

Bayesian probability treats probability as a degree of belief that can be assigned to any uncertain event, including one-time events. Parameters are random variables with probability distributions. Bayes' theorem updates a prior belief to a posterior belief given data. A 95% Credible Interval means: “Given the data, there is a 95% probability the true parameter lies within this range.”

The philosophical difference: frequentism rejects prior beliefs as unscientific subjectivity; Bayesian inference embraces prior knowledge as legitimate information. In practice, Bayesian methods are more natural for decision-making (what is the probability this specific patient has cancer?), while frequentist methods dominate traditional hypothesis testing in academic research.
