Hypergeometric Distribution Calculator
Find exact hypergeometric probabilities, cumulative probabilities (CDF), mean, variance, and a full distribution table for sampling without replacement.
🃏 What is the Hypergeometric Distribution?
The hypergeometric distribution gives the probability of obtaining exactly k successes in n draws from a finite population of N items that contains exactly K success items, when sampling is done without replacement. The key phrase is "without replacement": once you draw an item from the population, it is not returned before the next draw, so the probability of success changes slightly with each draw. This distinguishes the hypergeometric from the binomial distribution, where each trial is independent because sampling is done with replacement (or because the population is infinite).
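To make "without replacement" concrete, here is a minimal Python sketch. The two-card ace example is an assumed illustration, not something the calculator computes. It uses exact fractions to show how the second draw's probability shifts once a card has left the deck, and that the sequential product agrees with the usual counting formula C(K, k) · C(N−K, n−k) / C(N, n).

```python
from fractions import Fraction
from math import comb

# Assumed example: draw 2 cards from a 52-card deck that contains 4 aces.
N, K = 52, 4

# Without replacement: the second draw sees one fewer ace and one fewer card.
p_first = Fraction(K, N)            # 4/52
p_second = Fraction(K - 1, N - 1)   # 3/51, given the first card was an ace
p_both_without = p_first * p_second

# With replacement: the deck is restored, so the second draw is unchanged.
p_both_with = p_first * p_first

# The sequential product matches the counting formula C(4,2) / C(52,2).
p_counting = Fraction(comb(K, 2), comb(N, 2))

print(p_both_without, p_both_with, p_counting)  # 1/221 1/169 1/221
```

The gap between 1/221 and 1/169 is the effect of not returning the first card to the deck.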
Real-world applications appear wherever items are selected from a fixed, finite pool. In quality control, an inspector draws 20 units from a batch of 200 and counts the defective ones: the hypergeometric distribution gives the exact probability of finding k defects. In card games, a poker player wants to know the probability of drawing exactly 2 aces in a 5-card hand dealt from a standard deck of 52 (K = 4 aces, N = 52, n = 5). In clinical trials, a researcher selects 30 patients from a pool of 100, 40 of whom carry a genetic marker: the probability that exactly 15 selected patients carry the marker follows a hypergeometric distribution. In audit sampling, an auditor examines n records out of N total to estimate the error rate.
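The card and clinical-trial examples above can be evaluated with a few lines of Python. This is a sketch using only the standard library; the function name hypergeom_pmf is an illustrative choice, not part of the calculator.

```python
from math import comb

def hypergeom_pmf(k, N, K, n):
    """P(X = k): exactly k successes in n draws without replacement
    from N items, K of which are successes (hypothetical helper)."""
    # Outside the valid range the probability is zero; this guard also
    # keeps negative arguments away from math.comb, which rejects them.
    if k < max(0, n + K - N) or k > min(K, n):
        return 0.0
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

# Exactly 2 aces in a 5-card hand: N = 52, K = 4, n = 5.
p_aces = hypergeom_pmf(2, N=52, K=4, n=5)

# Exactly 15 marker carriers among 30 selected patients: N = 100, K = 40, n = 30.
p_marker = hypergeom_pmf(15, N=100, K=40, n=30)

print(f"P(2 aces)      = {p_aces:.4f}")   # 0.0399
print(f"P(15 carriers) = {p_marker:.4f}")
```

Python integers have arbitrary precision, so the binomial coefficients stay exact until the final division.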
A common misconception is that the hypergeometric and binomial distributions are always interchangeable. They give similar results only when the population is large relative to the sample (as a rule of thumb, when n/N is less than about 5%). When the sampling fraction n/N is larger, the binomial overestimates the variance because it ignores the finite population correction factor (N-n)/(N-1). For any sample of more than one item (n > 1), the hypergeometric variance is smaller than the variance of a binomial with the same mean (p = K/N), because removing items from the population reduces uncertainty about what remains; for n = 1 the two variances coincide.
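To see the size of the correction, the sketch below compares the two variances for the clinical-trial numbers used earlier (N = 100, K = 40, n = 30), where the sampling fraction is 30%. The helper name hypergeom_mean_var is an illustrative choice.

```python
def hypergeom_mean_var(N, K, n):
    """Return (mean, hypergeometric variance, binomial variance, FPC)."""
    p = K / N                      # success fraction in the population
    mean = n * p                   # same mean for both distributions
    binom_var = n * p * (1 - p)    # variance if draws were independent
    fpc = (N - n) / (N - 1)        # finite population correction factor
    return mean, binom_var * fpc, binom_var, fpc

mean, hyper_var, binom_var, fpc = hypergeom_mean_var(N=100, K=40, n=30)
print(f"mean = {mean:.1f}")
print(f"binomial variance       = {binom_var:.4f}")
print(f"hypergeometric variance = {hyper_var:.4f}")
print(f"correction factor       = {fpc:.4f}")
```

Here the correction factor is 70/99, so the binomial approximation would overstate the variance by roughly 40%.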
The valid range of successes k is not always 0 to n. The lower bound is max(0, n + K - N): if the sample size n exceeds the number of non-success items N - K, the sample cannot be filled with non-successes alone, so at least n - (N - K) successes are forced into it. The upper bound is min(K, n): you cannot draw more successes than either the total available (K) or the total drawn (n). Always check this range before interpreting probabilities.
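The bounds are easy to check in code. The following Python sketch computes the valid range for a few parameter sets; the function name support_range and the two small-population examples are assumptions chosen for illustration.

```python
def support_range(N, K, n):
    """Return (lowest, highest) possible number of successes in the sample."""
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("need 0 <= K <= N and 0 <= n <= N")
    lowest = max(0, n + K - N)   # successes forced in once non-successes run out
    highest = min(K, n)          # capped by successes available and by draws
    return lowest, highest

print(support_range(52, 4, 5))   # (0, 4): five cards cannot hold more than 4 aces
print(support_range(10, 3, 8))   # (1, 3): only 7 non-successes, so at least 1 success
print(support_range(10, 7, 5))   # (2, 5): only 3 non-successes among 5 draws
```

Any k outside the returned range has probability zero, which is why a distribution table may start above 0 or stop below n.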