Hypergeometric Distribution Calculator

Find exact hypergeometric probabilities, cumulative probabilities (CDF), mean, variance, and a full distribution table for sampling without replacement.


🃏 What is the Hypergeometric Distribution?

The hypergeometric distribution gives the probability of obtaining exactly k successes in n draws from a finite population of N items that contains exactly K success items, when sampling is done without replacement. The key phrase is "without replacement": once you draw an item from the population, it is not returned before the next draw, so the probability of success changes slightly with each draw. This distinguishes the hypergeometric from the binomial distribution, where each trial is independent because sampling is done with replacement (or because the population is infinite).

Real-world applications appear wherever selection happens from a fixed, finite pool. In quality control, an inspector draws 20 units from a batch of 200 and counts defectives: the hypergeometric distribution gives the exact probability of finding k defects. In card games, a poker player wants to know the probability of drawing exactly 2 aces in a 5-card hand dealt from a standard deck of 52 (K = 4 aces, N = 52, n = 5). In clinical trials, a researcher selects 30 patients from a pool of 100, 40 of whom carry a genetic marker: the number of selected patients who carry the marker follows a hypergeometric distribution, which gives, for example, the probability that exactly 15 do. In audit sampling, an auditor examines n records out of N total and counts those containing errors to estimate the error rate.

A common misconception is that the hypergeometric and binomial distributions are always interchangeable. They give similar results only when the sample is small relative to the population (as a rule of thumb, when n/N is below about 5%). When the sampling fraction n/N is larger, the binomial overestimates the variance because it ignores the finite population correction factor (N-n)/(N-1). For any sample of more than one item, the hypergeometric variance is smaller than the variance of the binomial with the same n and p = K/N, because each item removed from the population reduces uncertainty about what remains.
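To see how quickly the correction starts to matter, the following sketch compares the two variances for a fixed success fraction as the sampling fraction grows. The function names and the population values (N = 200, K = 80) are illustrative choices, not part of this calculator.

```python
def binomial_variance(n, p):
    """Variance of a binomial count: n * p * (1 - p)."""
    return n * p * (1 - p)


def hypergeometric_variance(N, K, n):
    """Binomial-style variance multiplied by the finite population correction."""
    p = K / N
    fpc = (N - n) / (N - 1)  # finite population correction factor
    return n * p * (1 - p) * fpc


if __name__ == "__main__":
    N, K = 200, 80  # hypothetical population with 40% successes
    for n in (5, 10, 50, 100, 150):
        vb = binomial_variance(n, K / N)
        vh = hypergeometric_variance(N, K, n)
        print(f"n/N = {n / N:6.1%}  binomial = {vb:6.3f}  hypergeometric = {vh:6.3f}")
```

The gap between the two columns is small at a 2.5% sampling fraction and widens steadily as n approaches N.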

The valid range of successes k is not always 0 to n. The lower bound is max(0, n + K - N): if the sample size n exceeds the number of non-successes N - K, the sample must contain at least n - (N - K) successes. The upper bound is min(K, n): you cannot draw more successes than either the total available (K) or the total drawn (n). Always check this range before interpreting probabilities.
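The range check described above can be sketched in a few lines. The function name `support` is a hypothetical helper chosen for illustration.

```python
def support(N, K, n):
    """Return (k_min, k_max), the smallest and largest possible success counts."""
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("need 0 <= K <= N and 0 <= n <= N")
    k_min = max(0, n + K - N)  # successes forced once the non-successes run out
    k_max = min(K, n)          # limited by successes available and by draws made
    return k_min, k_max


print(support(52, 13, 5))   # (0, 5): a 5-card hand can hold 0 to 5 hearts
print(support(20, 8, 15))   # (3, 8): only 12 non-successes exist, so at least 3 successes
```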

📐 Formula

P(X = k)  =  C(K, k) × C(N−K, n−k) ÷ C(N, n)
N = population size (total number of items)
K = number of success states in the population
n = sample size (items drawn without replacement)
k = number of successes observed in the sample (max(0, n+K-N) ≤ k ≤ min(K, n))
C(a, b) = a! ÷ (b! × (a−b)!) = number of ways to choose b from a
Mean: μ = n × K ÷ N
Variance: σ² = n × (K/N) × ((N−K)/N) × (N−n)/(N−1)
Example: N=52, K=13, n=5, k=2: P(X=2) = C(13,2)×C(39,3)/C(52,5) = 78×9139/2598960 = 712842/2598960 ≈ 27.43%
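As a minimal sketch, the formulas above translate directly into Python using the standard library's `math.comb`. The function names are illustrative, and this is not necessarily how the calculator itself is implemented.

```python
from math import comb


def hypergeom_pmf(k, N, K, n):
    """P(X = k) = C(K, k) * C(N-K, n-k) / C(N, n); zero outside the valid range."""
    if k < max(0, n + K - N) or k > min(K, n):
        return 0.0
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


def hypergeom_mean(N, K, n):
    """Mean: n * K / N."""
    return n * K / N


def hypergeom_variance(N, K, n):
    """Variance: n * (K/N) * ((N-K)/N) * (N-n)/(N-1)."""
    return n * (K / N) * ((N - K) / N) * (N - n) / (N - 1)


# The hearts example from the formula section: N=52, K=13, n=5, k=2.
print(f"{hypergeom_pmf(2, 52, 13, 5):.4f}")   # 0.2743
print(hypergeom_mean(52, 13, 5))              # 1.25
```

Because `math.comb` works with exact integers, the only rounding happens in the final division.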

📖 How to Use This Calculator

Steps

1. Choose a mode. Select "Calculate Probability" for a specific k value, or "Distribution Table" to see all valid probabilities at once for given N, K, and n.
2. Enter population parameters N and K. Set N to the total population size and K to the number of success items in the population. For a standard deck of cards with hearts as success, N = 52 and K = 13.
3. Enter the sample size n and target k. Set n to the number of items drawn and k to the number of successes you want the probability for. The calculator validates that k is in the allowable range.
4. Read the results. The calculator shows P(X = k), the cumulative P(X ≤ k), the upper tail P(X ≥ k), the mean nK/N, the variance, and the standard deviation.
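For readers who want to reproduce the "Distribution Table" output offline, here is a rough sketch that lists P(X = k), P(X ≤ k), and P(X ≥ k) for every valid k. The function name and row layout are assumptions made for illustration and may differ from what the calculator displays.

```python
from math import comb


def distribution_table(N, K, n):
    """Return rows (k, P(X = k), P(X <= k), P(X >= k)) over the valid range of k."""
    lo, hi = max(0, n + K - N), min(K, n)
    total = comb(N, n)
    pmf = [comb(K, k) * comb(N - K, n - k) / total for k in range(lo, hi + 1)]
    rows = []
    cumulative = 0.0
    for offset, p in enumerate(pmf):
        cumulative += p
        upper = sum(pmf[offset:])  # upper tail includes the current k
        rows.append((lo + offset, p, cumulative, upper))
    return rows


if __name__ == "__main__":
    # Hearts in a 5-card hand: N=52, K=13, n=5.
    for k, p, cdf, tail in distribution_table(52, 13, 5):
        print(f"k={k}  P(X=k)={p:.4f}  P(X<=k)={cdf:.4f}  P(X>=k)={tail:.4f}")
```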

💡 Example Calculations

Example 1: Poker Hand (N=52, K=4 Aces, n=5, k=2)

What is the probability of being dealt exactly 2 aces in a 5-card hand from a standard 52-card deck?

1. N = 52 (cards), K = 4 (aces), n = 5 (cards dealt), k = 2 (aces wanted).
2. P(X = 2) = C(4,2) × C(48,3) ÷ C(52,5) = 6 × 17296 ÷ 2598960.
3. P(X = 2) = 103776 / 2598960 = 0.03993, or about 3.99%. The mean number of aces in a 5-card hand is 5 × 4/52 = 0.385.
P(X = 2) = 3.99% | Mean = 0.3846 aces
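One way to sanity-check an exact result like this is a quick Monte Carlo simulation that deals many random hands and counts how often exactly two aces appear. The sketch below is illustrative only: the trial count and seed are arbitrary choices, and the estimate will differ slightly from the exact value.

```python
import random


def simulate_two_aces(trials=100_000, seed=0):
    """Estimate P(exactly 2 aces in a 5-card hand) by repeated dealing."""
    rng = random.Random(seed)      # fixed seed so the run is repeatable
    deck = [1] * 4 + [0] * 48      # 1 marks an ace, 0 marks any other card
    hits = 0
    for _ in range(trials):
        hand = rng.sample(deck, 5)  # sampling without replacement
        if sum(hand) == 2:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print(f"simulated: {simulate_two_aces():.4f}  exact: {103776 / 2598960:.4f}")
```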

Example 2: Quality Control (N=100, K=10 defective, n=15, k=0)

A batch of 100 units contains 10 defective items. An inspector samples 15 units. What is the probability of finding no defects?

1. N = 100, K = 10 (defective), n = 15 (sampled), k = 0 (target defects found).
2. P(X = 0) = C(10,0) × C(90,15) ÷ C(100,15) = C(90,15) ÷ C(100,15).
3. P(X = 0) ≈ 18.08%. The mean number of defects in the sample is 15 × 10/100 = 1.5.
P(X = 0) ≈ 18.08% | Mean = 1.5 defects

Example 3: Voter Survey (N=200, K=80 supporters, n=20, k=10)

A town of 200 voters has 80 who support a ballot measure. If 20 voters are randomly surveyed without replacement, what is the probability exactly 10 support the measure?

1. N = 200, K = 80 (supporters), n = 20 (surveyed), k = 10. The support rate is 80/200 = 40%.
2. P(X = 10) = C(80,10) × C(120,10) ÷ C(200,20). The calculator evaluates this in log space for numerical accuracy.
3. P(X = 10) ≈ 11.84%. The mean is 20 × 80/200 = 8 supporters, so finding exactly 10 is 2 above the mean.
P(X = 10) ≈ 11.84% | Mean = 8 supporters

❓ Frequently Asked Questions

What is the hypergeometric distribution and when is it used?
The hypergeometric distribution gives the probability of k successes in a sample of n items drawn without replacement from a population of N items containing K successes. It is used whenever selection is done without replacement from a known finite population: card games, quality control audits, clinical trial enrollment, lottery draws, and wildlife capture-recapture studies all use the hypergeometric model.
What is the hypergeometric distribution PMF formula?
P(X = k) = C(K, k) times C(N-K, n-k) divided by C(N, n). C(K, k) counts ways to choose k successes from K available. C(N-K, n-k) counts ways to choose the remaining n-k items from the N-K failures. C(N, n) is the total number of ways to choose n items from N, which is the denominator of the probability.
How is the hypergeometric distribution different from the binomial?
The binomial assumes independent trials with constant probability p (sampling with replacement or infinite population). The hypergeometric models dependent trials without replacement from a finite population, where each draw changes the remaining composition. The hypergeometric variance includes a finite population correction factor (N-n)/(N-1) that is always less than 1, making it smaller than the corresponding binomial variance.
What is the mean and variance of the hypergeometric distribution?
The mean is mu = n times K/N. The variance is sigma^2 = n times (K/N) times ((N-K)/N) times (N-n)/(N-1). For N = 100, K = 30, n = 10: mean = 3, variance = 3 times 0.7 times 0.909 = 1.909, std dev = 1.382. Compare to binomial: mean = 3, variance = 2.1 (larger because it ignores finite population).
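The closed-form mean and variance can be cross-checked by summing directly over the probability mass function. The sketch below does this for the N = 100, K = 30, n = 10 case; the function name is illustrative.

```python
from math import comb


def moments_from_pmf(N, K, n):
    """Compute mean and variance by summing over all valid k."""
    lo, hi = max(0, n + K - N), min(K, n)
    total = comb(N, n)
    probs = {k: comb(K, k) * comb(N - K, n - k) / total for k in range(lo, hi + 1)}
    mean = sum(k * p for k, p in probs.items())
    variance = sum((k - mean) ** 2 * p for k, p in probs.items())
    return mean, variance


if __name__ == "__main__":
    N, K, n = 100, 30, 10
    mean, variance = moments_from_pmf(N, K, n)
    formula_var = n * (K / N) * ((N - K) / N) * (N - n) / (N - 1)
    print(f"summed mean     = {mean:.4f}   formula = {n * K / N:.4f}")
    print(f"summed variance = {variance:.4f}   formula = {formula_var:.4f}")
```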
What is the valid range of k in the hypergeometric distribution?
k ranges from max(0, n+K-N) to min(K, n). The lower bound applies when the sample is larger than the number of non-successes: once all N-K non-successes have been drawn, the remaining n-(N-K) draws must be successes. The upper bound min(K, n) ensures you cannot observe more successes than are available or more than the sample size allows.
When does the hypergeometric distribution approximate the binomial?
When the population is large relative to the sample (n/N less than 5%), the hypergeometric distribution closely approximates the binomial distribution with p = K/N. The finite population correction factor (N-n)/(N-1) approaches 1, so the variances agree. Many textbooks use the 10% rule: if n/N is less than 10%, the binomial approximation is acceptable.
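To get a feel for the rule of thumb, the sketch below measures the largest pointwise gap between the hypergeometric PMF and the binomial PMF with p = K/N. The function names and the two test populations are illustrative assumptions.

```python
from math import comb


def hypergeom_pmf(k, N, K, n):
    """Exact hypergeometric P(X = k); math.comb returns 0 for impossible choices."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


def binom_pmf(k, n, p):
    """Binomial P(X = k) with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def max_abs_gap(N, K, n):
    """Largest |hypergeometric - binomial| over k = 0..n."""
    p = K / N
    return max(abs(hypergeom_pmf(k, N, K, n) - binom_pmf(k, n, p)) for k in range(n + 1))


if __name__ == "__main__":
    # Same 40% success fraction and n = 20, with very different sampling fractions.
    print(f"n/N = 0.2%: max gap = {max_abs_gap(10_000, 4_000, 20):.5f}")
    print(f"n/N = 40%:  max gap = {max_abs_gap(50, 20, 20):.5f}")
```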
How do I calculate hypergeometric probability for large N?
For large N, direct computation of C(N, n) overflows standard floating-point. This calculator uses log-space arithmetic: log P(X = k) = log C(K, k) + log C(N-K, n-k) - log C(N, n), where each log-binomial is computed via log-factorials. The result is then exponentiated. This approach handles N up to 10,000 accurately.
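The log-space approach can be sketched as follows. This version uses `math.lgamma` to obtain log-factorials (since log(a!) = lgamma(a + 1)); that is an assumption about one reasonable way to do it, and the calculator's actual implementation is not shown here.

```python
from math import exp, lgamma


def log_comb(a, b):
    """log C(a, b) computed from log-factorials via lgamma(x + 1) = log(x!)."""
    return lgamma(a + 1) - lgamma(b + 1) - lgamma(a - b + 1)


def hypergeom_pmf_log(k, N, K, n):
    """P(X = k) evaluated in log space, then exponentiated at the end."""
    if k < max(0, n + K - N) or k > min(K, n):
        return 0.0
    log_p = log_comb(K, k) + log_comb(N - K, n - k) - log_comb(N, n)
    return exp(log_p)


if __name__ == "__main__":
    # Large inputs whose binomial coefficients would overflow a 64-bit float.
    print(hypergeom_pmf_log(400, 10_000, 4_000, 1_000))
```

Python's own integers do not overflow, so `math.comb` would also work here; the log-space form matters most in languages or libraries limited to fixed-width floating point.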
What is the cumulative hypergeometric probability?
P(X at most k) is the sum of P(X = i) for all valid i from the lower bound up to k. This CDF gives the probability of observing k or fewer successes. For example, in a sample of 5 cards from a deck (N=52, K=13 hearts), P(X at most 1 heart) = P(X=0) + P(X=1) is the probability of seeing at most 1 heart.
What is the Fisher exact test and how does it relate to the hypergeometric distribution?
Fisher's exact test uses the hypergeometric distribution to test whether two groups have the same proportion of successes. Given a 2x2 contingency table with fixed row and column totals, the hypergeometric PMF gives the probability of the observed cell counts. The p-value is the sum of hypergeometric probabilities for all tables as or more extreme than the observed one.
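A minimal sketch of a two-sided Fisher exact test built on the hypergeometric PMF is shown below. It assumes one common convention for "as or more extreme": every table whose probability is no larger than that of the observed table. The function name and the 2x2 layout [[a, b], [c, d]] are illustrative.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided p-value for the 2x2 table [[a, b], [c, d]] with fixed margins."""
    N = a + b + c + d   # grand total
    K = a + b           # first-row total (the "successes" in the population)
    n = a + c           # first-column total (the "sample")
    total = comb(N, n)

    def pmf(x):
        return comb(K, x) * comb(N - K, n - x) / total

    lo, hi = max(0, n + K - N), min(K, n)
    p_observed = pmf(a)
    # Small relative tolerance so that ties are not lost to rounding.
    p_value = sum(pmf(x) for x in range(lo, hi + 1) if pmf(x) <= p_observed * (1 + 1e-9))
    return min(1.0, p_value)


# Classic 4-and-4 tasting layout: 3 of 4 correct in each row.
print(round(fisher_exact_two_sided(3, 1, 1, 3), 4))   # 0.4857
```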
Can the hypergeometric distribution model the capture-recapture method?
Yes. In ecology, researchers capture K animals, tag them, and release them. Later they capture n animals and count the k tagged ones. Because K of the N animals in the population are tagged, the number of tagged animals in the second sample follows a hypergeometric distribution. This lets researchers estimate the total population size N from the observed recapture rate k/n.
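As a rough illustration of that estimate, the sketch below scans candidate population sizes and keeps the one that makes the observed recapture count most likely under the hypergeometric model (a maximum-likelihood scan). It also prints the simple ratio K × n / k, known as the Lincoln-Petersen estimator, for comparison. The function names, the search cap `N_max`, and the sample numbers are assumptions for illustration.

```python
from math import comb


def likelihood(N, K, n, k):
    """Hypergeometric probability of seeing k tagged animals if the population is N."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


def estimate_population(K, n, k, N_max=None):
    """Return the N that maximizes the likelihood of the observed recaptures."""
    if not 0 < k <= min(K, n):
        raise ValueError("need 0 < k <= min(K, n)")
    N_min = K + n - k                 # fewest animals consistent with the data
    if N_max is None:
        N_max = 10 * K * n // k       # hypothetical search cap, well above K*n/k
    return max(range(N_min, N_max + 1), key=lambda N: likelihood(N, K, n, k))


if __name__ == "__main__":
    K, n, k = 50, 40, 12              # made-up field numbers
    print("maximum-likelihood N:", estimate_population(K, n, k))
    print("ratio estimate K*n/k:", K * n / k)
```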
What is the mode of the hypergeometric distribution?
The mode is floor((n+1)(K+1)/(N+2)). When (n+1)(K+1)/(N+2) is itself an integer, that value and the value one below it are equally likely, so both are modes. For the card example (N=52, K=13, n=5), the mode is floor(6 × 14/54) = floor(1.556) = 1, meaning exactly 1 heart is the most likely outcome in a 5-card hand.
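The mode formula can be checked against a brute-force search over the valid range of k. The sketch below is illustrative; the function names are not part of the calculator.

```python
from math import comb


def mode_formula(N, K, n):
    """Closed-form mode: floor((n + 1)(K + 1) / (N + 2)), using integer division."""
    return (n + 1) * (K + 1) // (N + 2)


def mode_by_search(N, K, n):
    """Most probable k found by comparing the PMF numerators directly."""
    lo, hi = max(0, n + K - N), min(K, n)
    # The denominator C(N, n) is the same for every k, so it can be ignored.
    return max(range(lo, hi + 1), key=lambda k: comb(K, k) * comb(N - K, n - k))


print(mode_formula(52, 13, 5), mode_by_search(52, 13, 5))   # 1 1
```

When (n+1)(K+1)/(N+2) is an exact integer the two functions can disagree by one, because both neighbouring values are modes and the search returns the first one it meets.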