Hypergeometric Distribution Calculator

Find exact hypergeometric probabilities, cumulative probabilities (CDF), mean, variance, and a full distribution table for sampling without replacement.

Population size N: 52

Success states in population K: 13

Sample size n (draws without replacement): 5

Target successes k: 2


🃏 What is the Hypergeometric Distribution?

The hypergeometric distribution gives the probability of obtaining exactly k successes in n draws from a finite population of N items that contains exactly K success items, when sampling is done without replacement. The key phrase is "without replacement": once you draw an item from the population, it is not returned before the next draw, so the probability of success changes slightly with each draw. This distinguishes the hypergeometric from the binomial distribution, where each trial is independent because sampling is done with replacement (or because the population is infinite).

Real-world applications appear in every field where selection happens from a fixed, finite pool. In quality control, an inspector draws 20 units from a batch of 200 and counts defectives: the hypergeometric distribution gives the exact probability of finding k defects. In card games, a poker player wants to know the probability of drawing exactly 2 aces in a 5-card hand dealt from a standard deck of 52 (K = 4 aces, N = 52, n = 5). In clinical trials, a researcher selects 30 patients from a pool of 100, 40 of whom carry a genetic marker: the probability that exactly 15 selected patients carry the marker follows a hypergeometric distribution. In audit sampling, an auditor examines n records from N total to estimate the error rate.

A common misconception is that the hypergeometric and binomial distributions are always interchangeable. They are close only when the population is large relative to the sample (as a rule of thumb, when n/N is below about 5%). When the sampling fraction n/N is larger, the binomial overestimates the variance because it ignores the finite population correction factor (N-n)/(N-1). For any sample of more than one item (and 0 < K < N), the hypergeometric variance is strictly smaller than the binomial variance with the same mean, because each item removed from the population reduces uncertainty about what remains.
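To see the finite population correction at work, here is a minimal Python sketch that compares the two variances at a small and a large sampling fraction. The function names are illustrative and are not taken from the calculator itself.

```python
def binomial_variance(n: int, p: float) -> float:
    """Variance of a binomial count: n * p * (1 - p)."""
    return n * p * (1 - p)


def hypergeom_variance(N: int, K: int, n: int) -> float:
    """Binomial variance with p = K/N, scaled by the correction (N - n)/(N - 1)."""
    if N < 2:
        return 0.0  # a population of one item leaves no randomness
    return binomial_variance(n, K / N) * (N - n) / (N - 1)


# Same success rate (40%) at two different sampling fractions.
for N, K, n in [(1000, 400, 20), (50, 20, 20)]:
    hv = hypergeom_variance(N, K, n)
    bv = binomial_variance(n, K / N)
    print(f"n/N = {n / N:.0%}: hypergeometric {hv:.3f}, binomial {bv:.3f}")
```

At a 2% sampling fraction the two values nearly coincide; at 40% the hypergeometric variance is clearly smaller.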

The valid range of successes k is not always 0 to n. The lower bound is max(0, n + K - N): if the sample is larger than the number of non-successes (n > N - K), the non-successes run out and at least n - (N - K) successes are forced into the sample. The upper bound is min(K, n): you cannot draw more successes than either the total available (K) or the total drawn (n). Always check this range before interpreting probabilities.
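The range check can be sketched in a few lines of Python. The helper name `support` is an assumption made for this illustration.

```python
def support(N: int, K: int, n: int) -> range:
    """Valid values of k when drawing n items from N items that hold K successes."""
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("need 0 <= K <= N and 0 <= n <= N")
    lo = max(0, n + K - N)  # successes forced in once the non-successes run out
    hi = min(K, n)          # capped by successes available and by items drawn
    return range(lo, hi + 1)


print(list(support(52, 13, 5)))  # card example: every k from 0 to 5 is possible
print(list(support(10, 7, 6)))   # only 3 non-successes, so k starts at 3
```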

📐 Formula

P(X = k) = C(K, k) × C(N−K, n−k) ÷ C(N, n)

N = population size (total number of items)

K = number of success states in the population

n = sample size (items drawn without replacement)

k = number of successes observed in the sample (max(0, n+K-N) ≤ k ≤ min(K, n))

C(a, b) = a! ÷ (b! × (a−b)!) = number of ways to choose b from a

Mean: μ = n × K ÷ N

Variance: σ² = n × (K/N) × ((N−K)/N) × (N−n)/(N−1)

Example: N=52, K=13, n=5, k=2: P(X=2) = C(13,2)×C(39,3)/C(52,5) = 78×9139/2598960 = 712842/2598960 ≈ 27.43%
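The formula translates almost directly into Python with `math.comb` for the binomial coefficients. This is a minimal sketch; the name `hypergeom_pmf` is illustrative, and the calculator's own implementation may differ.

```python
from math import comb


def hypergeom_pmf(k: int, N: int, K: int, n: int) -> float:
    """P(X = k) = C(K, k) * C(N - K, n - k) / C(N, n), or 0 outside the valid range."""
    lo, hi = max(0, n + K - N), min(K, n)
    if k < lo or k > hi:
        return 0.0
    # Python integers are exact, so only the final division rounds.
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


# Two hearts in a five-card hand.
print(f"{hypergeom_pmf(2, 52, 13, 5):.4f}")

# The probabilities over all valid k sum to 1.
print(sum(hypergeom_pmf(k, 52, 13, 5) for k in range(0, 6)))
```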

📖 How to Use This Calculator

Steps

Choose a mode. Select "Calculate Probability" for a specific k value, or "Distribution Table" to see all valid probabilities at once for given N, K, and n.

Enter population parameters N and K. Set N to the total population size and K to the number of success items in the population. For a standard deck of cards with hearts as success, N = 52 and K = 13.

Enter the sample size n and target k. Set n to the number of items drawn and k to the number of successes you want the probability for. The calculator validates that k is in the allowable range.

Read the results. The calculator shows P(X = k), the cumulative P(X at most k), upper tail P(X at least k), mean nK/N, variance, and standard deviation.
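The six reported quantities can be reproduced with a short Python sketch. The function name and dictionary keys below are assumptions made for illustration, not the calculator's actual code.

```python
from math import comb, sqrt


def hypergeom_summary(N: int, K: int, n: int, k: int) -> dict:
    """Exact, cumulative, and upper-tail probabilities plus the moments."""
    lo, hi = max(0, n + K - N), min(K, n)
    if not lo <= k <= hi:
        raise ValueError(f"k must be between {lo} and {hi}")
    total = comb(N, n)
    pmf = {i: comb(K, i) * comb(N - K, n - i) / total for i in range(lo, hi + 1)}
    p = K / N
    variance = n * p * (1 - p) * (N - n) / (N - 1) if N > 1 else 0.0
    return {
        "pmf": pmf[k],                                                # P(X = k)
        "cdf": sum(q for i, q in pmf.items() if i <= k),              # P(X <= k)
        "upper_tail": sum(q for i, q in pmf.items() if i >= k),       # P(X >= k)
        "mean": n * K / N,
        "variance": variance,
        "std_dev": sqrt(variance),
    }


for name, value in hypergeom_summary(52, 13, 5, 2).items():
    print(f"{name}: {value:.4f}")
```

Note that the cumulative and upper-tail values both include P(X = k), so they sum to 1 + P(X = k) rather than 1.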

💡 Example Calculations

Example 1: Poker Hand (N=52, K=4 Aces, n=5, k=2)

What is the probability of being dealt exactly 2 aces in a 5-card hand from a standard 52-card deck?

N = 52 (cards), K = 4 (aces), n = 5 (cards dealt), k = 2 (aces wanted).

P(X = 2) = C(4,2) times C(48,3) divided by C(52,5) = 6 times 17296 divided by 2598960.

P(X = 2) = 103776 / 2598960 = 0.03993, or about 3.99%. The mean number of aces in a 5-card hand is 5 times 4/52 = 0.385.

P(X = 2) = 3.99% | Mean = 0.3846 aces


Example 2: Quality Control (N=100, K=10 defective, n=15, k=0)

A batch of 100 units contains 10 defective items. An inspector samples 15 units. What is the probability of finding no defects?

N = 100, K = 10 (defective), n = 15 (sampled), k = 0 (target defects found).

P(X = 0) = C(10,0) times C(90,15) divided by C(100,15) = 1 times C(90,15) / C(100,15).

P(X = 0) = approximately 18.08%. The mean number of defects in the sample is 15 times 10/100 = 1.5.

P(X = 0) = ~18.08% | Mean = 1.5 defects
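For the special case k = 0, the same probability can be built draw by draw, which makes the "without replacement" effect visible: each non-defective unit removed lowers the chance that the next draw is also non-defective. A minimal Python sketch follows; the function name is hypothetical.

```python
from math import comb


def prob_no_successes(N: int, K: int, n: int) -> float:
    """P(X = 0): every one of the n draws must be a non-success."""
    p = 1.0
    for i in range(n):
        remaining_failures = N - K - i  # non-successes still in the population
        if remaining_failures <= 0:
            return 0.0  # the non-successes ran out, so a success is forced
        p *= remaining_failures / (N - i)
    return p


step_by_step = prob_no_successes(100, 10, 15)
closed_form = comb(90, 15) / comb(100, 15)
print(step_by_step, closed_form)  # the two routes agree up to rounding
```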


Example 3: Voter Survey (N=200, K=80 supporters, n=20, k=10)

A town of 200 voters has 80 who support a ballot measure. If 20 voters are randomly surveyed without replacement, what is the probability exactly 10 support the measure?

N = 200, K = 80 (supporters), n = 20 (surveyed), k = 10. The support rate is 80/200 = 40%.

P(X = 10) = C(80,10) times C(120,10) divided by C(200,20). This uses the log-binomial approach for numerical accuracy.

P(X = 10) is approximately 11.84%. The mean is 20 times 80/200 = 8 supporters. Finding exactly 10 is 2 above the mean.

P(X = 10) = ~11.84% | Mean = 8 supporters


❓ Frequently Asked Questions

What is the hypergeometric distribution and when is it used?

The hypergeometric distribution gives the probability of k successes in a sample of n items drawn without replacement from a population of N items containing K successes. It is used whenever selection is done without replacement from a known finite population: card games, quality control audits, clinical trial enrollment, lottery draws, and wildlife capture-recapture studies all use the hypergeometric model.

What is the hypergeometric distribution PMF formula?

P(X = k) = C(K, k) times C(N-K, n-k) divided by C(N, n). C(K, k) counts ways to choose k successes from K available. C(N-K, n-k) counts ways to choose the remaining n-k items from the N-K failures. C(N, n) is the total number of ways to choose n items from N, which is the denominator of the probability.

How is the hypergeometric distribution different from the binomial?

The binomial assumes independent trials with constant probability p (sampling with replacement or infinite population). The hypergeometric models dependent trials without replacement from a finite population, where each draw changes the remaining composition. The hypergeometric variance includes a finite population correction factor (N-n)/(N-1) that is less than 1 whenever n > 1, making it smaller than the corresponding binomial variance.

What is the mean and variance of the hypergeometric distribution?

The mean is mu = n times K/N. The variance is sigma^2 = n times (K/N) times ((N-K)/N) times (N-n)/(N-1). For N = 100, K = 30, n = 10: mean = 3, variance = 3 times 0.7 times 0.909 = 1.909, std dev = 1.382. Compare to binomial: mean = 3, variance = 2.1 (larger because it ignores finite population).

What is the valid range of k in the hypergeometric distribution?

k ranges from max(0, n+K-N) to min(K, n). The lower bound max(0, n+K-N) matters when the sample is larger than the number of failures in the population (n > N-K): once all N-K failures are drawn, the remaining n-(N-K) draws must be successes. The upper bound min(K, n) ensures you cannot observe more successes than are available or more than the sample size allows.

When does the hypergeometric distribution approximate the binomial?

When the population is large relative to the sample (n/N less than 5%), the hypergeometric distribution closely approximates the binomial distribution with p = K/N. The finite population correction factor (N-n)/(N-1) approaches 1, so the variances agree. Many textbooks use the 10% rule: if n/N is less than 10%, the binomial approximation is acceptable.
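A quick way to judge the approximation for a given case is to compare the two PMFs directly. The Python sketch below reports the largest pointwise gap; the function names are illustrative, and the example parameters are chosen here, not taken from the calculator.

```python
from math import comb


def hypergeom_pmf(k: int, N: int, K: int, n: int) -> float:
    lo, hi = max(0, n + K - N), min(K, n)
    if k < lo or k > hi:
        return 0.0
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


def binomial_pmf(k: int, n: int, p: float) -> float:
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def max_pmf_gap(N: int, K: int, n: int) -> float:
    """Largest absolute difference between the two PMFs over k = 0..n."""
    p = K / N
    return max(abs(hypergeom_pmf(k, N, K, n) - binomial_pmf(k, n, p))
               for k in range(n + 1))


print(max_pmf_gap(10000, 4000, 20))  # n/N = 0.2%: the gap is tiny
print(max_pmf_gap(40, 16, 20))       # n/N = 50%: the gap is substantial
```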

How do I calculate hypergeometric probability for large N?

For large N, direct computation of C(N, n) overflows standard floating-point. This calculator uses log-space arithmetic: log P(X = k) = log C(K, k) + log C(N-K, n-k) - log C(N, n), where each log-binomial is computed via log-factorials. The result is then exponentiated. This approach handles N up to 10,000 accurately.
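The log-space idea can be sketched in Python with `math.lgamma`, using the identity lgamma(x + 1) = log(x!). This is an illustration of the technique described above, not the calculator's actual source; the function names are assumptions.

```python
from math import comb, exp, lgamma


def log_comb(a: int, b: int) -> float:
    """log C(a, b) computed from log-factorials."""
    return lgamma(a + 1) - lgamma(b + 1) - lgamma(a - b + 1)


def hypergeom_pmf_log(k: int, N: int, K: int, n: int) -> float:
    lo, hi = max(0, n + K - N), min(K, n)
    if k < lo or k > hi:
        return 0.0
    log_p = log_comb(K, k) + log_comb(N - K, n - k) - log_comb(N, n)
    return exp(log_p)  # exponentiate only at the very end


# Example 3 (voter survey), cross-checked against exact integer arithmetic.
p_log = hypergeom_pmf_log(10, 200, 80, 20)
p_exact = comb(80, 10) * comb(120, 10) / comb(200, 20)
print(p_log, p_exact)

# C(10000, 1000) is far beyond floating-point range, but its logarithm is not.
print(hypergeom_pmf_log(400, 10000, 4000, 1000))
```

In Python specifically, `math.comb` returns exact integers, so the direct formula also works; the log-space route matters most in languages where binomial coefficients are stored as floating-point numbers.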

What is the cumulative hypergeometric probability?

P(X at most k) is the sum of P(X = i) for all valid i from the lower bound up to k. This CDF gives the probability of observing k or fewer successes. For example, in a sample of 5 cards from a deck (N=52, K=13 hearts), P(X at most 1 heart) = P(X=0) + P(X=1) is the probability of seeing at most 1 heart.

What is the Fisher exact test and how does it relate to the hypergeometric distribution?

Fisher's exact test uses the hypergeometric distribution to test whether two groups have the same proportion of successes. Given a 2x2 contingency table with fixed row and column totals, the hypergeometric PMF gives the probability of the observed cell counts. The p-value is the sum of hypergeometric probabilities for all tables as or more extreme than the observed one.
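As a rough illustration, the test can be sketched in Python directly from the hypergeometric PMF. The function name is hypothetical, and "as or more extreme" is interpreted here as "no more probable than the observed table", which is one common convention for the two-sided p-value.

```python
from math import comb


def fisher_exact(a: int, b: int, c: int, d: int) -> tuple:
    """Return (one_sided_p, two_sided_p) for the 2x2 table [[a, b], [c, d]]."""
    N = a + b + c + d  # grand total
    K = a + b          # first-row total plays the role of "successes"
    n = a + c          # first-column total plays the role of "sample size"
    lo, hi = max(0, n + K - N), min(K, n)
    total = comb(N, n)
    pmf = {k: comb(K, k) * comb(N - K, n - k) / total for k in range(lo, hi + 1)}
    p_obs = pmf[a]
    one_sided = sum(p for k, p in pmf.items() if k >= a)
    # Small relative tolerance so that ties in probability are counted.
    two_sided = sum(p for p in pmf.values() if p <= p_obs * (1 + 1e-9))
    return one_sided, min(1.0, two_sided)


# Classic tea-tasting layout: 3 of 4 cups identified correctly.
print(fisher_exact(3, 1, 1, 3))
```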

Can the hypergeometric distribution model the capture-recapture method?

Yes. In ecology, researchers capture M animals, tag them, and release them. Later they capture n animals and count k tagged ones. Since K = M tagged animals are among N total, the number of tagged animals in the second sample follows a hypergeometric distribution. This lets researchers estimate N (total population size) from the observed recapture rate k/n.
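One simple way to turn the recapture rate into an estimate is to set k/n equal to K/N and solve for N, which gives the Lincoln-Petersen estimate N ≈ K × n / k. The Python sketch below also scans candidate population sizes to find the one that makes the observed k most likely. The function names, the example counts, and the search limit `N_max` are assumptions made for illustration.

```python
from fractions import Fraction
from math import comb


def lincoln_petersen(K: int, n: int, k: int) -> float:
    """Estimate N by equating the recapture rate k/n with the tagged share K/N."""
    if k == 0:
        raise ValueError("no tagged animals recaptured; the estimate is undefined")
    return K * n / k


def likelihood(N: int, K: int, n: int, k: int) -> Fraction:
    """Exact hypergeometric probability of k tagged animals in the second sample."""
    return Fraction(comb(K, k) * comb(N - K, n - k), comb(N, n))


def most_likely_N(K: int, n: int, k: int, N_max: int) -> int:
    """Population size up to N_max that maximizes the hypergeometric likelihood."""
    smallest = K + n - k  # at least this many distinct animals were seen
    return max(range(smallest, N_max + 1), key=lambda N: likelihood(N, K, n, k))


# 50 animals tagged, 40 caught later, 7 of them carrying tags.
print(lincoln_petersen(50, 40, 7))
print(most_likely_N(50, 40, 7, N_max=1000))
```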

What is the mode of the hypergeometric distribution?

The mode is floor((n+1)(K+1)/(N+2)). When (n+1)(K+1)/(N+2) is itself an integer, that value and the value one below it are equally likely, so both are modes. For the card example (N=52, K=13, n=5), the mode is floor(6 times 14/54) = floor(1.556) = 1, meaning exactly 1 heart is the most likely outcome in a 5-card hand.
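The closed-form mode can be checked against a brute-force search over the valid range. This is a small Python sketch with illustrative function names.

```python
from math import comb


def hypergeom_mode(N: int, K: int, n: int) -> int:
    """Closed-form mode: floor((n + 1)(K + 1) / (N + 2))."""
    return (n + 1) * (K + 1) // (N + 2)


def argmax_pmf(N: int, K: int, n: int) -> int:
    """Most probable k found by direct search over the valid range."""
    lo, hi = max(0, n + K - N), min(K, n)
    # C(N, n) is a common denominator, so comparing numerators is enough.
    return max(range(lo, hi + 1), key=lambda k: comb(K, k) * comb(N - K, n - k))


print(hypergeom_mode(52, 13, 5), argmax_pmf(52, 13, 5))
print(hypergeom_mode(200, 80, 20), argmax_pmf(200, 80, 20))
```

When two adjacent values tie, the search returns the lower one and the formula returns the upper one; both are valid modes.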


What is the hypergeometric distribution formula?

P(X = k) = C(K, k) times C(N-K, n-k) divided by C(N, n), where N is the population size, K is the number of success states in the population, n is the sample size (drawn without replacement), and k is the number of successes in the sample.

What is the mean of the hypergeometric distribution?

The mean is mu = n times K divided by N. For a sample of 10 from a population of 50 that contains 20 successes, the mean is 10 times 20 / 50 = 4. This is the same as the binomial mean with p = K/N.

What is the variance of the hypergeometric distribution?

The variance is sigma^2 = n times (K/N) times ((N-K)/N) times ((N-n)/(N-1)). The last factor (N-n)/(N-1) is the finite population correction. It is less than 1 whenever n > 1, making the hypergeometric variance smaller than the corresponding binomial variance.

What is the difference between the hypergeometric and binomial distribution?

The binomial models draws with replacement (each draw is independent, p is constant). The hypergeometric models draws without replacement (each draw changes the remaining population). As population size N grows large relative to n, the hypergeometric approaches the binomial with p = K/N.

What are the valid values of k in the hypergeometric distribution?

k ranges from max(0, n+K-N) to min(K, n). The lower bound ensures you cannot have more failures than the population has failure states. The upper bound ensures you cannot have more successes than the smaller of K and n.

When should I use the hypergeometric distribution?

Use it whenever you sample without replacement from a finite population that has two types: successes (K items) and failures (N-K items). Examples: drawing lottery tickets, quality control inspections from a batch, dealing cards, selecting jury members, and clinical trials with a fixed patient pool.

How does the hypergeometric distribution relate to sampling theory?

The hypergeometric distribution is the exact distribution for estimating the proportion of defective items in a batch when sampling a fixed number without replacement. It underpins acceptance sampling plans in quality control, where a lot is accepted or rejected based on the count of defectives in a random sample.

What is the cumulative hypergeometric probability P(X at most k)?

P(X at most k) = sum of P(X = i) for i from max(0, n+K-N) to k. It gives the probability of observing k or fewer successes in the sample. This is the CDF of the hypergeometric distribution.

📌 Quick Tips

💡The hypergeometric distribution approaches the binomial distribution as N grows large relative to n. When N is at least 20 times n, the binomial with p = K/N is a good approximation.

💡The valid range of k is from max(0, n+K-N) to min(K, n). Values outside this range have zero probability and cannot occur.

💡The mean nK/N is the expected number of successes. If you draw 10 cards from a standard deck (N=52, K=13 hearts), the mean number of hearts is 10*13/52 = 2.5.

💡The finite population correction factor (N-n)/(N-1) in the variance formula is what distinguishes the hypergeometric from the binomial. When n/N is small, this factor is close to 1 and the two distributions nearly agree.

💡Use the Distribution Table mode to see all valid probabilities at once. The table is most useful when n and K are both much smaller than N.
