Sampling Distribution of the Sample Proportion Calculator

Find probabilities, z-scores, and standard error for the sampling distribution of a sample proportion p-hat given population proportion p and sample size n.


What is the Sampling Distribution of the Sample Proportion?

The sampling distribution of the sample proportion is the probability distribution of all possible values of the sample proportion p-hat that could result from drawing random samples of size n from a population where the true proportion is p. Every time you survey a random sample and compute the fraction of respondents with a certain characteristic, you obtain one observation from this sampling distribution.

Three key properties define the distribution: (1) the mean of p-hat equals the population proportion p, meaning the sample proportion is an unbiased estimator; (2) the standard error (standard deviation of p-hat) is SE = sqrt(p*(1-p)/n), which shrinks as n grows; (3) by the Central Limit Theorem, the distribution is approximately normal when both np and n*(1-p) are at least 10. These properties underpin confidence intervals, hypothesis tests about proportions, and survey margin-of-error calculations that appear in political polling, clinical trials, quality control audits, and A/B testing.

A common misconception is that the sampling distribution describes individual observations. It does not. If you survey 100 voters, the sampling distribution tells you how the fraction p-hat (e.g., 0.52 for 52 respondents out of 100 preferring candidate A) would vary across many hypothetical repetitions of the same survey. A single survey gives one p-hat; the sampling distribution characterizes all the p-hat values you would get if you repeated the survey thousands of times.
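The idea of repeating the same survey many times can be sketched with a small simulation. This is an illustrative example only: the function name `simulate_p_hats`, the seed, and the choice of 5000 repetitions are assumptions made here, not part of the calculator.

```python
import math
import random
import statistics

def simulate_p_hats(p, n, repetitions, seed=0):
    """Draw `repetitions` random samples of size n and return each sample proportion."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    p_hats = []
    for _ in range(repetitions):
        # Each respondent independently has the characteristic with probability p.
        successes = sum(1 for _ in range(n) if rng.random() < p)
        p_hats.append(successes / n)
    return p_hats

p, n = 0.52, 100
p_hats = simulate_p_hats(p, n, repetitions=5000)

sim_mean = statistics.mean(p_hats)
sim_sd = statistics.pstdev(p_hats)
theory_se = math.sqrt(p * (1 - p) / n)

print(f"simulated mean of p-hat = {sim_mean:.4f} (theory: {p})")
print(f"simulated SD of p-hat   = {sim_sd:.4f} (theory: {theory_se:.4f})")
```

Each entry of `p_hats` is one survey result; the spread of the whole list is what the sampling distribution describes.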

Another important point is that the normal approximation improves with larger n but is not exact. For small samples or extreme proportions (p near 0 or 1), the exact binomial distribution is more appropriate. When np or n*(1-p) falls below 10, this calculator displays a caution note. For rigorous small-sample inference, use an exact binomial test instead (or Fisher's exact test when comparing two groups). For typical survey research with n above 100 and p between 0.1 and 0.9, the normal approximation is usually close to the exact binomial result, although because p-hat is discrete the two can still differ by a percentage point or two unless a continuity correction is applied.
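To see how close the approximation is in a given case, one option is to compare it with the exact binomial probability. The sketch below assumes Python's standard library only; the helper names (`normal_approx_cdf`, `exact_binomial_cdf`, `approximation_is_reasonable`) are hypothetical and do not describe how this calculator is implemented.

```python
import math

def normal_cdf(z):
    """Standard normal CDF, written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normal_approx_cdf(p, n, q):
    """Normal approximation to P(p-hat <= q), without continuity correction."""
    se = math.sqrt(p * (1 - p) / n)
    return normal_cdf((q - p) / se)

def exact_binomial_cdf(p, n, q):
    """Exact P(p-hat <= q) = P(X <= floor(n*q)) for X ~ Binomial(n, p)."""
    k_max = math.floor(n * q + 1e-9)  # small tolerance guards against float error in n*q
    return sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_max + 1))

def approximation_is_reasonable(p, n):
    """Rule of thumb from the text: np >= 10 and n(1-p) >= 10."""
    return n * p >= 10 and n * (1 - p) >= 10

p, n, q = 0.40, 100, 0.45
approx = normal_approx_cdf(p, n, q)
exact = exact_binomial_cdf(p, n, q)
print(f"rule of thumb satisfied: {approximation_is_reasonable(p, n)}")
print(f"normal approximation: {approx:.4f}")
print(f"exact binomial:       {exact:.4f}")
```

Running the same comparison with a small n or an extreme p shows the gap between the two values widening.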

Formula

SE  =  √(p × (1 − p) / n)
SE = standard error of the sample proportion
p = true population proportion (0 < p < 1)
n = sample size
Z-score: z = (p̂ − p) / SE
P(p̂ ≤ q): normCDF(z) using standard normal CDF
Mean: E[p̂] = p
Variance: Var(p̂) = p(1 − p) / n
Validity: Normal approximation valid when np ≥ 10 and n(1−p) ≥ 10
Example: p = 0.40, n = 100, query p̂ = 0.45: SE = 0.04899, z = 1.021, P(p̂ ≤ 0.45) ≈ 84.64%
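The formulas above translate directly into a few lines of code. The following is a minimal sketch, assuming `normCDF` is the standard normal CDF (computed here with `math.erf`); the function name `sample_proportion_probability` and the dictionary keys are illustrative choices, not the calculator's actual code.

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def sample_proportion_probability(p, n, query):
    """Return SE, z-score, mean, variance, and both tail probabilities for p-hat."""
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    if n <= 0:
        raise ValueError("n must be positive")
    variance = p * (1 - p) / n
    se = math.sqrt(variance)
    z = (query - p) / se
    p_le = normal_cdf(z)
    return {
        "se": se,
        "z": z,
        "mean": p,
        "variance": variance,
        "p_le": p_le,        # P(p-hat <= query)
        "p_gt": 1.0 - p_le,  # P(p-hat > query)
    }

result = sample_proportion_probability(p=0.40, n=100, query=0.45)
for name, value in result.items():
    print(f"{name}: {value:.5f}")
```

Small differences in the last digit of the probability can arise depending on whether z is rounded before the CDF is evaluated.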

How to Use This Calculator

Steps

1. Choose a calculation mode - Select "Find Probability" to get P(p̂ ≤ query) and the survival probability P(p̂ > query), or select "Between Values" to get the probability that p̂ falls in a range.
2. Enter the population proportion p - Type the true proportion as a percentage (e.g., 40 for p = 0.40). This is the assumed or known population parameter.
3. Enter the sample size n - Type the number of observations in each sample. Larger n gives a smaller SE and a more concentrated distribution. The calculator checks whether the normal approximation is appropriate.
4. Enter the query proportion (Find Probability mode) - Type the threshold as a percentage. The calculator computes SE, z-score, and both tail probabilities instantly.
5. Read the results - The primary result is the cumulative probability. Check the validity note to confirm the normal approximation applies. Use the Between Values mode for range queries.

Example Calculations

Example 1 - Voter Poll (p = 40%, n = 100, query = 45%)

In an election where 40% of voters prefer candidate A, what is the probability that a poll of 100 voters shows 45% or more support?

1. SE = sqrt(0.40 * 0.60 / 100) = sqrt(0.0024) = 0.04899.
2. z = (0.45 - 0.40) / 0.04899 = 0.05 / 0.04899 = 1.021.
3. P(p̂ ≤ 0.45) = normCDF(1.021) = 84.64%. P(p̂ > 0.45) = 15.36%.
P(p̂ ≥ 45%) = 15.36% (survival probability)

Example 2 - Quality Control (p = 5% defect rate, n = 200, query = 8%)

A factory has a 5% defect rate. What is the probability that a sample of 200 units shows 8% or more defects?

1. SE = sqrt(0.05 * 0.95 / 200) = sqrt(0.0002375) = 0.01541.
2. z = (0.08 - 0.05) / 0.01541 = 0.03 / 0.01541 = 1.947.
3. P(p̂ ≤ 0.08) = normCDF(1.947) = 97.42%. Survival = 2.58%.
P(p̂ ≥ 8%) = 2.58% (rare event, flags a process problem)

Example 3 - Survey Range (p = 50%, n = 400, between 47% and 53%)

A 50-50 election: what fraction of polls of 400 voters will show results between 47% and 53%?

1. SE = sqrt(0.50 * 0.50 / 400) = sqrt(0.000625) = 0.025.
2. z_lower = (0.47 - 0.50) / 0.025 = -1.20. z_upper = (0.53 - 0.50) / 0.025 = 1.20.
3. P(0.47 ≤ p̂ ≤ 0.53) = normCDF(1.20) − normCDF(−1.20) = 0.8849 − 0.1151 = 0.7698 = 76.98%.
P(47% ≤ p̂ ≤ 53%) = 76.98%
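The range calculation in Example 3 can be reproduced with a short function. This is a sketch under the same normal-approximation assumption; `probability_between` is a hypothetical name chosen for illustration.

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def probability_between(p, n, lower, upper):
    """Normal approximation to P(lower <= p-hat <= upper)."""
    if lower > upper:
        raise ValueError("lower must not exceed upper")
    se = math.sqrt(p * (1 - p) / n)
    z_lower = (lower - p) / se
    z_upper = (upper - p) / se
    return normal_cdf(z_upper) - normal_cdf(z_lower)

prob = probability_between(p=0.50, n=400, lower=0.47, upper=0.53)
print(f"P(0.47 <= p-hat <= 0.53) = {prob:.4f}")
```

Because the code uses unrounded CDF values, its result can differ in the fourth decimal place from a hand calculation based on a four-digit z-table.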

Frequently Asked Questions

What does the sampling distribution of the sample proportion tell you?
It tells you the probability distribution of all possible values of p-hat (sample proportion) if you repeated sampling many times. For a given p and n, it shows how likely you are to observe each possible p-hat. It is the theoretical foundation for confidence intervals and hypothesis tests about proportions. Key properties: mean of p-hat = p (unbiased), SD of p-hat = sqrt(p*(1-p)/n), shape is approximately normal when np and n(1-p) are both at least 10.
How do I find the standard error of the sample proportion?
The standard error is SE = sqrt(p*(1-p)/n). Plug in the population proportion p (as a decimal) and sample size n. For example, with p = 0.3 and n = 100: SE = sqrt(0.3*0.7/100) = sqrt(0.0021) = 0.0458. When p is unknown (typical in practice), substitute the sample proportion p-hat for p to get an estimated SE used in confidence intervals.
When is the normal approximation for p-hat appropriate?
The normal approximation works well when np greater than or equal to 10 AND n*(1-p) greater than or equal to 10. Both conditions must hold simultaneously. For p = 0.10 and n = 100: np = 10, n*(1-p) = 90, so the approximation is acceptable (barely). For p = 0.02 and n = 100: np = 2, which violates the condition. Use a binomial exact calculation or a different method for small n or extreme p.
How is the sampling distribution of p-hat used in hypothesis testing?
To test H0: p = p0 against Ha: p not equal to p0, compute z = (p-hat - p0) / sqrt(p0*(1-p0)/n). Under H0, z follows a standard normal distribution. The p-value is 2*(1 - normCDF(|z|)) for a two-tailed test. If the p-value is below your significance level (typically 0.05), reject H0. This is the one-proportion z-test, widely used in quality control and survey analysis.
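The one-proportion z-test described above can be sketched as follows. The function name `one_proportion_z_test` and the example inputs are assumptions made for illustration.

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def one_proportion_z_test(p_hat, p0, n):
    """Two-tailed z-test of H0: p = p0; returns (z, p_value)."""
    se_null = math.sqrt(p0 * (1 - p0) / n)  # SE is computed under H0, using p0
    z = (p_hat - p0) / se_null
    p_value = 2.0 * (1.0 - normal_cdf(abs(z)))
    return z, p_value

z, p_value = one_proportion_z_test(p_hat=0.45, p0=0.40, n=100)
alpha = 0.05
print(f"z = {z:.3f}, p-value = {p_value:.4f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```

Note that the test statistic uses p0 in the standard error, not the observed p-hat, because the distribution is evaluated under the null hypothesis.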
What is the relationship between p-hat and confidence intervals?
A 95% confidence interval for p uses the sampling distribution: p-hat plus or minus 1.96 times SE, where SE = sqrt(p-hat*(1-p-hat)/n). If the sampling were repeated many times, about 95% of the intervals built this way would contain the true p. The width of the interval is 2*1.96*SE = 2*1.96*sqrt(p-hat*(1-p-hat)/n), which shrinks as n grows. To halve the width, you must quadruple the sample size.
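As a rough illustration of the interval formula above, the sketch below computes p-hat plus or minus 1.96 times the estimated SE. The name `confidence_interval` and the sample values are hypothetical.

```python
import math

def confidence_interval(p_hat, n, z_star=1.96):
    """Interval p_hat +/- z_star * sqrt(p_hat * (1 - p_hat) / n)."""
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    margin = z_star * se
    return p_hat - margin, p_hat + margin

low, high = confidence_interval(p_hat=0.52, n=100)
print(f"n = 100: ({low:.4f}, {high:.4f}), width = {high - low:.4f}")

# Quadrupling the sample size halves the width for the same p_hat.
low4, high4 = confidence_interval(p_hat=0.52, n=400)
print(f"n = 400: ({low4:.4f}, {high4:.4f}), width = {high4 - low4:.4f}")
```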
How do I find the probability that p-hat is within 3 percentage points of p?
Use the Between Values mode with lower = p − 0.03 and upper = p + 0.03. The probability is P(p − 0.03 ≤ p̂ ≤ p + 0.03) = normCDF(0.03/SE) − normCDF(−0.03/SE) = 2 × normCDF(0.03/SE) − 1. For p = 0.5 and n = 400: SE = 0.025, and P(|p̂ − 0.5| ≤ 0.03) = 2 × normCDF(1.2) − 1 = 2 × 0.8849 − 1 = 0.7698 = 76.98%.
Why is p*(1-p) maximized at p = 0.5?
The function p*(1-p) is a downward-opening parabola. Taking the derivative and setting it to zero: d/dp [p - p^2] = 1 - 2p = 0, so p = 0.5. The maximum value is 0.5*0.5 = 0.25. This explains why surveys on 50-50 questions require the largest sample sizes to achieve a given precision. Surveys on rare or near-certain events (p near 0 or 1) have smaller SE for the same n.
How large a sample do I need for the p-hat distribution to be approximately normal?
The required n depends on p. You need both np at least 10 and n*(1-p) at least 10. The binding constraint is the smaller of the two: n at least 10/min(p, 1-p). For p = 0.5: n at least 10/0.5 = 20. For p = 0.1: n at least 10/0.1 = 100. For p = 0.01: n at least 10/0.01 = 1000. More extreme proportions require larger samples for the normal approximation to hold.
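The rule n at least 10/min(p, 1-p) can be checked with a small helper. This sketch uses `fractions.Fraction` so that decimal inputs such as 0.1 are treated exactly rather than as binary floats; the function name `min_n_for_normal_approx` is an illustrative assumption.

```python
import math
from fractions import Fraction

def min_n_for_normal_approx(p, threshold=10):
    """Smallest n with n*p >= threshold and n*(1-p) >= threshold."""
    p_exact = Fraction(str(p))  # parse the decimal text to avoid float rounding
    if not 0 < p_exact < 1:
        raise ValueError("p must be strictly between 0 and 1")
    smaller = min(p_exact, 1 - p_exact)  # the binding constraint
    return math.ceil(threshold / smaller)

for p in (0.5, 0.1, 0.01):
    print(f"p = {p}: n >= {min_n_for_normal_approx(p)}")
```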
What is the z-score for a sample proportion?
The z-score measures how many standard errors the observed (or query) p-hat is above or below the population proportion p. Formula: z = (p-hat - p) / SE, where SE = sqrt(p*(1-p)/n). A z-score of 1.96 corresponds to the 97.5th percentile of the standard normal, meaning about 2.5% of samples produce a p-hat that far above p. Z-scores beyond plus or minus 1.96 are in the outer 5% of the distribution (total, both tails).
How does increasing sample size affect the sampling distribution?
Increasing n decreases SE in proportion to 1/sqrt(n), so the distribution becomes narrower and more concentrated around p. Doubling n reduces SE by a factor of sqrt(2) ≈ 1.41. Quadrupling n halves SE. This means that to improve precision by a factor of 2 (halve the margin of error), you need 4 times as many observations. The shape also becomes more symmetric and closer to normal as n grows, even for skewed proportions near 0 or 1.
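The 1/sqrt(n) scaling is easy to verify numerically. The snippet below is a small sketch; the value p = 0.40 and the sample sizes are arbitrary choices for illustration.

```python
import math

def standard_error(p, n):
    """SE of the sample proportion: sqrt(p * (1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)

p = 0.40
base = standard_error(p, 100)
for n in (100, 200, 400):
    se = standard_error(p, n)
    print(f"n = {n}: SE = {se:.5f}, ratio to n = 100: {se / base:.4f}")
```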
Can I use this for the difference between two sample proportions?
No, this calculator handles a single proportion. For the difference p-hat1 minus p-hat2 (e.g., comparing two groups), the SE of the difference is sqrt(p1*(1-p1)/n1 + p2*(1-p2)/n2), assuming independent samples. Confidence intervals for the difference use this unpooled SE; the two-proportion z-test of H0: p1 = p2 typically replaces p1 and p2 with a pooled estimate of the common proportion. A separate two-proportion test calculator is more appropriate for that comparison.
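For reference, the SE formula for a difference of two independent proportions can be sketched as below. The function name `se_difference` and the inputs are hypothetical, and the sketch covers only the SE formula quoted above, not a full two-proportion test.

```python
import math

def se_difference(p1, n1, p2, n2):
    """SE of p-hat1 - p-hat2 for independent samples (variances add)."""
    return math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

se_diff = se_difference(p1=0.50, n1=100, p2=0.40, n2=100)
print(f"SE of the difference = {se_diff:.4f}")
```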
What is the difference between standard deviation and standard error in this context?
The population standard deviation sigma describes the spread of individual Bernoulli outcomes (0 or 1) and equals sqrt(p*(1-p)). The standard error SE = sigma/sqrt(n) = sqrt(p*(1-p)/n) describes the spread of p-hat values across repeated samples. SE equals sigma multiplied by 1/sqrt(n), so it is smaller than sigma whenever n is greater than 1. As n grows, SE shrinks toward zero while sigma stays fixed at sqrt(p*(1-p)).