Sampling Distribution of the Sample Proportion Calculator

Q: What is the sampling distribution of the sample proportion?

When you draw a random sample of n observations from a population where the true proportion is p, the sample proportion p-hat = x/n varies from sample to sample. The collection of all possible p-hat values and their probabilities forms the sampling distribution. Its mean is p, its standard deviation is sqrt(p*(1-p)/n), and by the Central Limit Theorem it is approximately normal for large n.

Q: What is the formula for the standard error of a proportion?

The standard error is SE = sqrt(p*(1-p)/n), where p is the population proportion and n is the sample size. For p=0.5 (maximum uncertainty) and n=100, SE = sqrt(0.25/100) = 0.05. Larger n always decreases SE; the product p(1-p) is maximized at p=0.5 and shrinks toward zero as p approaches 0 or 1.
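As a quick check of the formula, here is a minimal Python sketch (standard library only; the function name standard_error is illustrative and not part of the calculator). It reproduces the p = 0.5, n = 100 value and shows SE peaking at p = 0.5:

```python
from math import sqrt

def standard_error(p: float, n: int) -> float:
    """Standard error of the sample proportion, sqrt(p(1-p)/n), with p as a decimal."""
    if not 0 < p < 1 or n < 1:
        raise ValueError("need 0 < p < 1 and n >= 1")
    return sqrt(p * (1 - p) / n)

print(round(standard_error(0.5, 100), 4))  # 0.05

# For a fixed n, SE is largest at p = 0.5 and symmetric around it.
for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(p, round(standard_error(p, 100), 4))
```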

Q: When can I use the normal approximation for the sample proportion?

The normal approximation works well when both np and n(1-p) are at least 10. For example, with p=0.2 and n=50, np=10 and n(1-p)=40, so the approximation is just barely acceptable. With p=0.05 and n=50, np=2.5, which is too small and the distribution is skewed; use a binomial exact test instead.
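The rule of thumb is easy to automate. This is a sketch under our own naming (normal_approx_ok is hypothetical); the small tolerance is an implementation choice so that floating-point round-off, such as 1 - 0.9 evaluating to 0.0999...98, does not flip a case that sits exactly on the boundary:

```python
def normal_approx_ok(p: float, n: int, threshold: float = 10.0) -> bool:
    """Rule-of-thumb check: both np and n(1-p) must be at least `threshold`."""
    eps = 1e-9  # tolerance against float round-off at the boundary
    return n * p >= threshold - eps and n * (1 - p) >= threshold - eps

for p, n in [(0.2, 50), (0.05, 50), (0.9, 100)]:
    print(f"p={p}, n={n}: np={n * p:g}, n(1-p)={n * (1 - p):g}, ok={normal_approx_ok(p, n)}")
```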

Q: How do I calculate P(p-hat at most 0.55) when p=0.5 and n=100?

Step 1: SE = sqrt(0.5*0.5/100) = 0.05. Step 2: z = (0.55 - 0.5)/0.05 = 1.00. Step 3: P(p-hat at most 0.55) = normCDF(1.00) = 0.8413, or about 84.13%. This means about 84% of random samples of size 100 from a 50% population will have a sample proportion of 55% or less.
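The three steps translate line for line into Python. This sketch assumes Python 3.8+ for statistics.NormalDist, which supplies the standard normal CDF (normCDF above):

```python
from math import sqrt
from statistics import NormalDist

p, n, query = 0.5, 100, 0.55
se = sqrt(p * (1 - p) / n)      # Step 1: standard error
z = (query - p) / se            # Step 2: z-score
prob = NormalDist().cdf(z)      # Step 3: standard normal CDF at z
print(f"SE = {se:.4f}, z = {z:.2f}, P(p-hat <= {query}) = {prob:.4f}")
```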

Q: What is the mean and variance of the sample proportion distribution?

The mean (expected value) is E[p-hat] = p. The variance is Var(p-hat) = p*(1-p)/n. The standard deviation (standard error) is SE = sqrt(p*(1-p)/n). For example, with p=0.3 and n=200, the variance is 0.3*0.7/200 = 0.00105 and SE = 0.0324.
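One way to see these formulas in action is a small Monte Carlo simulation: draw many samples of size n = 200 from a population with p = 0.3 and compare the mean and variance of the simulated p-hat values with p and p(1-p)/n. The number of replicates and the seed below are arbitrary illustrative choices; the simulated values will be close to, not equal to, the theoretical ones.

```python
import random
from math import sqrt
from statistics import fmean, variance

p, n = 0.3, 200          # population proportion and sample size from the example
reps = 5_000             # number of simulated surveys (illustrative choice)
rng = random.Random(42)  # fixed seed so reruns give the same numbers

# Each replicate draws n Bernoulli(p) outcomes and records the sample proportion.
p_hats = [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]

sim_mean = fmean(p_hats)
sim_var = variance(p_hats)  # sample variance (n - 1 denominator)

print(f"theory:    mean = {p:.4f}, var = {p * (1 - p) / n:.5f}, SE = {sqrt(p * (1 - p) / n):.4f}")
print(f"simulated: mean = {sim_mean:.4f}, var = {sim_var:.5f}, SE = {sqrt(sim_var):.4f}")
```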

Q: How does sample size affect the sampling distribution of p-hat?

Larger sample size concentrates the sampling distribution more tightly around p. Doubling n multiplies the variance by 1/2, which divides SE by sqrt(2). Quadrupling n halves the SE. This is why larger surveys give narrower confidence intervals and more reliable proportion estimates.
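A short sketch makes the square-root scaling concrete: with p held at 0.5, the ratio of each SE to the n = 100 value comes out to 1, 1/sqrt(2), and 1/2 for n, 2n, and 4n.

```python
from math import sqrt

def standard_error(p: float, n: int) -> float:
    """SE of the sample proportion, sqrt(p(1-p)/n)."""
    return sqrt(p * (1 - p) / n)

p, n = 0.5, 100
base = standard_error(p, n)
for factor in (1, 2, 4):
    se = standard_error(p, factor * n)
    print(f"n = {factor * n:4d}: SE = {se:.4f}, SE / SE(n=100) = {se / base:.4f}")
```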

Q: What is the difference between p and p-hat in statistics?

The parameter p is the fixed, unknown true proportion in the population (e.g., the fraction of all voters who prefer a candidate). The statistic p-hat is the observed proportion in a particular sample. Because sampling is random, p-hat varies from sample to sample; p does not. The goal of inference is to estimate p using information about the distribution of p-hat.

Q: How is the Between Values mode useful?

The Between Values mode computes P(p1 at most p-hat at most p2) = normCDF(z2) minus normCDF(z1), where z1 = (p1-p)/SE and z2 = (p2-p)/SE. This answers questions like: if the true approval rating is 40%, what is the probability that a poll of 500 people shows between 37% and 43% approval? Answer: SE = sqrt(0.4*0.6/500) = 0.02191, so z1 = -1.369 and z2 = 1.369, and P(0.37 at most p-hat at most 0.43) = normCDF(1.369) minus normCDF(-1.369) = 0.8291, or about 82.9%.
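The Between Values computation is a difference of two normal CDF values. Here is a minimal sketch (prob_between is our own name, and all proportions are entered as decimals rather than percentages), applied to the approval-poll question and to a 47% to 53% range for a 50-50 race with n = 400:

```python
from math import sqrt
from statistics import NormalDist

def prob_between(p: float, n: int, lower: float, upper: float) -> float:
    """Normal-approximation P(lower <= p-hat <= upper); proportions as decimals."""
    se = sqrt(p * (1 - p) / n)
    z1 = (lower - p) / se
    z2 = (upper - p) / se
    std = NormalDist()
    return std.cdf(z2) - std.cdf(z1)

print(f"{prob_between(0.40, 500, 0.37, 0.43):.4f}")  # approval-poll question
print(f"{prob_between(0.50, 400, 0.47, 0.53):.4f}")  # 47% to 53% for a 50-50 race
```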

Find probabilities, z-scores, and standard error for the sampling distribution of a sample proportion p-hat given population proportion p and sample size n.


📊 What is the Sampling Distribution of the Sample Proportion?

The sampling distribution of the sample proportion is the probability distribution of all possible values of the sample proportion p-hat that could result from drawing random samples of size n from a population where the true proportion is p. Every time you survey a random sample and compute the fraction of respondents with a certain characteristic, you obtain one observation from this sampling distribution.

Three key properties define the distribution: (1) the mean of p-hat equals the population proportion p, meaning the sample proportion is an unbiased estimator; (2) the standard error (standard deviation of p-hat) is SE = sqrt(p*(1-p)/n), which shrinks as n grows; (3) by the Central Limit Theorem, the distribution is approximately normal when both np and n*(1-p) are at least 10. These properties underpin confidence intervals, hypothesis tests about proportions, and survey margin-of-error calculations that appear in political polling, clinical trials, quality control audits, and A/B testing.

A common misconception is that the sampling distribution describes individual observations. It does not. If you survey 100 voters, the sampling distribution tells you how the fraction p-hat (e.g., 0.52 for 52 respondents out of 100 preferring candidate A) would vary across many hypothetical repetitions of the same survey. A single survey gives one p-hat; the sampling distribution characterizes all the p-hat values you would get if you repeated the survey thousands of times.

Another important point is that the normal approximation improves with larger n but is never exact. For small samples or extreme proportions (p near 0 or 1), the exact binomial distribution is more appropriate. When np or n*(1-p) falls below 10, this calculator displays a caution note. For rigorous small-sample inference about a single proportion, use an exact binomial test (Fisher's exact test is the corresponding choice when comparing two proportions). For typical survey research with n above 100 and p between 0.1 and 0.9, the normal approximation is usually adequate, although its accuracy degrades toward the ends of that range, where np or n(1-p) approaches 10.
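To see how much the approximation can miss at the boundary, the sketch below compares the exact binomial tail with the normal tail for a quality-control setting with p = 0.05 and n = 200, where np = 10 exactly. p-hat of at least 0.08 corresponds to at least 16 successes out of 200. The function names are hypothetical; the exact tail sums binomial probabilities with math.comb (Python 3.8+), and the normal tail uses no continuity correction, matching the formulas on this page. At this boundary, the exact tail comes out noticeably larger than the normal value.

```python
from math import comb, sqrt
from statistics import NormalDist

def exact_upper_tail(p: float, n: int, k: int) -> float:
    """Exact binomial P(X >= k), i.e. P(p-hat >= k/n)."""
    return sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(k, n + 1))

def normal_upper_tail(p: float, n: int, q: float) -> float:
    """Normal-approximation P(p-hat > q), without continuity correction."""
    se = sqrt(p * (1 - p) / n)
    return 1 - NormalDist().cdf((q - p) / se)

p, n = 0.05, 200
k = 16  # p-hat >= 0.08 means at least 16 successes out of 200
exact = exact_upper_tail(p, n, k)
approx = normal_upper_tail(p, n, k / n)
print(f"exact binomial: {exact:.4f}")
print(f"normal approx.: {approx:.4f}")
```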

📐 Formula

SE = √(p × (1 − p) / n)

SE = standard error of the sample proportion

p = true population proportion (0 < p < 1)

n = sample size

Z-score: z = (p̂ − p) / SE

P(p̂ ≤ q): normCDF(z) using standard normal CDF

Mean: E[p̂] = p

Variance: Var(p̂) = p(1 − p) / n

Validity: Normal approximation valid when np ≥ 10 and n(1−p) ≥ 10

Example: p = 0.40, n = 100, query p̂ = 0.45: SE = 0.04899, z = 1.021, P(p̂ ≤ 0.45) ≈ 84.63%
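Taken together, the formulas above describe everything the Find Probability mode reports. The following is a sketch of that computation, not the calculator's actual source. The function name and dictionary keys are our own, and inputs are decimals rather than percentages:

```python
from math import sqrt
from statistics import NormalDist

def find_probability(p: float, n: int, q: float) -> dict:
    """Normal-approximation outputs for a query proportion q (p and q as decimals)."""
    se = sqrt(p * (1 - p) / n)        # SE = sqrt(p(1-p)/n)
    z = (q - p) / se                  # z = (q - p) / SE
    at_most = NormalDist().cdf(z)     # P(p-hat <= q) from the standard normal CDF
    eps = 1e-9                        # guards the np >= 10 check against float round-off
    return {
        "se": se,
        "z": z,
        "p_at_most": at_most,
        "p_greater": 1 - at_most,     # survival probability P(p-hat > q)
        "mean": p,                    # E[p-hat] = p
        "variance": p * (1 - p) / n,  # Var(p-hat) = p(1-p)/n
        "normal_ok": n * p >= 10 - eps and n * (1 - p) >= 10 - eps,
    }

r = find_probability(0.40, 100, 0.45)
print(f"SE = {r['se']:.5f}, z = {r['z']:.3f}, P(p-hat <= 0.45) = {r['p_at_most']:.2%}, "
      f"P(p-hat > 0.45) = {r['p_greater']:.2%}, normal OK: {r['normal_ok']}")
```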

📖 How to Use This Calculator

Steps

Choose a calculation mode - Select "Find Probability" to get P(p-hat at most your query) and the survival probability, or select "Between Values" to get the probability that p-hat falls in a range.

Enter the population proportion p - Type the true proportion as a percentage (e.g., 40 for p = 0.40). This is the assumed or known population parameter.

Enter the sample size n - Type the number of observations in each sample. Larger n gives a smaller SE and a more concentrated distribution. The calculator checks whether the normal approximation is appropriate.

Enter the query proportion (Find Probability mode) - Type the threshold as a percentage. The calculator computes SE, z-score, and both tail probabilities instantly.

Read the results - The primary result is the cumulative probability. Check the validity note to confirm the normal approximation applies. Use the Between Values mode for range queries.

💡 Example Calculations

Example 1 - Voter Poll (p = 40%, n = 100, query = 45%)

In an election where 40% of voters prefer candidate A, what is the probability that a poll of 100 voters shows 45% or more support?

SE = sqrt(0.40 * 0.60 / 100) = sqrt(0.0024) = 0.04899.

z = (0.45 - 0.40) / 0.04899 = 0.05 / 0.04899 = 1.021.

P(p-hat at most 0.45) = normCDF(1.021) = 84.63%. P(p-hat greater than 0.45) = 15.37%.

P(p̂ ≥ 45%) = 15.37% (survival probability)


Example 2 - Quality Control (p = 5% defect rate, n = 200, query = 8%)

A factory has a 5% defect rate. What is the probability that a sample of 200 units shows 8% or more defects?

SE = sqrt(0.05 * 0.95 / 200) = sqrt(0.0002375) = 0.01541.

z = (0.08 - 0.05) / 0.01541 = 0.03 / 0.01541 = 1.947.

P(p-hat at most 0.08) = normCDF(1.947) = 97.42%. Survival = 2.58%.

P(p̂ ≥ 8%) = 2.58% (rare event, flags a process problem)


Example 3 - Survey Range (p = 50%, n = 400, between 47% and 53%)

A 50-50 election: what fraction of polls of 400 voters will show results between 47% and 53%?

SE = sqrt(0.50 * 0.50 / 400) = sqrt(0.000625) = 0.025.

z_lower = (0.47 - 0.50) / 0.025 = -1.20. z_upper = (0.53 - 0.50) / 0.025 = 1.20.

P(47% at most p-hat at most 53%) = normCDF(1.20) minus normCDF(-1.20) = 0.88493 minus 0.11507 = 0.76986, or about 76.99%.

P(47% ≤ p̂ ≤ 53%) = 76.99%


❓ Frequently Asked Questions

What does the sampling distribution of the sample proportion tell you?

It tells you the probability distribution of all possible values of p-hat (sample proportion) if you repeated sampling many times. For a given p and n, it shows how likely you are to observe each possible p-hat. It is the theoretical foundation for confidence intervals and hypothesis tests about proportions. Key properties: mean of p-hat = p (unbiased), SD of p-hat = sqrt(p*(1-p)/n), shape is approximately normal when np and n(1-p) are both at least 10.

How do I find the standard error of the sample proportion?

The standard error is SE = sqrt(p*(1-p)/n). Plug in the population proportion p (as a decimal) and sample size n. For example, with p = 0.3 and n = 100: SE = sqrt(0.3*0.7/100) = sqrt(0.0021) = 0.0458. When p is unknown (typical in practice), substitute the sample proportion p-hat for p to get an estimated SE used in confidence intervals.

When is the normal approximation for p-hat appropriate?

The normal approximation works well when np greater than or equal to 10 AND n*(1-p) greater than or equal to 10. Both conditions must hold simultaneously. For p = 0.10 and n = 100: np = 10, n*(1-p) = 90, so the approximation is acceptable (barely). For p = 0.02 and n = 100: np = 2, which violates the condition. Use a binomial exact calculation or a different method for small n or extreme p.

How is the sampling distribution of p-hat used in hypothesis testing?

To test H0: p = p0 against Ha: p not equal to p0, compute z = (p-hat - p0) / sqrt(p0*(1-p0)/n). Under H0, z follows a standard normal distribution. The p-value is 2*(1 - normCDF(|z|)) for a two-tailed test. If the p-value is below your significance level (typically 0.05), reject H0. This is the one-proportion z-test, widely used in quality control and survey analysis.
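A minimal sketch of the one-proportion z-test exactly as described, using a hypothetical observed p-hat of 0.55 from n = 100 tested against p0 = 0.5 (the function name and inputs are illustrative). Note that the SE is computed from p0, not from p-hat, because the test assumes H0 is true:

```python
from math import sqrt
from statistics import NormalDist

def one_proportion_z_test(p_hat: float, p0: float, n: int) -> tuple[float, float]:
    """Two-tailed one-proportion z-test of H0: p = p0. Returns (z, p_value)."""
    se0 = sqrt(p0 * (1 - p0) / n)  # SE under H0, using p0
    z = (p_hat - p0) / se0
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

z, p_value = one_proportion_z_test(p_hat=0.55, p0=0.50, n=100)
print(f"z = {z:.2f}, two-tailed p-value = {p_value:.4f}")
```

Here the p-value is well above 0.05, so H0 would not be rejected at that level.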

What is the relationship between p-hat and confidence intervals?

A 95% confidence interval for p is built from the sampling distribution: p-hat plus or minus 1.96 times SE, where the estimated SE = sqrt(p-hat*(1-p-hat)/n). If this procedure were repeated over many samples, about 95% of the resulting intervals would contain the true p (somewhat fewer when n is small or p is near 0 or 1, where the normal approximation is weaker). The width of the interval is 2*1.96*SE = 2*1.96*sqrt(p-hat*(1-p-hat)/n), which shrinks as n grows. To halve the width, you must quadruple the sample size.
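This p-hat plus or minus 1.96 SE interval is commonly called the Wald interval. The sketch below uses hypothetical inputs (p-hat = 0.52 with n = 400, then the same p-hat with 4 times the sample) to compute the interval and confirm that quadrupling n halves its width:

```python
from math import sqrt

def wald_interval(p_hat: float, n: int, z_crit: float = 1.96) -> tuple[float, float]:
    """Approximate CI for p: p-hat +/- z_crit * sqrt(p-hat(1-p-hat)/n)."""
    se_hat = sqrt(p_hat * (1 - p_hat) / n)  # estimated SE uses p-hat because p is unknown
    margin = z_crit * se_hat
    return p_hat - margin, p_hat + margin

lo, hi = wald_interval(0.52, 400)
print(f"95% CI: ({lo:.4f}, {hi:.4f}), width = {hi - lo:.4f}")
lo4, hi4 = wald_interval(0.52, 1600)
print(f"with 4x the sample: width = {hi4 - lo4:.4f}")
```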

How do I find the probability that p-hat is within 3 percentage points of p?

Use the Between Values mode with lower = p minus 0.03 and upper = p plus 0.03. The probability is P(p minus 0.03 at most p-hat at most p plus 0.03) = normCDF(0.03/SE) minus normCDF(-0.03/SE) = 2*normCDF(0.03/SE) minus 1. For p = 0.5 and n = 400: SE = 0.025, and P(|p-hat minus 0.5| at most 0.03) = 2*normCDF(1.2) minus 1 = 2*0.88493 minus 1 = 0.76986, or about 76.99%.

Why is p*(1-p) maximized at p = 0.5?

The function p*(1-p) is a downward-opening parabola. Taking the derivative and setting it to zero: d/dp [p - p^2] = 1 - 2p = 0, so p = 0.5. The maximum value is 0.5*0.5 = 0.25. This explains why surveys on 50-50 questions require the largest sample sizes to achieve a given precision. Surveys on rare or near-certain events (p near 0 or 1) have smaller SE for the same n.

How large a sample do I need for the p-hat distribution to be approximately normal?

The required n depends on p. You need both np at least 10 and n*(1-p) at least 10. The binding constraint is the smaller of the two: n at least 10/min(p, 1-p). For p = 0.5: n at least 10/0.5 = 20. For p = 0.1: n at least 10/0.1 = 100. For p = 0.01: n at least 10/0.01 = 1000. More extreme proportions require larger samples for the normal approximation to hold.
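The bound n at least 10/min(p, 1-p) can be computed directly. In this sketch (min_n_for_normal is a hypothetical name), a tiny tolerance is subtracted before rounding up so that float round-off in 1 - p, for example with p = 0.9, does not push the answer up by one:

```python
import math

def min_n_for_normal(p: float, threshold: float = 10.0) -> int:
    """Smallest n with n*p >= threshold and n*(1-p) >= threshold."""
    # The tolerance keeps round-off (e.g. 1 - 0.9 = 0.0999...98) from bumping n up by one.
    return math.ceil(threshold / min(p, 1 - p) - 1e-9)

for p in (0.5, 0.1, 0.01, 0.9):
    print(p, min_n_for_normal(p))
```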

What is the z-score for a sample proportion?

The z-score measures how many standard errors the observed (or query) p-hat lies above or below the population proportion p. Formula: z = (p-hat - p) / SE, where SE = sqrt(p*(1-p)/n). A z-score of 1.96 corresponds to the 97.5th percentile of the standard normal, meaning about 2.5% of samples produce a p-hat at least that far above p. Z-scores beyond plus or minus 1.96 fall in the outer 5% of the distribution (both tails combined).

How does increasing sample size affect the sampling distribution?

Increasing n decreases SE in proportion to 1/sqrt(n), so the distribution becomes narrower and more concentrated around p. Doubling n divides SE by sqrt(2), about 1.41. Quadrupling n halves SE. This means that to improve precision by a factor of 2 (halve the margin of error), you need 4 times as many observations. The shape also becomes more symmetric and closer to normal as n grows, even when p is near 0 or 1, although larger n is needed before that happens.

Can I use this for the difference between two sample proportions?

No, this calculator handles a single proportion. For the difference p-hat1 minus p-hat2 (e.g., comparing two groups), the SE of the difference is sqrt(p1*(1-p1)/n1 + p2*(1-p2)/n2), assuming independent samples. Confidence intervals for the difference use this unpooled SE, while the two-proportion z-test of H0: p1 = p2 usually substitutes the pooled proportion for both p1 and p2 in the same formula. A separate two-proportion test calculator is more appropriate for that comparison.
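For completeness, the unpooled SE of the difference is a one-line extension of the single-proportion formula. This sketch uses hypothetical groups (p1 = 0.50 and p2 = 0.40, each with n = 100), and se_difference is an illustrative name:

```python
from math import sqrt

def se_difference(p1: float, n1: int, p2: float, n2: int) -> float:
    """Unpooled SE of p-hat1 - p-hat2 for independent samples."""
    return sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

print(round(se_difference(0.50, 100, 0.40, 100), 4))  # sqrt(0.0025 + 0.0024) = 0.07
```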

What is the difference between standard deviation and standard error in this context?

The population standard deviation sigma describes the spread of individual Bernoulli outcomes (0 or 1) and equals sqrt(p*(1-p)). The standard error SE = sigma/sqrt(n) = sqrt(p*(1-p)/n) describes the spread of p-hat values across repeated samples. SE equals sigma divided by sqrt(n), so it is smaller than sigma whenever n > 1. As n grows, SE shrinks toward zero while sigma stays fixed at sqrt(p*(1-p)).


📌 Quick Tips

💡The normal approximation is valid only when both np and n(1-p) are at least 10. The calculator displays a validity note to warn you when this condition is not met.

💡The standard error SE = sqrt(p*(1-p)/n) shrinks as sample size grows. Quadrupling n halves the SE, so larger samples give more precise proportion estimates.

💡The z-score measures how many standard errors the query proportion is above or below the true population proportion. A z-score beyond plus or minus 1.96 corresponds to an event in the outer 5% of the distribution.

💡Use the Between Values mode to find the probability that a sample proportion falls within a specific range, for example P(0.45 at most p-hat at most 0.55) when the true proportion is 0.5.

💡For survey margin-of-error planning: a 95% confidence interval for p has half-width 1.96 times SE. Choose n so that 1.96 times sqrt(p*(1-p)/n) is at most your desired margin; using p = 0.5 gives the most conservative (largest) n.
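Solving 1.96 * sqrt(p*(1-p)/n) at most m for n gives n at least (1.96/m)^2 * p*(1-p), rounded up. The sketch below (sample_size_for_margin is a hypothetical name) computes this for a 3-point margin, first with the worst case p = 0.5 and then with an assumed p = 0.2:

```python
import math

def sample_size_for_margin(margin: float, p: float = 0.5, z_crit: float = 1.96) -> int:
    """Smallest n with z_crit * sqrt(p(1-p)/n) <= margin."""
    return math.ceil((z_crit / margin) ** 2 * p * (1 - p))

print(sample_size_for_margin(0.03))         # worst case p = 0.5, plus or minus 3 points
print(sample_size_for_margin(0.03, p=0.2))  # a smaller n suffices when p is far from 0.5
```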
