What does the sampling distribution of the sample proportion tell you?
It tells you the probability distribution of all possible values of p-hat (sample proportion) if you repeated sampling many times. For a given p and n, it shows how likely you are to observe each possible p-hat. It is the theoretical foundation for confidence intervals and hypothesis tests about proportions. Key properties: mean of p-hat = p (unbiased), SD of p-hat = sqrt(p*(1-p)/n), shape is approximately normal when np and n(1-p) are both at least 10.
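The key properties above can be checked empirically. The following is a minimal simulation sketch, not part of the calculator; the function name `simulate_p_hats`, the seed, and the choice of p = 0.3, n = 100, and 5000 repetitions are illustrative assumptions.

```python
import math
import random

def simulate_p_hats(p, n, reps, seed=0):
    """Draw `reps` samples of size n and return the sample proportion of each."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]

# Illustrative values (assumptions, not taken from real data)
p, n = 0.3, 100
p_hats = simulate_p_hats(p, n, reps=5000)

# Empirical mean and SD of the simulated p-hat values
mean_p_hat = sum(p_hats) / len(p_hats)
sd_p_hat = math.sqrt(sum((x - mean_p_hat) ** 2 for x in p_hats) / (len(p_hats) - 1))

# Theoretical SD from the formula sqrt(p*(1-p)/n)
theoretical_sd = math.sqrt(p * (1 - p) / n)

print(f"simulated mean: {mean_p_hat:.4f}  (theory: {p})")
print(f"simulated SD:   {sd_p_hat:.4f}  (theory: {theoretical_sd:.4f})")
```

The simulated mean should land close to p and the simulated SD close to sqrt(p*(1-p)/n), with small differences due to simulation noise.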
How do I find the standard error of the sample proportion?
The standard error is SE = sqrt(p*(1-p)/n). Plug in the population proportion p (as a decimal) and sample size n. For example, with p = 0.3 and n = 100: SE = sqrt(0.3*0.7/100) = sqrt(0.0021) = 0.0458. When p is unknown (typical in practice), substitute the sample proportion p-hat for p to get an estimated SE used in confidence intervals.
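As a quick sketch, the formula translates directly into a few lines of Python. The function name `standard_error` and the input checks are illustrative assumptions, not the calculator's own code.

```python
import math

def standard_error(p, n):
    """Standard error of the sample proportion: sqrt(p*(1-p)/n)."""
    if not 0 <= p <= 1:
        raise ValueError("p must be a proportion between 0 and 1")
    if n <= 0:
        raise ValueError("n must be a positive sample size")
    return math.sqrt(p * (1 - p) / n)

# Worked example from the text: p = 0.3, n = 100
se = standard_error(0.3, 100)
print(round(se, 4))  # 0.0458
```

When p is unknown, the same function can be called with the observed p-hat in place of p to get the estimated SE.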
When is the normal approximation for p-hat appropriate?
The normal approximation works well when np is at least 10 AND n*(1-p) is at least 10. Both conditions must hold simultaneously. For p = 0.10 and n = 100: np = 10, n*(1-p) = 90, so the approximation is acceptable (barely). For p = 0.02 and n = 100: np = 2, which violates the condition. Use an exact binomial calculation or a different method when n is small or p is extreme.
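The two-part condition can be sketched as a small helper. The name `normal_approx_ok`, the `threshold` parameter, and the tiny floating-point tolerance are assumptions made for illustration.

```python
def normal_approx_ok(p, n, threshold=10):
    """Return True when both np and n(1-p) reach the threshold."""
    expected_successes = n * p
    expected_failures = n * (1 - p)
    # Small tolerance so that borderline cases such as np = 10 are not
    # rejected because of floating-point rounding.
    tol = 1e-9
    return (expected_successes >= threshold - tol
            and expected_failures >= threshold - tol)

# The two cases discussed in the text
print(normal_approx_ok(0.10, 100))  # True: np = 10, n(1-p) = 90
print(normal_approx_ok(0.02, 100))  # False: np = 2
```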
How is the sampling distribution of p-hat used in hypothesis testing?
To test H0: p = p0 against Ha: p not equal to p0, compute z = (p-hat - p0) / sqrt(p0*(1-p0)/n). Under H0, z follows a standard normal distribution. The p-value is 2*(1 - normCDF(|z|)) for a two-tailed test. If the p-value is below your significance level (typically 0.05), reject H0. This is the one-proportion z-test, widely used in quality control and survey analysis.
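The test steps above can be sketched with Python's standard library (`statistics.NormalDist`, available from Python 3.8). The function name `one_proportion_z_test` and the example inputs (p-hat = 0.56, p0 = 0.5, n = 400) are hypothetical values chosen for illustration.

```python
import math
from statistics import NormalDist

def one_proportion_z_test(p_hat, p0, n):
    """Two-tailed one-proportion z-test; returns (z, p_value)."""
    # Under H0 the standard error uses the hypothesized p0, not p-hat
    se0 = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / se0
    # Two-tailed p-value: 2 * (1 - normCDF(|z|))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical example: 56% observed in a sample of 400, testing H0: p = 0.5
z, p_value = one_proportion_z_test(p_hat=0.56, p0=0.5, n=400)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

In this made-up example z is 2.4 and the p-value is below 0.05, so H0 would be rejected at the 5% level.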
What is the relationship between p-hat and confidence intervals?
A 95% confidence interval for p uses the sampling distribution: p-hat plus or minus 1.96 times SE, where SE = sqrt(p-hat*(1-p-hat)/n). If you repeated the sampling many times, about 95% of the intervals built this way would contain the true p. The width of the interval is 2*1.96*SE = 2*1.96*sqrt(p-hat*(1-p-hat)/n), which shrinks as n grows. To halve the width, you must quadruple the sample size.
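A minimal sketch of this interval, assuming the standard normal-based (Wald) form described above. The function name `proportion_ci` and the sample values are illustrative assumptions.

```python
import math
from statistics import NormalDist

def proportion_ci(p_hat, n, confidence=0.95):
    """Normal-approximation confidence interval for a proportion."""
    # Critical value, about 1.96 for 95% confidence
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    margin = z_star * se
    return p_hat - margin, p_hat + margin

# Illustrative values: the same p-hat at n = 100 and at n = 400
low, high = proportion_ci(0.3, 100)
low4, high4 = proportion_ci(0.3, 400)

print(f"n = 100: ({low:.4f}, {high:.4f}), width = {high - low:.4f}")
print(f"n = 400: ({low4:.4f}, {high4:.4f}), width = {high4 - low4:.4f}")
```

Comparing the two printed widths shows the quadrupling rule: the interval at n = 400 is half as wide as the one at n = 100.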
How do I find the probability that p-hat is within 3 percentage points of p?
Use the Between Values mode with lower = p - 0.03 and upper = p + 0.03. The probability is P(p - 0.03 <= p-hat <= p + 0.03) = normCDF(0.03/SE) - normCDF(-0.03/SE) = 2*normCDF(0.03/SE) - 1. For p = 0.5 and n = 400: SE = 0.025, and P(|p-hat - 0.5| <= 0.03) = 2*normCDF(1.2) - 1 = 2*0.8849 - 1 = 0.7698 = 76.98%.
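The same calculation can be sketched in Python under the normal approximation. The function name `prob_within` is a hypothetical helper and does not describe how the calculator's Between Values mode is implemented.

```python
import math
from statistics import NormalDist

def prob_within(p, n, margin):
    """P(p - margin <= p-hat <= p + margin) under the normal approximation."""
    se = math.sqrt(p * (1 - p) / n)
    sampling_dist = NormalDist(mu=p, sigma=se)
    return sampling_dist.cdf(p + margin) - sampling_dist.cdf(p - margin)

# Worked example from the text: p = 0.5, n = 400, within 3 percentage points
prob = prob_within(0.5, 400, 0.03)
print(f"{prob:.4f}")
```

The result agrees with the hand calculation to about three decimal places; a small difference in the fourth decimal comes from rounding normCDF(1.2) to 0.8849 in the worked example.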
Why is p*(1-p) maximized at p = 0.5?
The function p*(1-p) is a downward-opening parabola. Taking the derivative and setting it to zero: d/dp [p - p^2] = 1 - 2p = 0, so p = 0.5. The maximum value is 0.5*0.5 = 0.25. This explains why surveys on 50-50 questions require the largest sample sizes to achieve a given precision. Surveys on rare or near-certain events (p near 0 or 1) have smaller SE for the same n.
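The calculus result can be double-checked numerically. This is a small sketch that evaluates p*(1-p) on a grid from 0 to 1 in steps of 0.01; the helper name `variance_term` and the grid spacing are illustrative assumptions.

```python
def variance_term(p):
    """The quantity p*(1-p) that appears under the square root in SE."""
    return p * (1 - p)

# Evaluate on a grid 0.00, 0.01, ..., 1.00 and pick the maximizing p
grid = [i / 100 for i in range(101)]
best_p = max(grid, key=variance_term)

print(best_p, variance_term(best_p))  # 0.5 0.25
```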
How large a sample do I need for the p-hat distribution to be approximately normal?
The required n depends on p. You need both np at least 10 and n*(1-p) at least 10. The binding constraint is the smaller of the two: n at least 10/min(p, 1-p). For p = 0.5: n at least 10/0.5 = 20. For p = 0.1: n at least 10/0.1 = 100. For p = 0.01: n at least 10/0.01 = 1000. More extreme proportions require larger samples for the normal approximation to hold.
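The rule n at least 10/min(p, 1-p) can be sketched as a helper that returns the smallest whole-number sample size. The name `min_n_for_normal` and the rounding step are assumptions added for illustration.

```python
import math

def min_n_for_normal(p, threshold=10):
    """Smallest n with both np and n(1-p) at least `threshold`."""
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    raw = threshold / min(p, 1 - p)
    # Round away floating-point noise (e.g. 100.00000000000003) before
    # taking the ceiling, so exact cases are not bumped up by one.
    return math.ceil(round(raw, 9))

# The three cases from the text
for p in (0.5, 0.1, 0.01):
    print(p, min_n_for_normal(p))
```

Because the rule depends only on min(p, 1-p), a proportion of 0.9 needs the same minimum n as a proportion of 0.1.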
What is the z-score for a sample proportion?
The z-score measures how many standard errors the observed (or query) p-hat is above or below the population proportion p. Formula: z = (p-hat - p) / SE, where SE = sqrt(p*(1-p)/n). A z-score of 1.96 corresponds to the 97.5th percentile of the standard normal, meaning about 2.5% of samples produce a p-hat at least that far above p. Z-scores beyond plus or minus 1.96 are in the outer 5% of the distribution (total, both tails).
How does increasing sample size affect the sampling distribution?
SE is proportional to 1/sqrt(n), so increasing n makes the distribution narrower and more concentrated around p. Doubling n reduces SE by a factor of sqrt(2), about 1.41. Quadrupling n halves SE. This means that to improve precision by a factor of 2 (halve the margin of error), you need 4 times as many observations. The shape also becomes more symmetric and closer to normal as n grows, even for skewed proportions near 0 or 1.
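The 1/sqrt(n) scaling is easy to see in a short table. This sketch uses an illustrative p = 0.3 and a base sample size of 100; both values and the helper name `standard_error` are assumptions.

```python
import math

def standard_error(p, n):
    """Standard error of the sample proportion: sqrt(p*(1-p)/n)."""
    return math.sqrt(p * (1 - p) / n)

p = 0.3       # illustrative proportion
base_n = 100  # illustrative starting sample size
base_se = standard_error(p, base_n)

# Show how SE shrinks as n is multiplied by 1, 2, 4, and 16
for multiplier in (1, 2, 4, 16):
    n = base_n * multiplier
    se = standard_error(p, n)
    print(f"n = {n:5d}  SE = {se:.4f}  shrink factor = {base_se / se:.2f}")
```

The shrink factor column equals the square root of the multiplier: about 1.41 for doubling, 2 for quadrupling, and 4 for a sixteen-fold increase.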
Can I use this for the difference between two sample proportions?
No, this calculator handles a single proportion. For the difference p-hat1 minus p-hat2 (e.g., comparing two groups), the SE of the difference is sqrt(p1*(1-p1)/n1 + p2*(1-p2)/n2) under the assumption of independent samples. Confidence intervals for the difference use this combined SE with the sample proportions substituted in; the two-proportion z-test of H0: p1 = p2 typically replaces both proportions with a pooled estimate. A separate two-proportion test calculator is more appropriate for that comparison.
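For reference, the SE formula for the difference can be sketched as follows. The function name `se_difference` and the example group sizes are hypothetical, and the sketch assumes independent samples as stated above.

```python
import math

def se_difference(p1, n1, p2, n2):
    """SE of p-hat1 - p-hat2 for two independent samples."""
    # Variances of independent estimates add; the SE is the square root of the sum
    return math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

# Hypothetical example: two groups of 400, each with p = 0.5
se_diff = se_difference(0.5, 400, 0.5, 400)
print(f"{se_diff:.4f}")
```

With equal groups, the SE of the difference is sqrt(2) times the single-sample SE of 0.025, so comparing two groups is noisier than estimating one proportion.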
What is the difference between standard deviation and standard error in this context?
The population standard deviation sigma describes the spread of individual Bernoulli outcomes (0 or 1) and equals sqrt(p*(1-p)). The standard error SE = sigma/sqrt(n) = sqrt(p*(1-p)/n) describes the spread of p-hat values across repeated samples. SE equals sigma scaled by 1/sqrt(n), so it is smaller than sigma whenever n is greater than 1. As n grows, SE shrinks toward zero while sigma stays fixed at sqrt(p*(1-p)).