What is population variance and what is its formula?
Population variance sigma^2 = Sum((xi - mu)^2) / N, where xi are the data values, mu is the population mean, and N is the total count. Step 1: compute mu = Sum(xi)/N. Step 2: subtract mu from each xi. Step 3: square each deviation. Step 4: sum the squared deviations (SS). Step 5: divide SS by N. Population variance is in squared units of the original data; take the square root to get the population standard deviation sigma.
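The five steps above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation; the function name `population_variance` and the data values are made up for the example.

```python
from math import sqrt

def population_variance(values):
    """Return sigma^2 = Sum((xi - mu)^2) / N for a complete population."""
    n = len(values)
    if n == 0:
        raise ValueError("population must contain at least one value")
    mu = sum(values) / n                    # Step 1: population mean
    deviations = [x - mu for x in values]   # Step 2: subtract mu from each xi
    squared = [d * d for d in deviations]   # Step 3: square each deviation
    ss = sum(squared)                       # Step 4: sum of squares (SS)
    return ss / n                           # Step 5: divide SS by N

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values, mean is 5
variance = population_variance(data)
sigma = sqrt(variance)            # population standard deviation
print(variance, sigma)            # prints: 4.0 2.0
```

Here the squared deviations sum to 32, so sigma^2 = 32 / 8 = 4 and sigma = 2.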
When should I use population variance vs sample variance?
Use population variance (divide by N) when your data covers every member of the group you are studying, with no sampling. Examples: all grades in one class, weights of all items in a closed batch, temperatures recorded every hour in a fixed 24-hour period. Use sample variance (divide by n-1) when your data is a subset of a larger population you are trying to generalise to, which is the typical case in surveys, experiments, and observational studies.
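As a quick illustration of the two denominators, Python's standard `statistics` module provides `pvariance` (divide by N) and `variance` (divide by n-1). The grades below are hypothetical.

```python
import statistics

# Hypothetical data: treat these three grades first as the whole class,
# then as a sample drawn from a larger group.
grades = [70.0, 80.0, 90.0]

pop_var = statistics.pvariance(grades)   # SS / N      = 200 / 3
samp_var = statistics.variance(grades)   # SS / (n-1)  = 200 / 2

print(pop_var, samp_var)
```

The sum of squared deviations is the same (200) in both cases; only the divisor changes, so the sample variance is always the larger of the two whenever the values are not all identical.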
What does a large population variance mean?
A large sigma^2 means the values in the population are widely spread around the mean. For example, a population of exam scores with sigma^2 = 400 (sigma = 20 points) has much more variability than one with sigma^2 = 25 (sigma = 5 points). Population variance of zero means every value is identical to the population mean. Because sigma^2 is in squared units, the population standard deviation sigma is easier to interpret and compare to the mean directly.
Can population variance be greater than 1?
Yes. Population variance can be any non-negative real number and has no upper bound. Its size depends on the scale of the data and the spread of the values. For data measured in kilograms, variance might be 0.04 kg^2 (tightly packed data) or 10,000 kg^2 (very dispersed data). The only constraint is that it cannot be negative: it is zero when all values are identical and positive otherwise.
What is the coefficient of variation and why is it useful?
The coefficient of variation (CV) = (sigma / mu) x 100% expresses spread relative to the mean. Unlike variance or standard deviation, CV is dimensionless, making it possible to compare variability across datasets measured in different units. Example: two factories both have sigma = 0.5 mm. Factory A produces 10 mm bolts (CV = 5%), Factory B produces 100 mm bolts (CV = 0.5%). Factory B is relatively more consistent despite the same absolute SD. CV is undefined if the mean is zero.
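The bolt example can be reproduced with a small helper. The function name `coefficient_of_variation` is an assumption made for this sketch.

```python
def coefficient_of_variation(sigma, mu):
    """Return CV = (sigma / mu) * 100, as a percentage."""
    if mu == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return sigma / mu * 100.0

# Both factories have sigma = 0.5 mm but different mean bolt lengths.
cv_a = coefficient_of_variation(0.5, 10.0)    # Factory A: about 5%
cv_b = coefficient_of_variation(0.5, 100.0)   # Factory B: about 0.5%

print(f"Factory A: {cv_a:.1f}%  Factory B: {cv_b:.1f}%")
```

Because the result is a ratio of two quantities in the same unit, the millimetres cancel and the two percentages can be compared directly.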
How is grouped data population variance calculated?
For grouped frequency data, use sigma^2 = Sum(fi * (xi - mu)^2) / N, where xi are class midpoints, fi are class frequencies, and N = Sum(fi). First compute the weighted mean mu = Sum(fi * xi) / N. Then for each class, compute fi * (xi - mu)^2 and sum these weighted squared deviations. Divide by N for population variance. This is an approximation because all values in each class are assumed equal to the class midpoint.
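A minimal sketch of the grouped-data calculation, assuming the class midpoints and frequencies are supplied as two parallel lists. The function name and the three classes below are illustrative only.

```python
def grouped_population_variance(midpoints, frequencies):
    """Return Sum(fi * (xi - mu)^2) / N using class midpoints xi."""
    if len(midpoints) != len(frequencies):
        raise ValueError("midpoints and frequencies must have equal length")
    n = sum(frequencies)                                   # N = Sum(fi)
    if n == 0:
        raise ValueError("total frequency must be positive")
    # Weighted mean: mu = Sum(fi * xi) / N
    mu = sum(f * x for x, f in zip(midpoints, frequencies)) / n
    # Weighted squared deviations, summed and divided by N
    return sum(f * (x - mu) ** 2 for x, f in zip(midpoints, frequencies)) / n

# Illustrative classes 0-10, 10-20, 20-30 with midpoints 5, 15, 25.
midpoints = [5, 15, 25]
frequencies = [2, 5, 3]
grouped_var = grouped_population_variance(midpoints, frequencies)
print(grouped_var)   # prints: 49.0
```

For this data N = 10 and mu = 160 / 10 = 16, so the weighted squared deviations are 2*121 + 5*1 + 3*81 = 490 and the variance is 490 / 10 = 49.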
What is the relationship between population variance and standard deviation?
Population standard deviation sigma = sqrt(sigma^2). Squaring sigma gives sigma^2 (population variance). Standard deviation is expressed in the same unit as the data (e.g., cm, kg, dollars), making it easier to interpret and compare to the mean. Variance is in squared units. In practice, standard deviation is reported more often in articles and reports, while variance is used in mathematical derivations and ANOVA because the variances of independent random variables add: Var(X + Y) = Var(X) + Var(Y).
Why is population variance computed with N and not N-1?
When computing the variance of an entire population, dividing by N gives the exact average squared deviation for that population. No estimation is needed because every value is known. Dividing by N-1 (Bessel's correction) is only needed when working with a sample: deviations are then measured from the sample mean rather than the unknown population mean, so dividing by n would systematically underestimate the population variance. Since population variance makes no inference beyond the data at hand, the N denominator is mathematically exact and appropriate.
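The effect of the correction can be checked exactly on a tiny population. The sketch below enumerates every ordered sample of size 2 drawn with replacement from a made-up population of three values (an assumption chosen so that the samples are independent draws) and averages the two candidate estimates using exact fractions.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 3]            # illustrative population
N = len(population)
mu = Fraction(sum(population), N)
sigma2 = sum((x - mu) ** 2 for x in population) / N   # exact: 2/3

def mean_of_estimates(offset, n=2):
    """Average SS / (n - offset) over all ordered samples of size n."""
    total = Fraction(0)
    count = 0
    for sample in product(population, repeat=n):   # sampling with replacement
        xbar = Fraction(sum(sample), n)            # sample mean
        ss = sum((x - xbar) ** 2 for x in sample)  # deviations from xbar
        total += ss / (n - offset)
        count += 1
    return total / count

avg_div_n = mean_of_estimates(0)           # divide by n
avg_div_n_minus_1 = mean_of_estimates(1)   # divide by n - 1

print(sigma2, avg_div_n, avg_div_n_minus_1)   # prints: 2/3 1/3 2/3
```

Averaged over all nine samples, dividing by n gives 1/3, which is below the true sigma^2 of 2/3, while dividing by n-1 recovers 2/3 exactly.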
How is population variance used in the normal distribution?
A normal (Gaussian) distribution is fully described by two parameters: the population mean mu and the population variance sigma^2. Written N(mu, sigma^2), it determines the probability that a value falls in any given interval. About 68.27% of values fall within 1 sigma of mu, 95.45% within 2 sigma, and 99.73% within 3 sigma. If the population is approximately normal, a variance computed from census data or a complete historical series can be used directly in these formulas without any sampling correction.
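The three coverage figures follow from the standard identity P(|X - mu| < k*sigma) = erf(k / sqrt(2)) for a normal distribution. A short check using Python's `math.erf` (the function name `coverage` is an assumption for this sketch):

```python
from math import erf, sqrt

def coverage(k):
    """Probability that a normal value lies within k sigma of mu."""
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} sigma: {coverage(k) * 100:.2f}%")
# prints:
# within 1 sigma: 68.27%
# within 2 sigma: 95.45%
# within 3 sigma: 99.73%
```

The result does not depend on the particular values of mu and sigma^2, only on how many standard deviations wide the interval is.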
Does adding a constant to all values change population variance?
No. Adding a constant c to every value shifts the mean by c but does not change any deviation, because (xi + c) - (mu + c) = xi - mu. Since all deviations stay the same, the sum of squared deviations and therefore sigma^2 are unchanged. However, multiplying all values by a constant k multiplies sigma^2 by k^2, because each deviation is multiplied by k and each squared deviation by k^2. These properties hold for both population and sample variance.
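Both properties are easy to confirm numerically. The sketch below uses `statistics.pvariance` from the Python standard library on made-up data, with c = 10 and k = 3 chosen arbitrarily.

```python
import statistics

data = [1.0, 2.0, 3.0, 4.0]   # illustrative values
c, k = 10.0, 3.0              # arbitrary shift and scale factors

base = statistics.pvariance(data)                        # 1.25
shifted = statistics.pvariance([x + c for x in data])    # unchanged
scaled = statistics.pvariance([k * x for x in data])     # k^2 * base

print(base, shifted, scaled)   # prints: 1.25 1.25 11.25
```

The shifted data has the same variance as the original, and the scaled data has variance 9 * 1.25 = 11.25.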
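What is the computational formula for population variance?
The computational formula is sigma^2 = (Sum(xi^2) / N) - mu^2 = E[X^2] - (E[X])^2. This is algebraically equivalent to the definitional formula Sum((xi-mu)^2)/N but needs only the running totals Sum(xi) and Sum(xi^2), not the individual deviations, which can be useful for hand calculations or streaming data. Expanding (xi-mu)^2 = xi^2 - 2*xi*mu + mu^2 and summing gives Sum(xi^2) - 2*mu*Sum(xi) + N*mu^2. Because Sum(xi) = N*mu, the middle term equals -2*N*mu^2, so the sum simplifies to Sum(xi^2) - N*mu^2. Dividing by N gives the formula above. In floating-point arithmetic the subtraction can lose precision when the mean is large relative to the spread, so the definitional formula is the safer choice for such data.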
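The equivalence of the two formulas can be sketched as follows. The single-pass version keeps only three running totals, which is what makes it suitable for streamed values; both function names and the data are assumptions for this example.

```python
def variance_definitional(values):
    """Two-pass form: Sum((xi - mu)^2) / N."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n

def variance_computational(values):
    """Single-pass form: Sum(xi^2) / N - mu^2, using running totals."""
    n = 0
    sum_x = 0.0
    sum_x2 = 0.0
    for x in values:        # values may be any iterable, e.g. a stream
        n += 1
        sum_x += x
        sum_x2 += x * x
    if n == 0:
        raise ValueError("population must contain at least one value")
    mu = sum_x / n
    return sum_x2 / n - mu * mu

data = [2, 4, 4, 4, 5, 5, 7, 9]   # illustrative values
v_def = variance_definitional(data)
v_comp = variance_computational(iter(data))
print(v_def, v_comp)   # prints: 4.0 4.0
```

For this data Sum(xi^2) = 232 and mu = 5, so the computational form gives 232 / 8 - 25 = 4, matching the definitional result.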