Covariance Calculator

Find sample and population covariance for any two-variable dataset, with a step-by-step deviation table.

๐Ÿ“Š Covariance Calculator
Sample Covariance
Population Covariance
Mean of X
Mean of Y
Count (n)
Sum of Products
Direction

๐Ÿ“Š What is Covariance?

Covariance is a statistical measure that quantifies how two random variables change together. When covariance is positive, the two variables tend to move in the same direction: higher-than-average values of X are paired with higher-than-average values of Y. When covariance is negative, they move in opposite directions: above-average X values tend to coincide with below-average Y values. A covariance of zero indicates no linear relationship between the variables.

Covariance shows up in a wide range of practical applications. In finance, portfolio managers use the covariance between asset returns to quantify diversification benefits: two stocks with negative covariance tend to offset each other's losses. In machine learning, principal component analysis (PCA) relies on the covariance matrix of features to find the directions of maximum variance. In biology and medicine, covariance between physiological measurements (such as body weight and blood pressure) reveals how health markers are interconnected.

A common point of confusion is the difference between covariance and correlation. Covariance retains the units of the original variables (if X is in centimetres and Y is in kilograms, covariance is in cm x kg), which makes its raw value hard to interpret or compare across different datasets. The Pearson correlation coefficient r is simply the covariance divided by the product of the two standard deviations, which scales the result to a unitless number between -1 and +1. Use covariance when you need the raw joint variability for further calculations (such as portfolio variance), and use correlation when you want an interpretable measure of linear association strength.

This calculator handles both the most common use case (raw paired data) and the textbook case (pre-computed summary statistics). It returns both sample covariance (using the n-1 denominator for unbiased estimation) and population covariance (using n), so you can choose whichever is appropriate for your analysis. The deviation product table in Raw Data mode shows every step of the calculation, making this tool useful for checking homework as well as verifying research results.

๐Ÿ“ Formula

Sample Cov(X,Y)  =  Σ(xᵢ − x̄)(yᵢ − ȳ) ÷ (n − 1)
xᵢ = each value in the X dataset
yᵢ = corresponding value in the Y dataset
= mean of X  |  = mean of Y
n = number of data pairs
n − 1 = Bessel correction for unbiased sample estimation
Population version: divide by n instead of (n − 1)
Computational form: Cov(X,Y) = (Σxy − n · x̄ · ȳ) ÷ (n − 1)
Example: X = [2, 4, 6], Y = [1, 5, 3], x̄ = 4, ȳ = 3. Cross-deviations: (2-4)(1-3) + (4-4)(5-3) + (6-4)(3-3) = 4 + 0 + 0 = 4. Sample Cov = 4 / (3-1) = 2.0

๐Ÿ“– How to Use This Calculator

Steps

1
Enter your X dataset - Type or paste your X values into the first text area, separated by commas or spaces. Each value corresponds to one observation.
2
Enter your Y dataset - Type or paste the matching Y values into the second text area in the same order as X. The number of Y values must equal the number of X values.
3
Click Calculate - Click the Calculate button to see sample covariance, population covariance, the means of X and Y, and a full deviation product table.
4
Switch to Summary Stats mode if needed - If you already have n, SumX, SumY, and SumXY from a textbook or published study, switch to Summary Stats mode and enter those four values directly.

๐Ÿ’ก Example Calculations

Example 1 - Study Hours vs Exam Score

Five students: hours studied (X) and exam score (Y)

1
X = [2, 4, 6, 8, 10], Y = [50, 60, 70, 80, 90]. Mean X = 6, Mean Y = 70.
2
Cross-deviations: (2-6)(50-70) + (4-6)(60-70) + (6-6)(70-70) + (8-6)(80-70) + (10-6)(90-70) = (-4)(-20) + (-2)(-10) + 0 + (2)(10) + (4)(20) = 80 + 20 + 0 + 20 + 80 = 200.
3
Sample Cov = 200 / (5-1) = 50.0. Population Cov = 200 / 5 = 40.0. Positive covariance confirms that more study hours correlate with higher scores.
Sample Covariance = 50.000000
Try this example →

Example 2 - Temperature vs Ice Cream Sales (Negative Relationship)

Coffee sales (Y) vs temperature Celsius (X): higher temperature, lower hot coffee demand

1
X (temperature) = [5, 10, 15, 20, 25], Y (coffees sold) = [120, 100, 80, 60, 40]. Mean X = 15, Mean Y = 80.
2
Cross-deviations: (5-15)(120-80) + (10-15)(100-80) + (15-15)(80-80) + (20-15)(60-80) + (25-15)(40-80) = (-10)(40) + (-5)(20) + 0 + (5)(-20) + (10)(-40) = -400 - 100 + 0 - 100 - 400 = -1000.
3
Sample Cov = -1000 / (5-1) = -250.0. The strong negative covariance confirms that as temperature rises, hot coffee sales fall.
Sample Covariance = -250.000000
Try this example →

Example 3 - Summary Stats Mode (Textbook Problem)

Given: n = 6, SumX = 42, SumY = 54, SumXY = 396

1
Mean X = SumX / n = 42 / 6 = 7.0. Mean Y = SumY / n = 54 / 6 = 9.0.
2
Sum of products = SumXY - n * MeanX * MeanY = 396 - 6 * 7 * 9 = 396 - 378 = 18.
3
Sample Cov = 18 / (6-1) = 3.6. Population Cov = 18 / 6 = 3.0.
Sample Covariance = 3.600000
Try this example →

โ“ Frequently Asked Questions

What is covariance and what does it tell you about two variables?+
Covariance tells you whether two variables tend to move together or apart. A positive covariance means that above-average values of X tend to pair with above-average values of Y. A negative covariance means they move in opposite directions. A covariance near zero suggests no consistent linear relationship. The raw number is hard to interpret without context because it depends on the units of measurement.
What is the formula for sample covariance?+
Sample covariance is Cov(X,Y) = Sum((xi - x-mean)(yi - y-mean)) / (n-1). The n-1 in the denominator (Bessel correction) produces an unbiased estimate of the true population covariance from a sample. The equivalent computational formula is Cov(X,Y) = (n*SumXY - SumX*SumY) / (n*(n-1)), which can be more numerically stable for large datasets.
When should I use sample covariance vs population covariance?+
Use sample covariance (n-1 denominator) when your data is a sample drawn from a larger population. This is the most common case in practice: survey data, experiment results, historical price data for a period. Use population covariance (n denominator) only when your data covers the entire population with no sampling involved, such as the grades of every student in a specific class with no intent to generalize beyond that class.
How is covariance different from the correlation coefficient?+
Covariance and correlation both measure the linear relationship between two variables, but correlation is dimensionless and bounded between -1 and +1. The Pearson correlation r = Cov(X,Y) / (Sx * Sy), where Sx and Sy are the standard deviations of X and Y. Dividing by the SDs removes the influence of measurement scale. A covariance of 500 (cm*kg) is difficult to interpret and cannot be compared to a covariance of 0.003 (m*ton), but r can be compared directly across datasets.
Can covariance be greater than 1 or less than -1?+
Yes. Unlike the correlation coefficient, covariance has no fixed bounds. It can be any real number, positive or negative, including values with very large absolute magnitudes. For example, if X is in millions of dollars and Y is also in millions of dollars, covariance will be in units of millions-squared. This is why comparing raw covariance values across different datasets or variable scales is meaningless without normalisation.
What does zero covariance mean?+
Zero covariance means there is no linear relationship between X and Y. However, it does not mean the variables are statistically independent. A classic counterexample: if Y = X squared and the X values are symmetric around zero (e.g. -2, -1, 0, 1, 2), the covariance is exactly zero even though Y is completely determined by X. Independence implies zero covariance, but zero covariance does not imply independence.
How is covariance used in portfolio theory?+
In Markowitz portfolio theory, the variance of a two-asset portfolio is Var(P) = w1^2 * Var(X) + w2^2 * Var(Y) + 2 * w1 * w2 * Cov(X, Y). A negative or low covariance between two assets reduces overall portfolio variance. This is the mathematical foundation of diversification: mixing assets with different return patterns (low or negative covariance) lowers risk without necessarily lowering expected return.
What is a covariance matrix and how is it used?+
A covariance matrix is a square matrix where entry (i, j) is the covariance between variable i and variable j. The diagonal entries are the variances (each variable's covariance with itself). The matrix is always symmetric. Covariance matrices are central to multivariate statistics, including principal component analysis (PCA), linear discriminant analysis (LDA), Gaussian mixture models, and Kalman filters for tracking and prediction.
Is covariance affected by outliers?+
Yes, covariance is sensitive to outliers. A single extreme pair (xi, yi) with large deviations from both means can dominate the sum of cross-products and substantially shift the covariance value. If you suspect outliers, inspect a scatter plot first and consider whether to include or exclude the extreme points. Robust alternatives include Spearman correlation (rank-based) or trimmed covariance estimators.
Why does the deviation product table show five columns?+
The five columns are: x (each X value), y (each Y value), (x - x-mean) (deviation of X from its mean), (y - y-mean) (deviation of Y from its mean), and their product (x - x-mean)(y - y-mean). Summing the last column gives the total sum of cross-deviations. Dividing by n-1 gives sample covariance. This step-by-step breakdown is useful for understanding the formula, checking manual calculations, and explaining results in reports or homework.
How do I interpret the Sum of Products shown in the results?+
The Sum of Products (shown as Sum of cross-deviations) is the raw numerator before dividing by n-1 or n. It equals Sum((xi - x-mean)(yi - y-mean)). This is the total joint deviation across all pairs. Dividing by n-1 gives sample covariance; dividing by n gives population covariance. The Sum of Products itself is useful in regression calculations and in multi-variable analyses where you need intermediate statistics.