Least Squares Regression Line Calculator

Enter paired x, y data to find the best-fit line, or predict y from a known regression equation.

Enter paired x, y data to find the best-fit line ŷ = mx + b, or use Predict mode to forecast y for any x.

📈 What is the Least Squares Regression Line?

The least squares regression line (also called the line of best fit) is the unique straight line ŷ = mx + b that minimises the sum of the squared vertical distances from each observed data point to the line. These vertical distances are called residuals, and squaring them ensures that positive and negative deviations both contribute positively to the total. By choosing the slope m and intercept b to make this sum as small as possible, least squares produces the line that best summarises the linear trend in the data.

The method was developed independently by Carl Friedrich Gauss and Adrien-Marie Legendre around the turn of the 19th century. Legendre published it first, in 1805; Gauss published his account in 1809 and claimed to have used the method since 1795. In 1801 Gauss used it to predict the orbit of the dwarf planet Ceres from a limited number of astronomical observations, which allowed astronomers to find it again after it was lost in the Sun's glare. Since then, least squares regression has become one of the most widely used methods in data analysis, statistics, econometrics, engineering, and the natural sciences.

The regression line has two key properties. First, it always passes through the mean point (x̄, ȳ): substituting the mean x into the equation gives exactly the mean y. Second, the sum of all residuals is zero: the positive and negative deviations cancel out. These properties confirm that the line is centred on the data in a precise mathematical sense.
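Both properties can be checked numerically. The sketch below is only an illustration: the data values are made up, and the slope is computed from deviations about the means, which is algebraically equivalent to the raw-sum formula used elsewhere on this page.

```python
import math

# Illustrative data only (not taken from the calculator).
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Slope from deviations about the means, then the intercept b = y_bar - m * x_bar.
m = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum((x - x_bar) ** 2 for x in xs)
b = y_bar - m * x_bar

residuals = [y - (m * x + b) for x, y in zip(xs, ys)]

# Property 1: the line passes through (x_bar, y_bar).
passes_through_mean = math.isclose(m * x_bar + b, y_bar, abs_tol=1e-9)
# Property 2: the residuals sum to zero (up to floating-point rounding).
residuals_sum_to_zero = math.isclose(sum(residuals), 0.0, abs_tol=1e-9)

print(passes_through_mean, residuals_sum_to_zero)
```

A small tolerance is used because floating-point arithmetic rarely gives an exact zero.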

This calculator accepts x and y data in the Regression mode and computes the slope m, intercept b, Pearson correlation coefficient r, R² (coefficient of determination), mean values, and a full residuals table. Predict mode lets you enter a slope and intercept directly and forecast y for any x value.

📐 Formula

ŷ = mx + b
m (slope) = (n·Σxy − Σx·Σy) / (n·Σx² − (Σx)²)
b (intercept) = ȳ − m·x̄, where x̄ = Σx/n and ȳ = Σy/n
r (Pearson correlation) = (n·Σxy − Σx·Σy) / √[(n·Σx² − (Σx)²)(n·Σy² − (Σy)²)]
R² (coefficient of determination) = r² = 1 − SSres / SStot, where SSres = Σ(y − ŷ)² and SStot = Σ(y − ȳ)²
Residual = y − ŷ (observed minus predicted)
n = number of data points
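The formulas above translate almost line for line into code. The following is a minimal sketch, not the calculator's actual implementation; the function name `fit_line` and the error messages are assumptions made for illustration.

```python
import math

def fit_line(xs, ys):
    """Return (m, b, r) using the raw-sum formulas for slope, intercept and Pearson r."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")

    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)

    num = n * sxy - sx * sy       # shared numerator of m and r
    den_x = n * sxx - sx ** 2     # zero when every x is identical
    den_y = n * syy - sy ** 2     # zero when every y is identical
    if den_x == 0:
        raise ValueError("all x values are equal; slope is undefined")

    m = num / den_x
    b = sy / n - m * sx / n       # b = y_bar - m * x_bar
    # r is undefined when y has no variation; NaN is returned in that case.
    r = num / math.sqrt(den_x * den_y) if den_y != 0 else float("nan")
    return m, b, r

# Points lying exactly on y = 2x + 1.
m, b, r = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(m, b, r)  # 2.0 1.0 1.0
```

For simple linear regression, R² can then be obtained as `r ** 2`.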

📖 How to Use This Calculator

Steps

1. Enter X values: type all x coordinates in the X box, separated by commas, spaces, or new lines. Example: 1, 2, 3, 4, 5.
2. Enter Y values: type all y coordinates in the Y box in the same order. The count must match the X list exactly.
3. Click Calculate: the calculator outputs the regression equation, slope, intercept, r, R², mean values, and a residuals table showing each observed and predicted y with the difference.
4. Interpret R²: values close to 1 mean strong linear fit; values close to 0 mean little linear relationship. Check the residuals table for large outliers that may distort the line.
5. Predict mode: enter slope m, intercept b (from any source), and an x value. The calculator instantly returns ŷ = mx + b.
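The input rules in steps 1 and 2 (values separated by commas, spaces, or new lines, with matching counts) could be handled along the following lines. This is a hedged sketch of one possible parser; `parse_values` and `parse_pairs` are hypothetical names, and the calculator's own parsing may differ.

```python
import re

def parse_values(text):
    """Split on commas, spaces, or new lines and convert each token to a float."""
    tokens = [t for t in re.split(r"[,\s]+", text.strip()) if t]
    return [float(t) for t in tokens]  # float() raises ValueError on non-numeric tokens

def parse_pairs(x_text, y_text):
    """Parse both boxes and check that the counts match."""
    xs, ys = parse_values(x_text), parse_values(y_text)
    if len(xs) != len(ys):
        raise ValueError(f"X has {len(xs)} values but Y has {len(ys)}")
    return xs, ys

xs, ys = parse_pairs("1, 2, 3, 4, 5", "2 4 5\n4 5")
print(xs, ys)
```

Filtering out empty tokens means a stray leading or trailing comma does not cause an error.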

💡 Example Calculations

Example 1: Classic textbook data set

x = 1,2,3,4,5 and y = 2,4,5,4,5

1. n = 5. Σx = 15, Σy = 20, Σxy = 66, Σx² = 55, Σy² = 86.
2. m = (5×66 − 15×20) / (5×55 − 15²) = (330 − 300) / (275 − 225) = 30/50 = 0.6.
3. b = ȳ − m·x̄ = 4 − 0.6×3 = 4 − 1.8 = 2.2.
4. r = 30/√1500 ≈ 0.7746, R² = 0.6. The line explains 60% of the y variation.
Regression line: ŷ = 0.6x + 2.2

Example 2: Perfect linear relationship

x = 0,1,2,3 and y = 1,3,5,7

1. Each pair satisfies y = 2x + 1 exactly. The residuals should all be zero.
2. m = 2, b = 1. All residuals = 0. r = 1.0, R² = 1.0.
3. A perfect fit: 100% of the variation in y is explained by x.
Regression line: ŷ = 2x + 1 (perfect fit, R² = 1)

Example 3: Negative slope (inverse relationship)

x = 1,2,3,4,5 and y = 10,8,6,4,2

1. Y decreases as x increases. Expect a negative slope.
2. m = −2, b = 12. Equation: ŷ = −2x + 12.
3. r = −1.0, R² = 1.0 (perfect inverse linear relationship).
Regression line: ŷ = −2x + 12

Example 4: Predict using a known line

Regression line ŷ = 0.6x + 2.2, predict at x = 10

1. Switch to Predict mode. Enter slope = 0.6, intercept = 2.2, x = 10.
2. ŷ = 0.6 × 10 + 2.2 = 6.0 + 2.2 = 8.2.
Predicted ŷ at x = 10: 8.2

❓ Frequently Asked Questions

What is the least squares regression line?
The least squares regression line is the straight line ŷ = mx + b that minimises the sum of the squared residuals — the squared vertical distances from each data point to the line. It is the best linear fit to the data in the sense that no other line achieves a smaller sum of squared errors. The method of least squares ensures the line is centred on the data and passes through the mean point (x̄, ȳ).
How is the slope of the regression line calculated?
The slope formula is m = (n·Σxy − Σx·Σy) / (n·Σx² − (Σx)²). Here n is the sample size, Σxy is the sum of all products xi·yi, Σx is the sum of x values, Σy is the sum of y values, and Σx² is the sum of squared x values. Once m is found, the intercept is b = ȳ − m·x̄ where x̄ and ȳ are the sample means.
What does the correlation coefficient r mean?
The Pearson correlation coefficient r measures the strength and direction of the linear relationship between x and y. It ranges from −1 to +1. A value near +1 means strong positive linear association (y increases with x). A value near −1 means strong negative linear association. A value near 0 means little or no linear relationship. The sign of r always matches the sign of the slope m.
What is R² and how is it different from r?
R² (the coefficient of determination) equals r squared for simple linear regression. It measures the proportion of total variation in y explained by the linear model: R² = 1 − (SS_res / SS_tot). An R² of 0.75 means the regression line accounts for 75% of the variability in y. Unlike r, R² is always between 0 and 1 and does not carry sign information.
What is a residual?
A residual for a data point (xi, yi) is the difference between the observed value and the predicted value: residual = yi − ŷi = yi − (m·xi + b). Positive residuals correspond to points above the line; negative residuals to points below. The sum of all residuals equals zero for a least squares line, and the sum of squared residuals is the minimum possible among all straight lines.
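A residuals table and the corresponding R² can be built directly from these definitions. The sketch below assumes the slope and intercept are already known; `residual_table` and `r_squared` are illustrative names, not part of the calculator.

```python
def residual_table(xs, ys, m, b):
    """Rows of (x, observed y, predicted y, residual) for the line y = m*x + b."""
    return [(x, y, m * x + b, y - (m * x + b)) for x, y in zip(xs, ys)]

def r_squared(xs, ys, m, b):
    """R^2 = 1 - SS_res / SS_tot (raises ZeroDivisionError if every y is identical)."""
    y_bar = sum(ys) / len(ys)
    ss_res = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Data from Example 3, where the fitted line is y = -2x + 12.
xs = [1, 2, 3, 4, 5]
ys = [10, 8, 6, 4, 2]
for row in residual_table(xs, ys, m=-2, b=12):
    print(row)                          # every residual is 0 for this perfect fit
print(r_squared(xs, ys, m=-2, b=12))    # 1.0
```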
Does the regression line always pass through the mean point?
Yes. The least squares regression line always passes through (x̄, ȳ). This follows from the intercept formula b = ȳ − m·x̄: substituting x = x̄ gives ŷ = m·x̄ + (ȳ − m·x̄) = ȳ. This is a fundamental property of least squares and a useful way to verify a hand calculation: check that plugging in the mean x returns the mean y.
What happens if all x values are the same?
If all x values are equal, the denominator n·Σx² − (Σx)² equals zero, making the slope undefined. No line in the form ŷ = mx + b can be fitted because the data is a vertical cluster. The calculator will show an error message in this case. Regression requires variability in x to compute a meaningful slope.
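The zero denominator is easy to see in code. This is a sketch of one way to guard against it; the function names and the error text are assumptions and are not the calculator's actual message.

```python
def slope_denominator(xs):
    """n * sum(x^2) - (sum x)^2, which is zero when every x is identical."""
    n = len(xs)
    return n * sum(x * x for x in xs) - sum(xs) ** 2

def safe_slope(xs, ys):
    """Slope m, or a ValueError when x has no variability."""
    den = slope_denominator(xs)
    # Exact comparison is fine for integer input; float input may need a tolerance.
    if den == 0:
        raise ValueError("all x values are equal, so the slope is undefined")
    n = len(xs)
    num = n * sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys)
    return num / den

print(slope_denominator([3, 3, 3]))  # 0, because 3*27 - 9**2 = 0
```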
How many data points are needed?
At least 2 data points with different x values are required (two such points determine a unique line, so the fit is trivially perfect). For meaningful statistical inference, aim for at least 10 observations. With very small samples the slope and intercept estimates are highly variable. This calculator computes the regression for any n ≥ 2; with small samples, use the residuals table to judge whether the fit is genuinely linear or coincidental.
Is regression the same as correlation?
No, though they are related. Correlation (measured by r) describes the strength of a linear relationship without implying a specific prediction equation. Regression produces a formula ŷ = mx + b for predicting y from x. In regression, x and y have different roles (x is the predictor, y is the response). In correlation, the two variables are symmetric — swapping x and y gives the same r but a different regression line.
Can regression be used to prove causation?
No. Regression measures association, not causation. A strong regression fit (high R²) between two variables does not mean one causes the other. Both could be driven by a third confounding variable. Establishing causation requires controlled experiments or careful causal analysis. The classic example: ice cream sales and drowning rates are correlated (both rise in summer) but neither causes the other.
What is the difference between the regression of y on x and x on y?
The regression of y on x minimises the sum of squared vertical residuals (errors in y) and is used to predict y from x. The regression of x on y minimises the sum of squared horizontal residuals (errors in x) and is used to predict x from y. Unless r = ±1, these give different lines because they minimise different quantities. For prediction, always regress the response variable on the predictor, not the other way around.
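The difference between the two regressions shows up clearly in a small numerical sketch. The data below are made up for illustration, and the slopes are computed from deviations about the means, which is equivalent to the raw-sum formula. The product of the two slopes equals r², so the lines coincide only when r = ±1.

```python
def centered_sums(xs, ys):
    """Return (Sxy, Sxx, Syy): sums of products of deviations about the means."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy, sxx, syy

# Illustrative data only.
xs = [1, 2, 3, 4, 5]
ys = [1, 3, 2, 5, 4]

sxy, sxx, syy = centered_sums(xs, ys)
slope_y_on_x = sxy / sxx   # change in predicted y per unit x
slope_x_on_y = sxy / syy   # change in predicted x per unit y
r_squared = slope_y_on_x * slope_x_on_y

# Rewritten with y on the left, the x-on-y line has slope 1 / slope_x_on_y,
# which differs from slope_y_on_x unless r = +1 or r = -1.
x_on_y_as_dy_dx = 1 / slope_x_on_y

print(slope_y_on_x, x_on_y_as_dy_dx, r_squared)
```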