Least Squares Regression Line Calculator

Enter paired x, y data to find the best-fit line, or predict y from a known regression equation.

📈 What is the Least Squares Regression Line?

The least squares regression line (also called the line of best fit) is the unique straight line ŷ = mx + b that minimises the sum of the squared vertical distances from each observed data point to the line. These vertical distances are called residuals, and squaring them ensures that positive and negative deviations both contribute positively to the total. By choosing the slope m and intercept b to make this sum as small as possible, least squares produces the line that best summarises the linear trend in the data.
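As a rough illustration of what "smallest sum of squared residuals" means, the sketch below compares the least squares line for the Example 1 data further down this page (slope 0.6, intercept 2.2) with eight slightly nudged lines. The helper name `sse` and the nudge size of 0.1 are choices made for this sketch only.

```python
def sse(m, b, xs, ys):
    """Sum of squared vertical distances from the points to the line y = m*x + b."""
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# Sum of squared residuals for the least squares line of this data set.
best = sse(0.6, 2.2, xs, ys)

# The same quantity for eight nearby lines (slope and/or intercept shifted by 0.1).
nudged = [
    sse(0.6 + dm, 2.2 + db, xs, ys)
    for dm in (-0.1, 0.0, 0.1)
    for db in (-0.1, 0.0, 0.1)
    if (dm, db) != (0.0, 0.0)
]

print(round(best, 4))                         # 2.4
print(all(other > best for other in nudged))  # True
```

Every nudged line gives a larger total, which is what the definition predicts for any line other than the least squares one.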

The method was developed independently by Carl Friedrich Gauss and Adrien-Marie Legendre around the turn of the 19th century: Legendre published it first, in 1805, while Gauss published in 1809 and stated that he had been using it since 1795. Gauss applied it to predict the orbit of the dwarf planet Ceres from a limited number of astronomical observations, which allowed astronomers to find it again in late 1801 after it had been lost in the Sun's glare. Since then, least squares regression has become one of the most widely used methods in data analysis, statistics, econometrics, engineering, and the natural sciences.

The regression line has two key properties. First, it always passes through the mean point (x̄, ȳ): substituting the mean x into the equation gives exactly the mean y. Second, the sum of all residuals is zero: the positive and negative deviations cancel out. These properties confirm that the line is centred on the data in a precise mathematical sense.
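Both properties can be checked numerically. The sketch below fits a line to a small made-up data set (the numbers are illustrative, not from this page) and tests each property up to floating-point rounding; the function name `fit` is hypothetical.

```python
import math

def fit(xs, ys):
    """Slope and intercept from the standard least squares sum formulas."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    b = sum_y / n - m * sum_x / n
    return m, b

# Made-up data, used only to illustrate the two properties.
xs = [2, 4, 6, 8, 10]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]

m, b = fit(xs, ys)
x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)
residuals = [y - (m * x + b) for x, y in zip(xs, ys)]

# Property 1: the line passes through the mean point.
print(math.isclose(m * x_bar + b, y_bar))                 # True
# Property 2: the residuals sum to zero (up to rounding error).
print(math.isclose(sum(residuals), 0.0, abs_tol=1e-9))    # True
```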

In Regression mode, this calculator accepts paired x and y data and computes the slope m, intercept b, Pearson correlation coefficient r, R² (coefficient of determination), the mean values, and a full residuals table. Predict mode lets you enter a slope and intercept directly and forecast y for any x value.

📐 Formula

ŷ = mx + b

m (slope) = (n·Σxy − Σx·Σy) / (n·Σx² − (Σx)²)

b (intercept) = ȳ − m·x̄, where x̄ = Σx/n and ȳ = Σy/n

r (Pearson correlation) = (n·Σxy − Σx·Σy) / √[(n·Σx² − (Σx)²)(n·Σy² − (Σy)²)]

R² = 1 − SS_res / SS_tot, where SS_res = Σ(y − ŷ)² and SS_tot = Σ(y − ȳ)²

Residual = y − ŷ (observed minus predicted)

n = number of data points
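The formulas above translate almost line for line into code. The following is a minimal Python sketch, not the calculator's actual implementation: the function name `fit_line`, the error messages, and the choice to return NaN for r and R² when every y value is identical are all assumptions made for illustration.

```python
import math

def fit_line(xs, ys):
    """Return (m, b, r, r_squared) computed from the sum formulas."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 points")

    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    sum_yy = sum(y * y for y in ys)

    numerator = n * sum_xy - sum_x * sum_y
    denom_x = n * sum_xx - sum_x ** 2      # zero when all x values are equal
    denom_y = n * sum_yy - sum_y ** 2      # zero when all y values are equal
    if denom_x == 0:
        raise ValueError("all x values are equal, so the slope is undefined")

    m = numerator / denom_x
    y_bar = sum_y / n
    b = y_bar - m * sum_x / n

    if denom_y == 0:
        # Assumption for this sketch: report r and R² as NaN for constant y.
        return m, b, float("nan"), float("nan")

    r = numerator / math.sqrt(denom_x * denom_y)
    ss_res = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return m, b, r, 1 - ss_res / ss_tot

m, b, r, r_squared = fit_line([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(m, 4), round(b, 4), round(r, 4), round(r_squared, 4))
# 0.6 2.2 0.7746 0.6
```

R² is computed here from SS_res and SS_tot rather than by squaring r, so the two values serve as a cross-check on each other.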

📖 How to Use This Calculator

Steps

Enter X values: type all x coordinates in the X box, separated by commas, spaces, or new lines. Example: 1, 2, 3, 4, 5.

Enter Y values: type all y coordinates in the Y box in the same order. The count must match the X list exactly.

Click Calculate: the calculator outputs the regression equation, slope, intercept, r, R², mean values, and a residuals table showing each observed and predicted y with the difference.

Interpret R²: values close to 1 mean a strong linear fit; values close to 0 mean little linear relationship. Check the residuals table for unusually large residuals, which flag outliers that may distort the line.

Predict mode: enter slope m, intercept b (from any source), and an x value. The calculator instantly returns ŷ = mx + b.
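The input rules in the first two steps (values separated by commas, spaces, or new lines, with matching counts) could be handled along these lines. How the calculator itself parses its input is not documented here, so `parse_values` and `parse_pairs` are hypothetical helpers for this sketch.

```python
import re

def parse_values(text):
    """Split on commas and/or whitespace (including new lines) and convert to floats."""
    tokens = [t for t in re.split(r"[,\s]+", text.strip()) if t]
    return [float(t) for t in tokens]

def parse_pairs(x_text, y_text):
    """Parse both boxes and insist that the two lists have the same length."""
    xs, ys = parse_values(x_text), parse_values(y_text)
    if len(xs) != len(ys):
        raise ValueError(f"X has {len(xs)} values but Y has {len(ys)}")
    return xs, ys

xs, ys = parse_pairs("1, 2, 3, 4, 5", "2 4 5\n4 5")
print(xs)  # [1.0, 2.0, 3.0, 4.0, 5.0]
print(ys)  # [2.0, 4.0, 5.0, 4.0, 5.0]
```

A token that is not a number makes `float` raise `ValueError`, which a real interface would presumably turn into a friendlier message.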

💡 Example Calculations

Example 1: Classic textbook data set

x = 1,2,3,4,5 and y = 2,4,5,4,5

n = 5. Σx = 15, Σy = 20, Σxy = 66, Σx² = 55, Σy² = 86.

m = (5×66 − 15×20) / (5×55 − 15²) = (330 − 300) / (275 − 225) = 30/50 = 0.6.

b = ȳ − m·x̄ = 4 − 0.6×3 = 4 − 1.8 = 2.2.

r ≈ 0.7746, R² = 0.6000. The line explains 60% of the y variation.

Regression line: ŷ = 0.6x + 2.2
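The arithmetic in Example 1 can be reproduced exactly with Python's `fractions` module, which avoids floating-point rounding. This is only a check of the worked numbers; R² is obtained here as r², which holds for simple linear regression.

```python
from fractions import Fraction

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)

sum_x = sum(xs)
sum_y = sum(ys)
sum_xy = sum(x * y for x, y in zip(xs, ys))
sum_xx = sum(x * x for x in xs)
sum_yy = sum(y * y for y in ys)

numerator = n * sum_xy - sum_x * sum_y   # 330 - 300 = 30
denom_x = n * sum_xx - sum_x ** 2        # 275 - 225 = 50
denom_y = n * sum_yy - sum_y ** 2        # 430 - 400 = 30

m = Fraction(numerator, denom_x)
b = Fraction(sum_y, n) - m * Fraction(sum_x, n)
r_squared = Fraction(numerator ** 2, denom_x * denom_y)

print(sum_x, sum_y, sum_xy, sum_xx, sum_yy)  # 15 20 66 55 86
print(m, b, r_squared)                       # 3/5 11/5 3/5
```

The fractions 3/5 and 11/5 are the slope 0.6 and intercept 2.2 from the example, and r² = 3/5 matches R² = 0.6000.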

Example 2: Perfect linear relationship

x = 0,1,2,3 and y = 1,3,5,7

Each pair satisfies y = 2x + 1 exactly. The residuals should all be zero.

m = 2, b = 1. All residuals = 0. r = 1.0, R² = 1.0.

A perfect fit: 100% of the variation in y is explained by x.

Regression line: ŷ = 2x + 1 (perfect fit, R² = 1)

Example 3: Negative slope (inverse relationship)

x = 1,2,3,4,5 and y = 10,8,6,4,2

Y decreases as x increases. Expect a negative slope.

m = −2, b = 12. Equation: ŷ = −2x + 12.

r = −1.0, R² = 1.0 (perfect inverse linear relationship).

Regression line: ŷ = −2x + 12

Example 4: Predict using a known line

Regression line ŷ = 0.6x + 2.2, predict at x = 10

Switch to Predict mode. Enter slope = 0.6, intercept = 2.2, x = 10.

ŷ = 0.6 × 10 + 2.2 = 6.0 + 2.2 = 8.2.

Predicted ŷ at x = 10: 8.2
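Predict mode amounts to a single evaluation of ŷ = mx + b. A small sketch, with hypothetical function names `predict` and `equation_text` (the second formats the line for display and handles a negative intercept):

```python
def predict(m, b, x):
    """Evaluate the regression line at x."""
    return m * x + b

def equation_text(m, b):
    """Format the line as text, e.g. 'ŷ = 0.6x + 2.2' or 'ŷ = 2x - 1.5'."""
    sign = "+" if b >= 0 else "-"
    return f"ŷ = {m:g}x {sign} {abs(b):g}"

y_hat = predict(0.6, 2.2, 10)
print(equation_text(0.6, 2.2))  # ŷ = 0.6x + 2.2
print(round(y_hat, 4))          # 8.2
```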

❓ Frequently Asked Questions

What is the least squares regression line?

The least squares regression line is the straight line ŷ = mx + b that minimises the sum of the squared residuals — the squared vertical distances from each data point to the line. It is the best linear fit to the data in the sense that no other line achieves a smaller sum of squared errors. The method of least squares ensures the line is centred on the data and passes through the mean point (x̄, ȳ).

How is the slope of the regression line calculated?

The slope formula is m = (n·Σxy − Σx·Σy) / (n·Σx² − (Σx)²). Here n is the sample size, Σxy is the sum of all products xi·yi, Σx is the sum of x values, Σy is the sum of y values, and Σx² is the sum of squared x values. Once m is found, the intercept is b = ȳ − m·x̄ where x̄ and ȳ are the sample means.

What does the correlation coefficient r mean?

The Pearson correlation coefficient r measures the strength and direction of the linear relationship between x and y. It ranges from −1 to +1. A value near +1 means strong positive linear association (y increases with x). A value near −1 means strong negative linear association. A value near 0 means little or no linear relationship. The sign of r always matches the sign of the slope m.

What is R² and how is it different from r?

R² (the coefficient of determination) equals r squared for simple linear regression. It measures the proportion of total variation in y explained by the linear model: R² = 1 − (SS_res / SS_tot). An R² of 0.75 means the regression line accounts for 75% of the variability in y. Unlike r, R² is always between 0 and 1 and does not carry sign information.

What is a residual?

A residual for a data point (xi, yi) is the difference between the observed value and the predicted value: residual = yi − ŷi = yi − (m·xi + b). Positive residuals correspond to points above the line; negative residuals to points below. The sum of all residuals equals zero for a least squares line, and the sum of squared residuals is the minimum possible among all straight lines.
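A residuals table is simply this difference computed row by row. The sketch below builds one for the data of Example 1 with its fitted line ŷ = 0.6x + 2.2; the column layout is an assumption and may differ from what the calculator displays.

```python
def residual_table(xs, ys, m, b):
    """Return rows of (x, observed y, predicted y, residual)."""
    rows = []
    for x, y in zip(xs, ys):
        y_hat = m * x + b
        rows.append((x, y, y_hat, y - y_hat))
    return rows

rows = residual_table([1, 2, 3, 4, 5], [2, 4, 5, 4, 5], 0.6, 2.2)

print("  x   y  y_hat  resid")
for x, y, y_hat, resid in rows:
    print(f"{x:>3} {y:>3} {y_hat:>6.2f} {resid:>6.2f}")

total = sum(resid for _, _, _, resid in rows)
squared = sum(resid ** 2 for _, _, _, resid in rows)
print(f"sum of residuals = {total:.4f}, sum of squares = {squared:.4f}")
```

The residuals come out as −0.8, 0.6, 1.0, −0.6 and −0.2 (up to rounding): they sum to zero, and their squares sum to 2.4.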

Does the regression line always pass through the mean point?

Yes. The least squares regression line always passes through (x̄, ȳ). This follows from the intercept formula b = ȳ − m·x̄: substituting x = x̄ gives ŷ = m·x̄ + (ȳ − m·x̄) = ȳ. This is a fundamental property of least squares and a useful way to verify a hand calculation: check that plugging in the mean x returns the mean y.

What happens if all x values are the same?

If all x values are equal, the denominator n·Σx² − (Σx)² equals zero, making the slope undefined. No line in the form ŷ = mx + b can be fitted because the data is a vertical cluster. The calculator will show an error message in this case. Regression requires variability in x to compute a meaningful slope.
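One way to guard against this case in code is to check for distinct x values before dividing. This is a sketch of a possible check, not the calculator's own logic: the function name and error text are hypothetical. It compares the x values directly instead of testing the denominator against zero, because with decimal inputs rounding can leave the denominator tiny but non-zero.

```python
def slope_or_error(xs, ys):
    """Return the least squares slope, or raise if x has no variability."""
    if len(set(xs)) < 2:
        raise ValueError("All x values are identical: the slope is undefined")
    n = len(xs)
    numerator = n * sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys)
    denominator = n * sum(x * x for x in xs) - sum(xs) ** 2
    return numerator / denominator

try:
    slope_or_error([3, 3, 3], [1, 2, 4])
except ValueError as err:
    print(err)  # All x values are identical: the slope is undefined
```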

How many data points are needed?

At least 2 data points with different x values are required. Two such points determine a unique line, so the fit is trivially perfect (R² = 1, provided the two y values also differ). For meaningful statistical inference, aim for at least 10 observations, because with very small samples the slope and intercept estimates are highly variable. This calculator computes the regression for any n ≥ 2, and the residuals table helps you judge whether the fit is genuinely linear or coincidental.

Is regression the same as correlation?

No, though they are related. Correlation (measured by r) describes the strength of a linear relationship without implying a specific prediction equation. Regression produces a formula ŷ = mx + b for predicting y from x. In regression, x and y have different roles (x is the predictor, y is the response). In correlation, the two variables are symmetric — swapping x and y gives the same r but a different regression line.

Can regression be used to prove causation?

No. Regression measures association, not causation. A strong regression fit (high R²) between two variables does not mean one causes the other. Both could be driven by a third confounding variable. Establishing causation requires controlled experiments or careful causal analysis. The classic example: ice cream sales and drowning rates are correlated (both rise in summer) but neither causes the other.

What is the difference between the regression of y on x and x on y?

The regression of y on x minimises the sum of squared vertical residuals (errors in y) and is used to predict y from x. The regression of x on y minimises the sum of squared horizontal residuals (errors in x) and is used to predict x from y. Unless r = ±1, these give different lines because they minimise different quantities. For prediction, always regress the response variable on the predictor, not the other way around.
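To see the two lines differ, the sketch below fits both regressions to the data of Example 1 by swapping the roles of the two lists. The helper name `slope_intercept` is hypothetical. It also prints the product of the two slopes, which for simple linear regression equals r².

```python
def slope_intercept(us, vs):
    """Least squares line that predicts v from u: v_hat = m*u + b."""
    n = len(us)
    sum_u, sum_v = sum(us), sum(vs)
    m = (n * sum(u * v for u, v in zip(us, vs)) - sum_u * sum_v) / (
        n * sum(u * u for u in us) - sum_u ** 2
    )
    return m, sum_v / n - m * sum_u / n

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

m_yx, b_yx = slope_intercept(xs, ys)  # y on x: y_hat = 0.6x + 2.2
m_xy, b_xy = slope_intercept(ys, xs)  # x on y: x_hat = 1.0y - 1.0

# Rearranged into y = ... form, the x-on-y line has slope 1 / m_xy.
print(round(m_yx, 4), round(1 / m_xy, 4))  # 0.6 1.0
# Product of the two regression slopes equals r² (0.6 for this data).
print(round(m_yx * m_xy, 4))               # 0.6
```

Drawn on the same axes, the two lines have slopes 0.6 and 1.0, and both pass through the mean point (3, 4).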

What is the least squares regression line?

The least squares regression line, also called the line of best fit, is the straight line ŷ = mx + b that minimises the sum of the squared vertical distances from each data point to the line. These vertical distances are called residuals. By minimising the sum of their squares, least squares produces the line that gives the best overall prediction for the data.

How do you calculate the slope of the regression line?

The slope formula is m = (n·Σxy − Σx·Σy) / (n·Σx² − (Σx)²), where n is the number of data points, Σxy is the sum of products of paired x and y values, Σx and Σy are the sums of x and y, and Σx² is the sum of squared x values. Once the slope is known, the intercept is b = ȳ − m·x̄.

What does R² mean in regression?

R² (the coefficient of determination) measures what proportion of the variation in y is explained by the linear relationship with x. An R² of 0.85 means 85% of the variability in y is accounted for by the regression line. R² ranges from 0 to 1 for standard regression, with values closer to 1 indicating a better linear fit.

What is the difference between r and R²?

The correlation coefficient r measures the strength and direction of the linear relationship between x and y, ranging from −1 to +1. A negative r means the slope is negative; a positive r means the slope is positive. R² equals r squared and measures how much variation in y the line explains (always between 0 and 1, ignoring direction). For simple linear regression, R² = r².

What is a residual in regression?

A residual is the difference between an observed y value and the value predicted by the regression line: residual = y − ŷ. Positive residuals mean the point is above the line; negative residuals mean it is below. The least squares method minimises the sum of squared residuals. A residual table (shown by this calculator) lists each data point's residual, helping identify outliers.

Does the regression line always pass through the mean?

Yes. The least squares regression line always passes through the point (x̄, ȳ), the means of x and y. This is a mathematical property of the formulas: substituting x = x̄ into ŷ = mx + b gives ŷ = mx̄ + (ȳ − mx̄) = ȳ. If the predicted value at the mean x does not equal the mean y, something is wrong with the calculation.

How do you predict y from the regression line?

Substitute the desired x value into the regression equation: ŷ = mx + b. For example, if the regression line is ŷ = 2.5x + 1.3 and you want to predict at x = 4, then ŷ = 2.5(4) + 1.3 = 11.3. Use the Predict mode in this calculator to do this instantly once you know the slope and intercept.

What are the assumptions of least squares regression?

The main assumptions are: (1) linearity — the true relationship between x and y is linear; (2) independence — observations are not correlated with each other; (3) equal variance (homoscedasticity) — the spread of residuals is roughly constant across all x values; and (4) normality of residuals — the residuals follow a roughly normal distribution. Violations of these assumptions affect the reliability of predictions and inference.

Can regression be used for prediction outside the data range?

Extrapolation (predicting y for x values outside the observed range) is risky. The regression line is only guaranteed to describe the relationship within the range of the observed data. The relationship may not remain linear, may level off, or may reverse outside that range. Treat extrapolated predictions with caution and consider whether they make practical sense.

What does a negative slope mean?

A negative slope means y tends to decrease as x increases. For example, a negative slope in a regression of ice cream sales (y) on temperature (x) would be unusual, but a negative slope in a regression of air pressure (y) on altitude (x) is expected: the higher the altitude, the lower the pressure. The sign of the slope and the sign of the correlation coefficient r always agree.

How many data points are needed for regression?

At least 2 data points are required to fit a line. Two points with different x values determine a line exactly, giving r = ±1 and R² = 1 whenever their y values also differ. However, meaningful regression for prediction requires more, typically 10 or more points, to get stable estimates of slope and intercept. With very few points the line can appear to fit well by chance. This calculator accepts as few as 2 points but shows residuals so you can judge the fit quality.

What is the line of best fit used for in real life?

The least squares regression line is used widely: in economics to forecast sales or GDP from indicator variables, in biology to study dose-response relationships, in physics to extract constants from experimental data (such as fitting Ohm's law to voltage-current measurements), in medicine to calibrate screening tests, and in machine learning as the foundation of linear regression models used for prediction.

📌 Quick Tips

💡Enter X values in one box and Y values in the other, separated by commas, spaces, or new lines. Both lists must have the same number of values.

💡An R² of 1.0 means the line fits the data perfectly. An R² near 0 means the line explains almost none of the variation.

💡The regression line always passes through the mean point (x̄, ȳ). Check that your predicted value at x = x̄ equals ȳ.

💡If all X values are the same, no slope can be determined and the calculator will show an error.

💡Positive slope means y increases with x; negative slope means y decreases with x.
