Matthews Correlation Coefficient Calculator
Calculate the Matthews Correlation Coefficient from a confusion matrix or binary label arrays, with a full suite of classification metrics.
What is the Matthews Correlation Coefficient?
The Matthews Correlation Coefficient (MCC) is a measure of the quality of binary classification models. It was introduced by biochemist Brian Matthews in 1975 to evaluate predictions of protein secondary structure. It has since been widely adopted in machine learning, medical diagnostics, bioinformatics, and other domains where binary classification quality needs to be measured rigorously. The MCC ranges from -1 to +1: a value of +1 represents a perfect classifier, 0 represents a classifier no better than random guessing, and -1 represents a perfectly inverted classifier that always predicts the wrong class.
MCC is used across a wide range of real-world applications. In machine learning, it evaluates models for spam detection, fraud detection, disease diagnosis, and churn prediction. In medicine, it measures the quality of diagnostic tests against ground truth labels (positive/negative for a condition). In bioinformatics, it benchmarks gene expression classifiers and protein structure predictions. In software testing, it assesses defect prediction models. Unlike simpler metrics, MCC is especially valuable when class distributions are highly imbalanced, because it incorporates all four cells of the confusion matrix rather than focusing only on one class or one type of error.
A common misconception is that accuracy is sufficient to evaluate a binary classifier. On a dataset with 99% negative samples, a model that always predicts negative achieves 99% accuracy despite zero predictive ability. Its MCC, by contrast, is reported as 0, correctly signalling no predictive correlation. (Strictly, the MCC ratio is 0/0 in this case because the model makes no positive predictions; reporting 0 is the usual convention.) Another misconception is that the F1 score is equivalent to MCC. F1 ignores True Negatives entirely, which makes it blind to the model's performance on the negative class. MCC penalises poor performance on either class symmetrically, which generally makes it more informative than accuracy or F1 for imbalanced problems.
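The two misconceptions above can be checked numerically. The following is a minimal sketch, not this calculator's implementation: the function names and the example counts are hypothetical, and returning 0.0 when a denominator is zero is an assumed convention.

```python
import math

def mcc(tp, fp, fn, tn):
    """MCC from confusion-matrix counts.

    Returns 0.0 when the denominator is zero (an assumed convention;
    the ratio is mathematically undefined in that case).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def accuracy(tp, fp, fn, tn):
    return (tp + tn) / (tp + fp + fn + tn)

def f1(tp, fp, fn, tn):
    # tn is accepted but never used: F1 ignores true negatives.
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

# Hypothetical dataset: 990 negatives, 10 positives, model always says "negative".
always_negative = dict(tp=0, fp=0, fn=10, tn=990)
print(accuracy(**always_negative))  # 0.99
print(mcc(**always_negative))       # 0.0

# Two hypothetical models with identical TP, FP, FN but different TN counts.
few_tn = dict(tp=90, fp=10, fn=10, tn=0)
many_tn = dict(tp=90, fp=10, fn=10, tn=890)
print(f1(**few_tn), f1(**many_tn))    # identical F1 scores
print(mcc(**few_tn), mcc(**many_tn))  # negative vs. roughly 0.89
```

The last two lines show why F1 cannot stand in for MCC: changing only the True Negative count leaves F1 untouched while MCC moves from slightly negative to strongly positive.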
This calculator accepts two input formats: a confusion matrix (TP, FP, FN, TN counts) for when you already have aggregated results, and raw binary label arrays for when you have lists of actual and predicted values. Both formats compute the same set of output metrics, including MCC, accuracy, balanced accuracy, precision (PPV), recall (sensitivity), specificity (TNR), negative predictive value (NPV), F1 score, and Cohen's kappa.
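To illustrate how the two input formats relate, the sketch below tallies a confusion matrix from label arrays and then derives the listed metrics from the four counts. It is an illustrative assumption, not the calculator's actual code: `confusion_counts`, `classification_metrics`, and the sample labels are hypothetical, labels are assumed to be coded as 0 and 1, undefined ratios are returned as NaN, and an undefined MCC is reported as 0.0.

```python
import math

def confusion_counts(actual, predicted):
    """Tally (tp, fp, fn, tn) from two equal-length lists of 0/1 labels."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            tp += 1
        elif a == 0 and p == 1:
            fp += 1
        elif a == 1 and p == 0:
            fn += 1
        elif a == 0 and p == 0:
            tn += 1
        else:
            raise ValueError("labels must be 0 or 1")
    return tp, fp, fn, tn

def _ratio(num, den):
    """Division that yields NaN instead of raising when the denominator is zero."""
    return num / den if den else float("nan")

def classification_metrics(tp, fp, fn, tn):
    n = tp + fp + fn + tn
    recall = _ratio(tp, tp + fn)
    specificity = _ratio(tn, tn + fp)
    accuracy = _ratio(tp + tn, n)
    # Cohen's kappa compares observed agreement with agreement expected by chance.
    p_chance = _ratio((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp), n * n)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "mcc": (tp * tn - fp * fn) / mcc_den if mcc_den else 0.0,
        "accuracy": accuracy,
        "balanced_accuracy": (recall + specificity) / 2,
        "precision": _ratio(tp, tp + fp),
        "recall": recall,
        "specificity": specificity,
        "npv": _ratio(tn, tn + fn),
        "f1": _ratio(2 * tp, 2 * tp + fp + fn),
        "kappa": _ratio(accuracy - p_chance, 1 - p_chance),
    }

# Hypothetical example labels.
actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 0, 1, 0, 1]

counts = confusion_counts(actual, predicted)
print(counts)  # (3, 1, 1, 3)
for name, value in classification_metrics(*counts).items():
    print(f"{name}: {value:.3f}")
```

Because every metric is a function of the four counts alone, feeding the tallied counts into the confusion-matrix path gives the same results as starting from the label arrays.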
MCC Formula
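For reference, the standard definition of MCC in terms of the confusion-matrix counts TP, FP, FN, and TN is shown below. When any of the four sums in the denominator is zero the ratio is undefined; many implementations report 0 in that case, which is a convention rather than part of the definition.

```latex
\mathrm{MCC} = \frac{TP \cdot TN - FP \cdot FN}
                    {\sqrt{(TP + FP)\,(TP + FN)\,(TN + FP)\,(TN + FN)}}
```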
The numerator TP × TN - FP × FN is the determinant of the 2×2 confusion matrix. It is positive when the product of the correct counts exceeds the product of the error counts, negative in the opposite case, and zero when the predictions are statistically independent of the actual labels. The denominator is the square root of the product of the four marginal totals of the confusion matrix (TP + FP, TP + FN, TN + FP, and TN + FN). It normalises the numerator so that the result lies in [-1, +1] regardless of class balance or total sample count. The formula is equivalent to the Pearson product-moment correlation coefficient applied to two binary variables (actual and predicted labels coded as 0 and 1).
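The equivalence with the Pearson correlation can be verified directly. The sketch below is a minimal illustration with hypothetical function names and sample labels; it assumes 0/1 coding and that both label lists contain both classes, so neither denominator is zero.

```python
import math

def pearson(x, y):
    """Plain Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def mcc_from_labels(actual, predicted):
    """MCC computed from the confusion-matrix counts of 0/1 labels."""
    pairs = list(zip(actual, predicted))
    tp = pairs.count((1, 1))
    fp = pairs.count((0, 1))
    fn = pairs.count((1, 0))
    tn = pairs.count((0, 0))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom

# Hypothetical example labels.
actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 0, 1, 0, 1]
inverted = [1 - a for a in actual]

print(mcc_from_labels(actual, predicted), pearson(actual, predicted))  # both 0.5
print(mcc_from_labels(actual, actual))    # perfect classifier: 1.0
print(mcc_from_labels(actual, inverted))  # perfectly inverted classifier: -1.0
```

The two functions agree on the sample labels, and the perfect and inverted cases reproduce the endpoints of the [-1, +1] range.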