Stop relying on accuracy alone. Learn how to generate a confusion matrix to identify true positives and false negatives, the keys to real error analysis.
Previously in this course, we covered the mechanics of classification and discussed how models define decision boundaries. Now that you have a model capable of making predictions, you need a way to look under the hood. Accuracy on its own can be misleading: it tells you how often the model is right, but it hides which types of mistakes the model is making.
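To make the point about accuracy concrete, here is a small sketch using invented numbers (a hypothetical test set of 95 legitimate emails and 5 spam emails, not the course dataset). It shows a "model" that never flags spam yet still scores high on accuracy.

```python
# Hypothetical imbalanced test set: 95 legitimate emails (0) and 5 spam (1).
y_true = [0] * 95 + [1] * 5

# A trivial "model" that never flags anything as spam.
y_pred = [0] * 100

# Accuracy: share of predictions that match the actual label.
correct = sum(t == p for t, p in zip(y_true, y_pred))
accuracy = correct / len(y_true)

# Number of spam emails the model actually caught.
spam_caught = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))

print(f"Accuracy: {accuracy:.2f}")
print(f"Spam caught: {spam_caught} of {sum(y_true)}")
```

The accuracy looks strong, but the model misses every spam message, which is exactly the kind of failure a single percentage cannot reveal.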
In this lesson, we introduce the confusion matrix, the essential tool for error analysis in any classification task.
A confusion matrix is a table that maps your model's predictions against the actual ground-truth labels. It transforms a simple list of "correct" or "incorrect" guesses into a detailed breakdown of performance.
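For a binary classification problem, where you are predicting between two classes (e.g., "Spam" or "Not Spam"), the matrix is a 2x2 grid. It organizes your results into four specific categories:
True Positives (TP): spam emails correctly predicted as "Spam".
True Negatives (TN): legitimate emails correctly predicted as "Not Spam".
False Positives (FP): legitimate emails incorrectly predicted as "Spam".
False Negatives (FN): spam emails incorrectly predicted as "Not Spam".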
By visualizing these four buckets, you stop asking "How often was I right?" and start asking "What kind of mistakes am I making?"
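As a rough illustration of how predictions fall into these four buckets, the sketch below counts them by hand in plain Python. The labels are toy values invented for this example, with 1 standing for "Spam" and 0 for "Not Spam".

```python
# Toy labels, invented for illustration: 1 = "Spam", 0 = "Not Spam".
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]

counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}

for actual, predicted in zip(y_true, y_pred):
    if actual == 1 and predicted == 1:
        counts["TP"] += 1   # spam correctly flagged
    elif actual == 0 and predicted == 0:
        counts["TN"] += 1   # legitimate email correctly passed
    elif actual == 0 and predicted == 1:
        counts["FP"] += 1   # legitimate email wrongly flagged
    else:
        counts["FN"] += 1   # spam that slipped through

print(counts)
```

Every prediction lands in exactly one bucket, so the four counts always add up to the size of the test set.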
You don't need to count these manually. Scikit-learn provides a utility to generate this matrix instantly.
Assuming you have already completed the training and testing data splits and have your model predictions, here is how you generate the matrix:
```python
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
import matplotlib.pyplot as plt

# Assuming 'y_test' are your actual labels and 'y_pred' are your model's guesses
cm = confusion_matrix(y_test, y_pred)

# Display it visually
disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=["Not Spam", "Spam"])
disp.plot(cmap=plt.cm.Blues)
plt.show()
```
When you run this, you will see a grid where the diagonal from top-left to bottom-right represents your correct predictions (TN and TP), while the off-diagonal cells represent your errors (FP and FN).
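If you want the four cell values as plain numbers rather than a plot, one common approach for the binary case is to flatten the matrix. The sketch below uses toy labels invented for this example (1 = "Spam", 0 = "Not Spam") and passes `labels=[0, 1]` to make the row and column order explicit.

```python
from sklearn.metrics import confusion_matrix

# Toy labels, invented for illustration: 1 = "Spam", 0 = "Not Spam".
y_test = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]

# Rows are actual classes, columns are predicted classes, both ordered [0, 1].
cm = confusion_matrix(y_test, y_pred, labels=[0, 1])

# For a 2x2 matrix, flattening row by row yields TN, FP, FN, TP.
tn, fp, fn, tp = cm.ravel()

print(f"Correct (diagonal): TN={tn}, TP={tp}")
print(f"Errors (off-diagonal): FP={fp}, FN={fn}")
```

This unpacking only works for two classes; with more classes the flattened matrix has more than four entries.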
For our project dataset, we are currently working with a classification model. If you haven't reached the stage of training the baseline linear model yet, do so now.
Generate a confusion_matrix for your current project model using your test set.
Mind the argument order: the function expects (y_true, y_pred). If you swap these, your FP and FN values will be inverted, leading you to misinterpret your model's behavior. Always check the labels on your plot.
The confusion matrix is your primary tool for error analysis. It moves your evaluation from abstract percentages to concrete, actionable insights. By categorizing your classification results into TPs, TNs, FPs, and FNs, you gain the clarity needed to decide how to improve your model, whether by collecting more data, adjusting your decision threshold, or changing your features.
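To check the argument-order warning for yourself, the sketch below builds the matrix both ways on toy labels invented for this example (1 = "Spam", 0 = "Not Spam"). Swapping the arguments transposes the matrix, so the FP and FN cells trade places.

```python
from sklearn.metrics import confusion_matrix

# Toy labels, invented for illustration: 1 = "Spam", 0 = "Not Spam".
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 1]

# Correct order: actual labels first, predictions second.
correct = confusion_matrix(y_true, y_pred, labels=[0, 1])

# Swapped order: the result is the transpose of the correct matrix.
swapped = confusion_matrix(y_pred, y_true, labels=[0, 1])

print("Correct order:\n", correct)
print("Swapped order:\n", swapped)
```

With these labels the correct matrix reports 1 false positive and 2 false negatives, while the swapped call reports the reverse.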
Up next: We will move from the matrix to visual tools in Error Analysis Plots to identify patterns in where your model struggles most.
The Confusion Matrix