Error Analysis Plots: Diagnosing Model Failures in Python

Master error analysis plots to move beyond aggregate metrics. Learn to visualize residuals for regression and classification errors to find model blind spots.

Previously in this course, we discussed Regression Evaluation Metrics and the Confusion Matrix, which provide high-level summaries of how well your model performs. However, aggregate numbers like RMSE or accuracy hide the why behind a failure.

Error analysis is the diagnostic phase of machine learning where we look at individual predictions to understand where the model struggles. If your model is failing, it's rarely failing everywhere equally; it’s usually failing on specific subsets of data. Today, we’ll use error analysis and visualization to uncover those patterns.

Visualizing Regression Residuals

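A residual is simply the difference between the actual value and the predicted value: $\text{residual} = y_{\text{actual}} - y_{\text{predicted}}$. With this sign convention, a positive residual means the model under-predicted and a negative residual means it over-predicted. If your model were perfect, all residuals would be zero.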

A residual plot displays the predicted values on the x-axis and the residuals on the y-axis. In a well-behaved linear model, you want to see a random "cloud" of points centered around zero. If you see a pattern, your model is failing to capture something systematic in the data: a U-shape usually points to a non-linear relationship the model has not learned, while a funnel points to error variance that changes with the size of the prediction.
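To make the U-shape concrete, here is a minimal sketch on synthetic data (the quadratic relationship, the sample size, and the three-band summary are all assumptions chosen for illustration, not part of the course dataset). It fits a straight line to curved data and then averages the residuals in the low, middle, and high thirds of the predictions, which is a numeric stand-in for what the eye does on the plot.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data with a quadratic relationship (an assumption for illustration).
x = rng.uniform(-3, 3, size=500)
y = 2.0 + 1.5 * x + 0.8 * x**2 + rng.normal(0, 0.5, size=500)

# Fit a straight line; np.polyfit returns coefficients from highest degree down.
slope, intercept = np.polyfit(x, y, deg=1)
y_pred = intercept + slope * x
residuals = y - y_pred

# Sort residuals by predicted value and average them within three equal-sized bands.
order = np.argsort(y_pred)
low, mid, high = np.array_split(residuals[order], 3)
band_means = [low.mean(), mid.mean(), high.mean()]

print("Mean residual by prediction band (low, mid, high):", np.round(band_means, 2))
```

The overall mean residual is essentially zero, yet the outer bands are positive and the middle band is negative. That sign flip is the U-shape: the line under-predicts at both ends and over-predicts in the middle.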

Worked Example: Residual Plotting

Using our ongoing project dataset, let’s visualize the residuals to see if our linear model is missing non-linear patterns.


PYTHON
import matplotlib.pyplot as plt
import numpy as np

# Assuming 'y_test' and 'y_pred' are NumPy arrays of the same length
residuals = y_test - y_pred

plt.figure(figsize=(8, 5))
plt.scatter(y_pred, residuals, alpha=0.5)
plt.axhline(y=0, color='r', linestyle='--')
plt.xlabel("Predicted Values")
plt.ylabel("Residuals")
plt.title("Residual Plot: Checking for Systematic Bias")
plt.show()

If the points form a megaphone shape (getting wider as predictions increase), this indicates heteroscedasticity: the variance of your model's errors is not constant. This often happens when the noise in the target grows with its magnitude, in which case transforming the target (for example, taking its logarithm) is a common remedy to try.
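A funnel can be hard to judge by eye, so it can help to back the plot with a number. The sketch below assumes a hypothetical helper, `residual_spread_by_bin`, that sorts residuals by predicted value and reports their standard deviation in equal-sized bins. The demo data is synthetic, with noise deliberately made proportional to the prediction.

```python
import numpy as np

def residual_spread_by_bin(y_pred, residuals, n_bins=4):
    """Return the standard deviation of residuals within equal-count bins of y_pred."""
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = np.asarray(residuals, dtype=float)
    order = np.argsort(y_pred)  # lowest predictions first
    chunks = np.array_split(residuals[order], n_bins)
    return np.array([chunk.std() for chunk in chunks])

# Synthetic demo: the noise standard deviation is 10% of the predicted value.
rng = np.random.default_rng(42)
y_pred_demo = rng.uniform(10, 100, size=2000)
residuals_demo = rng.normal(0, 0.1 * y_pred_demo)

spread = residual_spread_by_bin(y_pred_demo, residuals_demo)
print("Residual std from lowest to highest prediction bin:", np.round(spread, 2))
```

If the spread climbs steadily from the first bin to the last, the megaphone is real. Roughly equal values across bins suggest the variance is close to constant.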

Visualizing Classification Errors

For classification, we don't have residuals, but we do have "false" predictions. We want to know: What do the examples the model gets wrong have in common?

If you are building a binary classifier, take your test set and filter for the rows where the model was wrong (False Positives and False Negatives). Then, compare the distribution of features for these "error" rows against the rest of the dataset.
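The filtering step described above can be sketched as follows. The helper `label_outcomes` is hypothetical (it is not a scikit-learn function), and it assumes the positive class is encoded as `1`. It tags every prediction as TP, TN, FP, or FN so that false positives and false negatives can be profiled separately rather than lumped together as "errors".

```python
import numpy as np
import pandas as pd

def label_outcomes(y_true, y_pred, positive=1):
    """Tag each prediction as TP, TN, FP, or FN for a binary classifier."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    actual_pos = y_true == positive
    predicted_pos = y_pred == positive

    outcome = np.full(y_true.shape, "TN", dtype=object)
    outcome[actual_pos & predicted_pos] = "TP"
    outcome[~actual_pos & predicted_pos] = "FP"
    outcome[actual_pos & ~predicted_pos] = "FN"
    return pd.Series(outcome, name="outcome")

# Toy labels for illustration only.
y_true_demo = [1, 0, 1, 0, 1, 0]
y_pred_demo = [1, 1, 0, 0, 1, 0]
outcomes = label_outcomes(y_true_demo, y_pred_demo)
print(outcomes.value_counts().to_dict())
```

With a feature DataFrame such as `X_test`, one option is `X_test.assign(outcome=outcomes.to_numpy()).groupby("outcome").mean(numeric_only=True)`, which puts the average feature values for each outcome type side by side.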

Worked Example: Error Pattern Inspection


PYTHON
# Assuming 'X_test' is a DataFrame and 'y_test', 'y_pred' are arrays
errors = X_test[y_test != y_pred]
correct = X_test[y_test == y_pred]

# Compare the mean of a specific feature for errors vs correct predictions
print(f"Mean of feature 'Age' in errors: {errors['Age'].mean()}")
print(f"Mean of feature 'Age' in correct: {correct['Age'].mean()}")

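If the mean "Age" is noticeably different in your error set, you may have found a blind spot. Perhaps your model performs poorly on older individuals because they are underrepresented in the training data. Treat a gap in means as a lead to investigate rather than proof, especially when the error set is small.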
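A difference in means says that something is off, but not where. One way to localize it is to compute the error rate within bins of the feature, alongside the number of rows in each bin. The sketch below assumes a hypothetical helper, `error_rate_by_bin`. The ages, labels, and bin edges are made up for illustration.

```python
import numpy as np
import pandas as pd

def error_rate_by_bin(feature, y_true, y_pred, bins):
    """Return the error rate and row count for each bin of a numeric feature."""
    frame = pd.DataFrame({
        "bin": pd.cut(np.asarray(feature, dtype=float), bins=bins),
        "is_error": np.asarray(y_true) != np.asarray(y_pred),
    })
    # The mean of a boolean column is the fraction of True values, i.e. the error rate.
    return frame.groupby("bin", observed=False)["is_error"].agg(error_rate="mean", n="size")

# Toy data for illustration only.
ages = [22, 25, 31, 38, 45, 52, 67, 71, 74, 80]
y_true_demo = [0, 1, 0, 1, 0, 1, 1, 0, 1, 1]
y_pred_demo = [0, 1, 0, 1, 0, 1, 0, 1, 1, 0]

table = error_rate_by_bin(ages, y_true_demo, y_pred_demo, bins=[18, 40, 60, 90])
print(table)
```

Reading the error rate next to `n` matters: a high error rate in a bin with only a handful of rows is weak evidence, while the same rate in a well-populated bin is a stronger signal of a blind spot.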

Hands-on Exercise

Take the model you trained in Training the Baseline Linear Model.
Generate a residual plot for your test set.
Identify the 5% of data points with the largest absolute residuals. These are your "worst" predictions.
Print the features of these rows. Do they share a common trait (e.g., all have a specific categorical value)?
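If you want a starting point for the third and fourth steps, here is one possible sketch. The helper `worst_predictions` and the demo columns (`feature_a`, `segment`) are hypothetical. Swap in your own test features and predictions. The demo plants two bad predictions so that the output is easy to check.

```python
import numpy as np
import pandas as pd

def worst_predictions(X, y_true, y_pred, fraction=0.05):
    """Return the rows of X with the largest absolute residuals, worst first."""
    residuals = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    n_worst = max(1, int(np.ceil(fraction * len(residuals))))
    worst_idx = np.argsort(-np.abs(residuals))[:n_worst]  # positions, largest error first
    out = X.iloc[worst_idx].copy()
    out["residual"] = residuals[worst_idx]
    return out

# Synthetic demo with 40 rows and two planted failures (rows 7 and 23).
rng = np.random.default_rng(1)
X_demo = pd.DataFrame({
    "feature_a": rng.normal(size=40),
    "segment": np.where(np.arange(40) % 2 == 0, "even", "odd"),
})
y_true_demo = rng.normal(50, 10, size=40)
y_pred_demo = y_true_demo + rng.normal(0, 1, size=40)
y_pred_demo[7] -= 30   # model badly under-predicts row 7
y_pred_demo[23] += 20  # model badly over-predicts row 23

worst = worst_predictions(X_demo, y_true_demo, y_pred_demo, fraction=0.05)
print(worst)
```

For categorical columns, calling `value_counts()` on the returned rows is a quick way to see whether one value dominates the worst predictions.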

Common Pitfalls

Ignoring the Scale: If your residuals are massive, check if your target variable needs scaling or if you have extreme outliers. A few massive outliers can skew your entire plot, making it impossible to see the behavior of the majority of your data.
Assuming Randomness: Don't assume that because your residual plot looks "okay," the model is done. Always look for clusters. If you see a cluster of points far from the zero line, your model is likely missing a feature that defines that specific cluster.
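Data Leakage in Analysis: Error analysis on your training set tells you little about generalization, because the model has already seen those rows. Perform the analysis on held-out data instead. If you plan to change features based on what you find, use a dedicated validation set and keep the test set for a final check: every adjustment made in response to test-set errors makes the test score a more optimistic estimate of performance on unseen data.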
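For the "Ignoring the Scale" pitfall, one option is to set the y-axis limits from percentiles of the residuals rather than from their minimum and maximum. The sketch below assumes a hypothetical helper, `robust_limits`. The 1st and 99th percentiles are an arbitrary choice, and the demo residuals are synthetic.

```python
import numpy as np

def robust_limits(residuals, lower=1, upper=99):
    """Return (low, high) percentile limits and the number of points outside them."""
    residuals = np.asarray(residuals, dtype=float)
    low, high = np.percentile(residuals, [lower, upper])
    n_outside = int(np.sum((residuals < low) | (residuals > high)))
    return low, high, n_outside

# Synthetic demo: 1000 ordinary residuals plus two extreme outliers.
rng = np.random.default_rng(7)
residuals_demo = np.concatenate([rng.normal(0, 1, size=1000), [250.0, -400.0]])

low, high, n_outside = robust_limits(residuals_demo)
print(f"Plot range: [{low:.2f}, {high:.2f}], points outside: {n_outside}")
```

Passing the limits to `plt.ylim(low, high)` keeps the bulk of the data readable. Report `n_outside` alongside the plot so the hidden points are not forgotten, and inspect them separately.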

Recap

Aggregate metrics are your starting point, but error analysis is your compass. By plotting residuals, you can spot systematic bias in regression. By isolating and profiling classification errors, you can identify specific segments of your population where the model is failing. Visualization turns abstract loss numbers into actionable insights about your data.

Up next: We will dive into Introduction to Cross-Validation to ensure our error estimates are robust and not just a fluke of a single train-test split.
