Learn why high training performance often masks poor real-world results. Discover how to compare training and testing error to master model generalization.
Previously in this course, we covered the training and testing data splits to ensure our evaluation process remains honest. Now, we will look at how to interpret the results of those splits to diagnose if your model is actually learning patterns or just memorizing noise.
In machine learning, your goal isn't to build a model that performs perfectly on the data it has already seen. Your goal is generalization: the ability of a model to perform accurately on new, unseen data.
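When you train a model, you minimize a loss function (as discussed in Loss Functions and Model Objectives), which forces the model to adjust its internal parameters to fit the training set. However, a model can "cheat" by memorizing the noise, outliers, and specific quirks of the training data. This is why we distinguish between two types of error: training error, measured on the data the model was fit to, and testing error, measured on held-out data the model never saw during training. Testing error serves as our estimate of the generalization error.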
If your training error is near zero but your testing error is high, you have a generalization problem. The model has learned the training set by heart, but it has no "wisdom" to apply to the real world.
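One common way to write these quantities down is sketched below. The notation is an assumption introduced here for illustration: $f$ is the trained model, $\ell$ is the loss, $(x_i, y_i)$ are the $n$ training examples, $(x'_j, y'_j)$ are the $m$ held-out test examples, and $\mathcal{D}$ is the unknown distribution that real-world data comes from.

```latex
\[
\hat{R}_{\text{train}}(f) = \frac{1}{n} \sum_{i=1}^{n} \ell\bigl(f(x_i),\, y_i\bigr)
\qquad
\hat{R}_{\text{test}}(f) = \frac{1}{m} \sum_{j=1}^{m} \ell\bigl(f(x'_j),\, y'_j\bigr)
\]
\[
R(f) = \mathbb{E}_{(x,\,y) \sim \mathcal{D}} \bigl[\ell\bigl(f(x),\, y\bigr)\bigr]
\]
```

The generalization error $R(f)$ cannot be computed directly because $\mathcal{D}$ is unknown, so the testing error $\hat{R}_{\text{test}}(f)$ stands in for it. The training error $\hat{R}_{\text{train}}(f)$ tends to be optimistic because $f$ was chosen specifically to make it small.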
To tell whether your model is generalizing, compare the same metric on the training set and the testing set side by side. For a regression model, you might look at Mean Squared Error (MSE), where lower is better. For a classifier, you might look at accuracy, where higher is better.
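Here is how to interpret the relationship between these two scores: when both errors are low and close together, the model is generalizing well; when training error is low but testing error is much higher, the model is overfitting; when both errors are high, the model is underfitting and has not captured the pattern even in the data it has seen.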
Let's look at a snippet showing how you would compare these scores using scikit-learn. We assume you've already completed the Training the Baseline Linear Model lesson, so you have a fitted model along with your training and testing splits.
```python
from sklearn.metrics import mean_squared_error

# Assuming 'model' is your fitted pipeline,
# 'X_train', 'y_train' are your training sets, and
# 'X_test', 'y_test' are your testing sets.
train_preds = model.predict(X_train)
test_preds = model.predict(X_test)

train_mse = mean_squared_error(y_train, train_preds)
test_mse = mean_squared_error(y_test, test_preds)

print(f"Training MSE: {train_mse:.4f}")
print(f"Testing MSE: {test_mse:.4f}")

# The "Generalization Gap"
gap = test_mse - train_mse
print(f"Generalization Gap: {gap:.4f}")
```
If the gap is large, meaning the testing error is far worse than the training error, your model is likely failing to generalize. Just like the Laravel Benchmark Helper helps you identify performance bottlenecks in code, comparing these two metrics is the "benchmark" for your model's reliability.
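To see a large gap in practice, here is a minimal, self-contained sketch. The synthetic noisy sine-wave data and the choice of an unrestricted `DecisionTreeRegressor` are assumptions made for illustration; they are not the course project's dataset or model.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Hypothetical synthetic data: a sine curve plus random noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 6, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# With no depth limit, the tree keeps splitting until it
# reproduces every training target, noise included.
model = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)

train_mse = mean_squared_error(y_train, model.predict(X_train))
test_mse = mean_squared_error(y_test, model.predict(X_test))
gap = test_mse - train_mse

print(f"Training MSE: {train_mse:.4f}")
print(f"Testing MSE: {test_mse:.4f}")
print(f"Generalization Gap: {gap:.4f}")
```

Because the tree memorizes the training targets, the training MSE comes out at essentially zero while the testing MSE does not, which is the pattern described above.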
Using your project dataset from our previous lessons, calculate the performance metric (e.g., Accuracy or MSE) for both your training and testing sets.
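If your project is a classification task, the same comparison can be sketched with accuracy. The generated dataset and the `LogisticRegression` model below are placeholders chosen for illustration; swap in your own data and fitted model.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Placeholder data standing in for your project dataset.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

train_acc = accuracy_score(y_train, clf.predict(X_train))
test_acc = accuracy_score(y_test, clf.predict(X_test))

# Accuracy is "higher is better", so the gap is training minus testing
# (the opposite order from an error metric such as MSE).
gap = train_acc - test_acc

print(f"Training accuracy: {train_acc:.4f}")
print(f"Testing accuracy: {test_acc:.4f}")
print(f"Generalization gap: {gap:.4f}")
```

Note the sign convention: with an error metric the gap is testing minus training, while with a score such as accuracy it is training minus testing, so a positive gap means the same thing in both cases.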
Generalization is the ultimate measure of an ML model's success. By tracking both training error and testing error, you can catch overfitting before your model hits production. If the gap between them grows too large, it’s time to simplify your model or gather more representative data.
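As one hedged illustration of what "simplify your model" can look like, the sketch below tracks both errors while limiting the depth of a decision tree. The synthetic data, the helper name `train_and_test_mse`, and the specific depth values are all assumptions made for this example.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Hypothetical synthetic data: a noisy sine curve, used only for illustration.
rng = np.random.default_rng(42)
X = rng.uniform(0, 6, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=300)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)


def train_and_test_mse(max_depth):
    """Fit a tree with the given depth limit; return (training MSE, testing MSE)."""
    tree = DecisionTreeRegressor(max_depth=max_depth, random_state=42)
    tree.fit(X_train, y_train)
    return (
        mean_squared_error(y_train, tree.predict(X_train)),
        mean_squared_error(y_test, tree.predict(X_test)),
    )


# None means no depth limit (the most complex model); 3 is the simplest here.
results = {}
for depth in (None, 8, 3):
    train_mse, test_mse = train_and_test_mse(depth)
    results[depth] = (train_mse, test_mse)
    print(
        f"max_depth={depth}: train={train_mse:.4f} "
        f"test={test_mse:.4f} gap={test_mse - train_mse:.4f}"
    )
```

On data like this you would typically expect the shallower trees to show a higher training error but a smaller gap. The exact numbers depend on the data and the random seed, so treat the printout as a diagnostic and not as a rule.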
Up next: We will dive into Overfitting and Underfitting, where we learn how to balance bias and variance to shrink that generalization gap.
Training Error vs Generalization Error