Mahamudul Hasan Rubel

© 2026 Mahamudul Hasan Rubel. All rights reserved.

Lesson 17 of the AI/ML Foundations: Core Concepts & First Models course
AI/ML · June 25, 2026 · 4 min read

Training Error vs Generalization Error: A Practical Guide

Learn why high training performance often masks poor real-world results. Discover how to compare training and testing error to master model generalization.

Tags: machine learning, generalization, model evaluation, data science, scikit-learn, ai, machine-learning, python

Previously in this course, we covered training and testing data splits, which keep our evaluation process honest. Now we will look at how to interpret the results of those splits to diagnose whether your model is actually learning patterns or just memorizing noise.

The Core Problem: Memorization vs. Learning

In machine learning, your goal isn't to build a model that performs perfectly on the data it has already seen. Your goal is generalization: the ability of a model to perform accurately on new, unseen data.

When you train a model, you minimize a loss function (as discussed in Loss Functions and Model Objectives), which forces the model to adjust its internal parameters to fit the training set. However, a model can "cheat" by memorizing the noise, outliers, and specific quirks of the training data. This is why we distinguish between two types of error:

  • Training Error: The model's error on the data used to train it.
  • Generalization Error (Testing Error): The model's expected error on data it has never encountered. We cannot measure it directly, so we estimate it with the error on the held-out test set.

If your training error is near zero but your testing error is high, you have a generalization problem. The model has learned the training set by heart, but it has no "wisdom" to apply to the real world.
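To make the memorization case concrete, here is a minimal sketch. The synthetic sine-wave data and the choice of an unconstrained decision tree are assumptions for illustration, not part of the course project. A tree with no depth limit can reproduce its training targets exactly, so its training error collapses while its testing error stays well above zero.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic data (an assumption for illustration): a sine wave plus random noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 6, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.5, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# With no depth limit, the tree keeps splitting until every training point
# sits in its own leaf, so it reproduces the training targets, noise included.
memorizer = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)

train_mse = mean_squared_error(y_train, memorizer.predict(X_train))
test_mse = mean_squared_error(y_test, memorizer.predict(X_test))

print(f"Training MSE: {train_mse:.4f}")
print(f"Testing MSE: {test_mse:.4f}")
```

The training MSE comes out at essentially zero because the tree has stored the noise along with the signal. The testing MSE is what tells you how the model would behave on new data.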

Comparing Performance Metrics

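To check whether your model is generalizing, compute the same metric on both splits and compare the two values side by side. If you are building a regression model, you might look at Mean Squared Error (MSE). If you are building a classifier, you might look at accuracy. Keep in mind that accuracy is a score, so higher is better, and the corresponding error is 1 minus the accuracy.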

Here is how to interpret the relationship between these two scores:

  1. Low Training Error + Low Testing Error: This is the ideal state. The model has captured the underlying pattern.
  2. Low Training Error + High Testing Error: This is the classic sign of overfitting. The model is too complex and has memorized the training data's noise.
  3. High Training Error + High Testing Error: This is a sign of underfitting. The model is too simple or the data lacks the necessary features to make a prediction.
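The three cases above can be sketched as a small decision rule. The function name `diagnose_fit` and both thresholds are hypothetical. What counts as "low" or "high" depends on your metric, your data, and your baseline, so treat this as a rule of thumb rather than a standard test.

```python
def diagnose_fit(train_error, test_error, acceptable_error, max_gap):
    """Map a (training error, testing error) pair to one of the three cases.

    Both thresholds are hypothetical and problem-specific: pick them from
    your own baseline and the scale of your metric.
    """
    if train_error > acceptable_error:
        # High training error (testing error is usually high as well).
        return "underfitting"
    if test_error - train_error > max_gap:
        # Low training error, but a much higher testing error.
        return "overfitting"
    # Both errors are low and close together.
    return "good fit"


print(diagnose_fit(0.02, 0.03, acceptable_error=0.10, max_gap=0.05))  # good fit
print(diagnose_fit(0.01, 0.40, acceptable_error=0.10, max_gap=0.05))  # overfitting
print(diagnose_fit(0.35, 0.38, acceptable_error=0.10, max_gap=0.05))  # underfitting
```

The training error is checked first because a model that cannot even fit its own training data is underfitting regardless of how small the gap looks.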

Worked Example: Identifying the Gap

Let's look at a snippet of how you would compare these scores using Scikit-Learn. We assume you've already completed the Training the Baseline Linear Model lesson.

PYTHON
from sklearn.metrics import mean_squared_error
import numpy as np

# Assuming 'model' is your fitted pipeline
# 'X_train', 'y_train' are your training sets
# 'X_test', 'y_test' are your testing sets

train_preds = model.predict(X_train)
test_preds = model.predict(X_test)

train_mse = mean_squared_error(y_train, train_preds)
test_mse = mean_squared_error(y_test, test_preds)

print(f"Training MSE: {train_mse:.4f}")
print(f"Testing MSE: {test_mse:.4f}")

# The "Generalization Gap"
gap = test_mse - train_mse
print(f"Generalization Gap: {gap:.4f}")

If the gap is large relative to the training MSE, your model is likely failing to generalize. Just as the Laravel Benchmark Helper helps you identify performance bottlenecks in code, comparing these two metrics is the "benchmark" for your model's reliability.
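Because MSE is measured in squared units of the target, the raw gap is hard to judge in isolation. One hedged option is to express it as a fraction of the training error. The helper name `relative_gap` and the MSE values below are hypothetical, and there is no universal cutoff for what ratio counts as "too large".

```python
def relative_gap(train_error, test_error):
    """Return the generalization gap as a fraction of the training error."""
    if train_error <= 0:
        # A zero training error makes the ratio undefined; report the raw gap instead.
        raise ValueError("training error must be positive to form a ratio")
    return (test_error - train_error) / train_error


# Hypothetical MSE values on two differently scaled targets.
print(f"{relative_gap(4.0, 5.0):.2f}")    # raw gap of 1.0
print(f"{relative_gap(0.04, 0.05):.2f}")  # raw gap of 0.01
```

Both calls describe a testing error about 25% higher than the training error, even though the raw gaps differ by a factor of 100. That makes the ratio easier to compare across datasets than the raw difference.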

Hands-on Exercise

Using your project dataset from our previous lessons, calculate the performance metric (e.g., Accuracy or MSE) for both your training and testing sets.

  1. Run your current model and store the predictions for both datasets.
  2. Calculate the error for each.
  3. Ask yourself: Is the difference between the two significant? If the training score is 95% and the testing score is 70%, write down one reason why you think this gap exists (e.g., "The model is too complex for the small amount of data").
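If you want a starting point for the exercise, the sketch below scores one fitted model on both splits with the same metric. The names `compare_scores`, `accuracy`, and `ThresholdModel` are hypothetical. The stand-in model only exists to keep the example self-contained, so swap in your own fitted pipeline.

```python
import numpy as np


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))


def compare_scores(model, metric, X_train, y_train, X_test, y_test):
    """Score a fitted model on both splits with the same metric."""
    train_score = metric(y_train, model.predict(X_train))
    test_score = metric(y_test, model.predict(X_test))
    return train_score, test_score


class ThresholdModel:
    """Hypothetical stand-in for a fitted classifier: predicts 1 when x > 0."""

    def predict(self, X):
        return (np.asarray(X)[:, 0] > 0).astype(int)


# Tiny made-up splits, only to show the call pattern.
X_train, y_train = [[-2], [-1], [1], [2]], [0, 0, 1, 1]
X_test, y_test = [[-1], [1], [2], [3]], [0, 0, 1, 1]

train_acc, test_acc = compare_scores(
    ThresholdModel(), accuracy, X_train, y_train, X_test, y_test
)
print(f"Training accuracy: {train_acc:.0%}")
print(f"Testing accuracy: {test_acc:.0%}")
```

Any function that takes `(y_true, y_pred)` can be passed as `metric`, which includes scikit-learn's `accuracy_score` and `mean_squared_error`.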

Common Pitfalls

  • Using the test set for hyperparameter tuning: If you tweak your model based on the test set, you are effectively "leaking" information from the test set into your training process. This invalidates your final performance metrics.
  • Ignoring data leakage: Sometimes, features in your data contain information that won't be available at prediction time. Because the leak is usually present in both splits, it produces artificially low training and testing error that will not hold up in production.
  • Confusing small data with small error: On very small datasets, it is extremely easy to get high training scores, but this rarely translates to real-world performance.
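One common way to avoid the first pitfall is to hold out a separate validation set for tuning and touch the test set only once, at the end. The sketch below assumes placeholder random data and a 60/20/20 split. Both are illustrative choices, not a recommendation from this course.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data (an assumption): 100 rows, 3 features.
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 3))
y = rng.normal(size=100)

# First carve off the final test set and leave it untouched until the end.
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Then split the remainder into training and validation sets.
# 0.25 of the remaining 80% gives a 60/20/20 split overall.
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25, random_state=42
)

print(len(X_train), len(X_val), len(X_test))  # 60 20 20
```

You would compare candidate models and settings on `X_val`, then report the error on `X_test` a single time for the model you finally choose.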

Recap

Generalization is the ultimate measure of an ML model's success. By tracking both training error and testing error, you can catch overfitting before your model hits production. If the gap between them grows too large, it’s time to simplify your model or gather more representative data.

Up next: We will dive into Overfitting and Underfitting, where we learn how to balance bias and variance to shrink that generalization gap.

Previous lesson: Training the Baseline Linear Model · Next lesson: Overfitting and Underfitting
Part of the course

AI/ML Foundations: Core Concepts & First Models

beginner · Lesson 17 of 50

  1. The Machine Learning Workflow (4 min)
  2. Setting Up the Python ML Environment (4 min)
  3. Introduction to NumPy for Data Handling (4 min)
  4. Loading and Inspecting Datasets with Pandas (3 min)
  5. Exploratory Data Analysis Fundamentals (3 min)
  6. Handling Missing and Inconsistent Data (3 min)
  7. Feature Selection and Basic Filtering (3 min)
  8. Project Dataset Initialization (3 min)
  9. Mechanics of Linear Regression (4 min)
  10. Mechanics of Classification (4 min)
  11. Loss Functions and Model Objectives (4 min)
  12. Training and Testing Data Splits (3 min)
  13. Data Scaling Techniques (4 min)
  14. Encoding Categorical Variables (3 min)
  15. Building Scikit-Learn Pipelines (4 min)
  16. Training the Baseline Linear Model (3 min)
  17. Training Error vs Generalization Error (4 min)
  18. Overfitting and Underfitting (4 min)
  19. Regression Evaluation Metrics (4 min)
  20. The Confusion Matrix (3 min)
  21. Error Analysis Plots (4 min)
  22. Introduction to Cross-Validation (4 min)
  23. Diagnosing Model Weaknesses (3 min)
  24. Feature Engineering Strategies (4 min)
  25. Handling Outliers (3 min)
  26. The Bias-Variance Tradeoff (3 min)
  27. Hyperparameter Tuning Basics (4 min)
  28. Implementing Grid Search (3 min)
  29. Refining the Project Model (3 min)
  30. Evaluating Feature Importance (3 min)
  31. Advanced Feature Transformation (3 min)
  32. Regularization Techniques (3 min)
  33. Comparing Different Algorithms (3 min)
  34. Managing Model Complexity (4 min)
  35. Understanding Data Drift (4 min)
  36. Version Control for ML Experiments (3 min)
  37. Exporting Trained Models (3 min)
  38. Creating an Inference Script (3 min)
  39. Building a Simple Web Interface (3 min)
  40. Documenting ML Projects (4 min)
  41. Final Project Review (4 min)
  42. Ensemble Methods Overview (4 min)
  43. Feature Selection via Recursive Elimination (3 min)
  44. Model Interpretability Basics (4 min)
  45. Dealing with High Cardinality (3 min)
  46. Handling Multi-Collinearity (4 min)
  47. Introduction to Pipelines with Custom Transformers (3 min)
  48. Evaluating Model Calibration (4 min)
  49. Advanced Hyperparameter Search (3 min)
  50. Model Monitoring in Practice (4 min)