Mahamudul Hasan Rubel
Lesson 22 of the AI/ML Foundations: Core Concepts & First Models course
AI/ML · June 25, 2026 · 4 min read

Introduction to Cross-Validation: Ensuring Model Stability

Stop relying on a single train-test split. Learn how K-Fold cross-validation provides a stable, reliable evaluation of your machine learning models.

Tags: Machine Learning, scikit-learn, cross-validation, model evaluation, data science, ai, machine-learning, python

Previously in this course, we discussed the importance of training and testing data splits to estimate how well a model performs on unseen data. However, a single split is often a "lucky" or "unlucky" roll of the dice—your evaluation score depends heavily on which specific rows ended up in your test set.

In this lesson, we move beyond the single split to cross-validation, a technique that systematically rotates your data to give you a more honest, stable assessment of your model's predictive power.

Why We Need Cross-Validation

When you perform a standard train-test split, you might find that your model performs exceptionally well on one test set but poorly on another. This sensitivity to data partitioning is a sign of instability. If your dataset is relatively small, or if there is underlying noise in your data, a single split doesn't capture the full picture of how your model will perform in production.

Cross-validation solves this by partitioning the data into multiple subsets, or "folds." The model is trained and evaluated multiple times, ensuring that every data point gets a turn in the test set. This produces a distribution of scores rather than a single point estimate, allowing you to gauge the stability of your model.
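To make the "roll of the dice" concrete, here is a minimal sketch that repeats a single train-test split with ten different random seeds and records the test error each time. The synthetic data from make_regression, the sample size, and the noise level are assumptions chosen for illustration; substitute your own X and y.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption; swap in your own X and y)
X, y = make_regression(n_samples=80, n_features=5, noise=25.0, random_state=0)

split_mse = []
for seed in range(10):
    # Same data, same model; only the rows that land in the test set change
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    model = LinearRegression().fit(X_train, y_train)
    split_mse.append(mean_squared_error(y_test, model.predict(X_test)))

split_mse = np.array(split_mse)
print(f"MSE per split: {np.round(split_mse, 1)}")
print(f"Lowest: {split_mse.min():.1f}, highest: {split_mse.max():.1f}")
```

If the lowest and highest values differ noticeably, any single one of those splits would have given you a misleading picture on its own.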

Understanding K-Fold Cross-Validation

In K-Fold cross-validation, the process follows these steps:

  1. Split: Divide your entire dataset into K folds of (approximately) equal size.
  2. Iterate: For each fold (from 1 to K):
    • Use the current fold as the test set.
    • Use the remaining K-1 folds as the training set.
    • Train the model on the training set and calculate the performance score on the test set.
  3. Aggregate: Calculate the mean and standard deviation of the K scores.

A common choice for K is 5 or 10. With 5-fold cross-validation, your model is evaluated five times, and you get five different accuracy or error scores.
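The steps above can be sketched by hand with scikit-learn's KFold splitter. This is an illustrative loop, not a replacement for the built-in helpers: the synthetic data, the shuffle=True setting, and the random_state value are assumptions made so the example runs on its own.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold

# Synthetic stand-in data (an assumption; swap in your own X and y)
X, y = make_regression(n_samples=100, n_features=5, noise=20.0, random_state=0)

# Step 1 (Split): five folds, shuffled once with a fixed seed
kfold = KFold(n_splits=5, shuffle=True, random_state=42)

fold_mse = []
test_indices_seen = []

# Step 2 (Iterate): each fold takes one turn as the test set
for fold_number, (train_idx, test_idx) in enumerate(kfold.split(X), start=1):
    model = LinearRegression()
    model.fit(X[train_idx], y[train_idx])  # train on the other K-1 folds
    mse = mean_squared_error(y[test_idx], model.predict(X[test_idx]))
    fold_mse.append(mse)
    test_indices_seen.extend(test_idx.tolist())
    print(f"Fold {fold_number}: {len(test_idx)} test rows, MSE = {mse:.2f}")

# Step 3 (Aggregate): summarize the K scores
fold_mse = np.array(fold_mse)
print(f"Mean MSE: {fold_mse.mean():.2f}")
print(f"Standard deviation: {fold_mse.std():.2f}")
```

Collecting the test indices is only there to confirm the key property of K-Fold: across all folds, every row appears in a test set exactly once.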

Implementing with cross_val_score

Scikit-learn makes this straightforward with the cross_val_score function. It handles the splitting, training, and scoring internally, returning an array of scores.

PYTHON
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LinearRegression
import numpy as np

# Assuming X and y are already preprocessed
model = LinearRegression()

# Perform 5-fold cross-validation
# We use 'neg_mean_squared_error' as an example metric
scores = cross_val_score(model, X, y, cv=5, scoring='neg_mean_squared_error')

# cross_val_score returns negative values for error metrics 
# to ensure higher is always better for scikit-learn
mse_scores = -scores

print(f"Scores for each fold: {mse_scores}")
print(f"Mean MSE: {mse_scores.mean():.4f}")
print(f"Standard Deviation: {mse_scores.std():.4f}")

Interpreting the Results

  • Mean: This is your best single estimate of how the model will perform on unseen data.
  • Standard Deviation: This is the crucial metric for stability. A high standard deviation means your model's performance is highly sensitive to which rows it is trained and tested on. That is a red flag for overfitting, data quality issues, or simply too little data per fold.
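One way to put the two numbers side by side is to express the standard deviation as a fraction of the mean. The sketch below uses made-up fold scores purely for illustration, and the 20% cut-off is a rough rule of thumb rather than a fixed standard.

```python
import numpy as np

# Hypothetical per-fold MSE values, invented for illustration only
mse_scores = np.array([410.0, 395.0, 640.0, 402.0, 388.0])

mean_mse = mse_scores.mean()
std_mse = mse_scores.std()

# Spread relative to the mean; makes scores on different scales comparable
relative_spread = std_mse / mean_mse

print(f"Mean MSE: {mean_mse:.1f}")
print(f"Standard deviation: {std_mse:.1f}")
print(f"Relative spread: {relative_spread:.1%}")

# Rule-of-thumb threshold (an assumption, not a scikit-learn convention)
if relative_spread > 0.20:
    print("Fold scores vary a lot; inspect the outlying fold before trusting the mean.")
```

In this invented example a single fold is much worse than the rest, which is exactly the kind of pattern the mean alone would hide.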

Hands-on Exercise

Using the dataset you set up in the Project Dataset Initialization lesson, apply 5-fold cross-validation to your baseline model.

  1. Import cross_val_score from sklearn.model_selection.
  2. Run the evaluation on your features and target variable.
  3. Calculate the mean score and the standard deviation.
  4. Reflect: If the standard deviation is large (e.g., more than 20% of the mean), what does that suggest about your model's reliability?

Common Pitfalls

  • Data Leakage: Never fit preprocessing steps on the entire dataset before cross-validation. If you scale your data using the entire dataset before folding, you have leaked information from the test folds into your training folds. Use a Scikit-Learn Pipeline to ensure that scaling is fitted inside the cross-validation loop, on the training folds only.
  • Small Datasets: If your dataset is very small, consider "Leave-One-Out" cross-validation (setting K equal to the number of samples), but be aware that it trains one model per sample, which is computationally expensive.
  • Ignoring the Standard Deviation: Beginners often look only at the mean. Never ignore the variance between folds; a model with a slightly worse mean but a much lower standard deviation is often better for production because it is more predictable.
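The first two pitfalls can be sketched together. Passing a Pipeline to cross_val_score means the scaler is re-fitted on the training folds of every iteration, and swapping the cv argument for LeaveOneOut gives the small-dataset variant. The synthetic data and the choice of Ridge (a model whose fit is affected by feature scaling) are assumptions for illustration, not part of the course project.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, LeaveOneOut, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Small synthetic stand-in dataset (an assumption; swap in your own X and y)
X, y = make_regression(n_samples=60, n_features=5, noise=15.0, random_state=0)

# Scaling lives inside the pipeline, so it is fitted on training folds only
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("model", Ridge(alpha=1.0)),
])

# Standard 5-fold evaluation of the whole pipeline
kfold = KFold(n_splits=5, shuffle=True, random_state=42)
kfold_mse = -cross_val_score(
    pipeline, X, y, cv=kfold, scoring="neg_mean_squared_error"
)
print(f"5-fold mean MSE: {kfold_mse.mean():.2f} (std {kfold_mse.std():.2f})")

# Leave-One-Out: one fit per sample, so 60 fits for this dataset
loo_mse = -cross_val_score(
    pipeline, X, y, cv=LeaveOneOut(), scoring="neg_mean_squared_error"
)
print(f"Leave-One-Out fits: {len(loo_mse)}, mean MSE: {loo_mse.mean():.2f}")
```

Each Leave-One-Out score is the squared error on a single row, so the individual values are very noisy; the mean is the figure to read.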

Recap

Cross-validation is your primary tool for assessing model stability. By using K-Fold techniques, you move away from the volatility of a single train-test split and gain a statistical understanding of your model's performance. Always pair this with Pipeline objects to avoid data leakage and ensure your evaluation is truly representative of how your model will handle new data in the real world.

Up next: Diagnosing Model Weaknesses by analyzing where your model fails.
