Version Control for ML Experiments: Git and MLflow

Stop losing track of your best models. Learn how to combine Git for code and MLflow for experiment tracking to ensure your ML projects are reproducible.

Tags: Git, MLflow, reproducibility, experiment tracking, machine learning, version control, AI, Python

Previously in this course, we discussed managing model complexity by balancing bias, variance, and regularization. While your model might be optimized, you’ll quickly find that "it worked on my machine" is a dangerous trap in machine learning. Today, we add professional-grade version control to our workflow, covering how to track code changes, hyperparameter configurations, and the resulting performance metrics.

The Reproducibility Crisis in ML

In standard software engineering, version control is about code. In machine learning, code is only one-third of the equation. A model is the product of:

Code: The training script and pipeline logic.
Data: The specific version of the dataset used.
Parameters: The hyperparameters (e.g., learning rate, tree depth) used during training.

If you change your code but forget which hyperparameters you used, you cannot reproduce your results. This leads to "notebook drift," where you have ten versions of a model and no idea which one performed best or why.
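To make the three ingredients concrete, you can treat them as a single record. The sketch below is illustrative only: `RunRecord` and `fingerprint` are hypothetical names (not part of Git or MLflow), and the commit hash, data version, and hyperparameter values are placeholders.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field


@dataclass(frozen=True)
class RunRecord:
    """Hypothetical record of everything that determines a trained model."""

    code_commit: str   # Git commit hash of the training code
    data_version: str  # identifier of the dataset, e.g. a file name or checksum
    params: dict = field(default_factory=dict)  # hyperparameters

    def fingerprint(self) -> str:
        # Sorted keys make the serialization independent of dict ordering
        payload = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


baseline = RunRecord("a1b2c3d", "train_v1", {"learning_rate": 0.1, "max_depth": 5})
same_again = RunRecord("a1b2c3d", "train_v1", {"max_depth": 5, "learning_rate": 0.1})
new_params = RunRecord("a1b2c3d", "train_v1", {"learning_rate": 0.01, "max_depth": 5})

# Same code, data, and parameters give the same fingerprint
print(baseline.fingerprint() == same_again.fingerprint())  # True
# Changing a single hyperparameter gives a different one
print(baseline.fingerprint() == new_params.fingerprint())  # False
```

If any one of the three fields is missing, two different models become indistinguishable, which is exactly the situation described above.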

Using Git for Code and Configuration

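Git is your baseline. You should never run an experiment on "dirty" code, meaning a working tree with uncommitted changes, because the results can no longer be traced back to a specific version of the script. Before starting a training run, commit your changes.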


Bash
# Always track your changes before running a major experiment
git add src/model_training.py
git commit -m "Add polynomial features and adjust regularization alpha"

However, Git is a poor fit for large datasets and binary model files: they do not diff meaningfully, and the repository grows quickly as versions accumulate. For these, use Git LFS (Large File Storage) or, better yet, an external data versioning tool such as DVC. For our current project, we will use MLflow to bridge the gap between our Git-tracked code and our experiment results.
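One small piece of that bridge is knowing which commit a run came from. The following is a minimal sketch of enforcing the commit-first rule from inside a training script. It assumes `git` is on your PATH and the script runs inside a repository; `git_output`, `is_clean`, and `require_clean_commit` are hypothetical helper names.

```python
import subprocess


def git_output(*args: str) -> str:
    """Run a git command in the current directory and return its stdout."""
    result = subprocess.run(
        ["git", *args], capture_output=True, text=True, check=True
    )
    return result.stdout


def is_clean(porcelain_status: str) -> bool:
    """`git status --porcelain` prints one line per changed or untracked file."""
    return porcelain_status.strip() == ""


def require_clean_commit() -> str:
    """Return the current commit hash, or raise if anything is uncommitted."""
    if not is_clean(git_output("status", "--porcelain")):
        raise RuntimeError("Uncommitted changes found: commit before training.")
    return git_output("rev-parse", "HEAD").strip()
```

Calling `commit = require_clean_commit()` at the top of the training script and then recording it inside the run, for example with `mlflow.set_tag("git_commit", commit)`, links each run to the exact code that produced it. Note that untracked files also count as "dirty" here, which may be stricter than you want.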

Tracking Experiments with MLflow

MLflow is one of the most widely used open-source tools for experiment tracking. It allows you to log parameters and metrics directly from your Python script, creating a searchable record of every training run.

First, install the library: pip install mlflow

Here is a concrete example of how to wrap your training script to track your experiments:


PYTHON
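import mlflow
import mlflow.sklearn
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Placeholder data so the script runs on its own; replace with your dataset
X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Start an MLflow run
with mlflow.start_run():
    # 1. Define hyperparameters (random_state is fixed so reruns give the same model)
    params = {"n_estimators": 100, "max_depth": 5, "random_state": 42}

    # 2. Log parameters to MLflow
    mlflow.log_params(params)

    # 3. Train your model
    model = RandomForestRegressor(**params)
    model.fit(X_train, y_train)

    # 4. Log metrics (score() returns R^2 on the held-out test set)
    score = model.score(X_test, y_test)
    mlflow.log_metric("r2_score", score)

    # 5. Log the model artifact itself
    mlflow.sklearn.log_model(model, "random_forest_model")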

When you run this script with MLflow's default local setup, the run data is written to an mlruns directory inside your working directory. You can visualize your results by running mlflow ui from that same directory and navigating to http://localhost:5000. You’ll see a table of every run logged there, which you can sort and compare by parameter and metric.

Hands-on Exercise: Audit Your Current Project

Initialize a new Git repository in your project folder if you haven't already.
Install mlflow and modify your training script from the previous lesson to log at least three hyperparameters and your final RMSE metric.
Run the script twice with different parameter settings.
Run mlflow ui and verify that both runs appear in your dashboard.
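For step 2, note that the training example above logs R², so RMSE has to be computed separately. Here is a minimal, dependency-free sketch; `rmse` is a hypothetical helper and the sample values are made up for illustration.

```python
import math


def rmse(y_true, y_pred) -> float:
    """Root mean squared error: square root of the mean squared difference."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("Inputs must be non-empty and the same length.")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up targets and predictions: squared errors are 1, 0, 1, 4 (mean 1.5)
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 11.0]
print(rmse(y_true, y_pred))  # sqrt(1.5), roughly 1.22
```

Inside a run, the value can then be logged with `mlflow.log_metric("rmse", value)`. Unlike R², RMSE is in the same units as your target, and lower is better.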

Common Pitfalls

Hardcoding Paths: Never hardcode absolute paths (e.g., C:/Users/Name/Data/file.csv). Use relative paths or environment variables so your code runs on any machine.
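Ignoring the Data Version: MLflow records the parameters, metrics, and model you log, but it does not know that your input CSV changed unless you record that too. Different data means different results, so always keep a versioned copy of your training data (e.g., data/train_v1.csv) and note which version each run used.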
Over-logging: Don't log every single iteration of a loop. Log the final hyperparameters and the final evaluation metrics to keep your dashboard readable.
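The first two pitfalls can be handled together. The sketch below is one possible approach, not an MLflow feature: it reads the data location from an environment variable (the name `TRAIN_DATA_PATH` is an assumption) with a relative default, and computes a SHA-256 checksum of the file so that a changed CSV is detectable. `resolve_data_path` and `file_sha256` are hypothetical helpers.

```python
import hashlib
import os
from pathlib import Path


def resolve_data_path(default: str = "data/train_v1.csv") -> Path:
    """Use TRAIN_DATA_PATH if it is set, otherwise fall back to a relative path."""
    return Path(os.environ.get("TRAIN_DATA_PATH", default))


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large datasets do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Recording the checksum inside a run, for example with `mlflow.set_tag("data_sha256", file_sha256(resolve_data_path()))`, lets you tell at a glance whether two runs trained on identical data.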

Recap

We’ve moved from manual tracking to a formal system. By combining Git for code versioning and MLflow for experiment tracking, you ensure that every model you build can be audited, compared, and reproduced. This is the difference between a "scripting hobbyist" and an ML engineer.

Up next: We will learn how to save your trained models to disk using joblib so you can move them from your notebook into a production-ready inference script.
