Stop losing track of your best models. Learn how to combine Git for code and MLflow for experiment tracking to ensure your ML projects are reproducible.
Previously in this course, we discussed managing model complexity by balancing bias, variance, and regularization. While your model might be optimized, you’ll quickly find that "it worked on my machine" is a dangerous trap in machine learning. Today, we add professional-grade version control to our workflow, covering how to track code changes, hyperparameter configurations, and the resulting performance metrics.
In standard software engineering, version control is about code. In machine learning, code is only one-third of the equation. A model is the product of three things: the code, the data it was trained on, and the hyperparameters.
If you change your code but forget which hyperparameters you used, you cannot reproduce your results. This leads to "notebook drift," where you have ten versions of a model and no idea which one performed best or why.
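One way to make this concrete is to write down, for every run, exactly what went into it. The sketch below is only an illustration: `RunRecord`, its field names, and `file_sha256` are hypothetical helpers invented for this example, not part of Git or MLflow.

```python
import hashlib
import json
from dataclasses import dataclass, asdict
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


@dataclass(frozen=True)
class RunRecord:
    """Hypothetical record of the inputs that produced one model."""
    code_version: str  # for example, a Git commit hash
    data_sha256: str   # fingerprint of the training data file
    params: dict       # hyperparameters used for this run

    def to_json(self) -> str:
        # sort_keys keeps the text stable, so identical inputs serialize identically
        return json.dumps(asdict(self), sort_keys=True)
```

Two runs whose records serialize to the same JSON used the same code version, the same data file contents, and the same hyperparameters, which is the minimum you need to attempt a reproduction.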
Git is your baseline. You should never run an experiment on "dirty" code, meaning a working tree with uncommitted changes. Before starting a training run, commit your changes.
```bash
# Always track your changes before running a major experiment
git add src/model_training.py
git commit -m "Add polynomial features and adjust regularization alpha"
```
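If you want the training script itself to enforce this habit, a small guard can refuse to start when there are pending changes. This is a minimal sketch, assuming `git` is on your PATH and the script runs inside a repository; the function names are made up for this example.

```python
import subprocess


def is_clean(porcelain_output: str) -> bool:
    """True when `git status --porcelain` printed nothing, i.e. no pending changes."""
    return porcelain_output.strip() == ""


def require_clean_worktree(repo_dir: str = ".") -> None:
    """Raise if the Git working tree in repo_dir has uncommitted changes.

    Note that --porcelain also lists untracked files, so those count as dirty here.
    """
    result = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    if not is_clean(result.stdout):
        raise RuntimeError("Uncommitted changes found:\n" + result.stdout)
```

Calling `require_clean_worktree()` as the first line of a training script turns "commit before you train" from a convention into a check. Whether untracked files should block a run is a design choice; loosen the rule if it gets in your way.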
However, Git is terrible at tracking large datasets and binary model files. For these, use Git LFS (Large File Storage) or, better yet, external data versioning tools like DVC. For our current project, we will use MLflow to bridge the gap between our Git-tracked code and our experiment results.
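One simple way to bridge the two systems is to tag each MLflow run with the Git commit it was produced from. The sketch below assumes `git` is available on your PATH; the tag names `git_commit` and `data_version` and the helper functions are our own choices for illustration, not MLflow conventions.

```python
import subprocess


def current_commit(repo_dir: str = ".") -> str:
    """Return the full hash of the commit currently checked out in repo_dir."""
    result = subprocess.run(
        ["git", "rev-parse", "HEAD"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def run_tags(commit: str, data_version: str) -> dict:
    """Build the tags to attach to a run; both key names are our own choice."""
    return {"git_commit": commit, "data_version": data_version}


def start_tagged_run(data_version: str, repo_dir: str = "."):
    """Start an MLflow run tagged with the current commit and a data label."""
    import mlflow  # imported here so the helpers above work without MLflow installed
    return mlflow.start_run(tags=run_tags(current_commit(repo_dir), data_version))
```

You would use it as `with start_tagged_run("train_v1"):` in place of a bare `mlflow.start_run()`. Later, any run in the dashboard can be traced back to the exact commit with `git checkout <hash>`.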
MLflow is a widely used open-source tool for experiment tracking. It allows you to log parameters and metrics directly from your Python script, creating a searchable record of every training run.
First, install the library:
pip install mlflow
Here is a concrete example of how to wrap your training script to track your experiments:
```python
import mlflow
import mlflow.sklearn
from sklearn.ensemble import RandomForestRegressor

# X_train, X_test, y_train, y_test are assumed to come from the
# train/test split in the previous lesson.

# Start an MLflow run
with mlflow.start_run():
    # 1. Define hyperparameters
    params = {"n_estimators": 100, "max_depth": 5}

    # 2. Log parameters to MLflow
    mlflow.log_params(params)

    # 3. Train your model
    model = RandomForestRegressor(**params)
    model.fit(X_train, y_train)

    # 4. Log metrics (score() returns R^2 for regressors)
    score = model.score(X_test, y_test)
    mlflow.log_metric("r2_score", score)

    # 5. Log the model artifact itself
    mlflow.sklearn.log_model(model, "random_forest_model")
```
When you run this script, MLflow creates a local mlruns directory. You can visualize your results by running mlflow ui in your terminal and navigating to http://localhost:5000. You’ll see a table comparing the runs that were logged to that directory.
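The same comparison can also be done from Python with `mlflow.search_runs`, which returns the runs as a pandas DataFrame. This is a rough sketch: `order_clause` and `best_runs` are hypothetical helpers, and it assumes your runs were logged to the active experiment with metric names like `r2_score`.

```python
def order_clause(metric: str, higher_is_better: bool) -> str:
    """Build an MLflow order_by clause such as 'metrics.r2_score DESC'."""
    direction = "DESC" if higher_is_better else "ASC"
    return f"metrics.{metric} {direction}"


def best_runs(metric: str, higher_is_better: bool = True, n: int = 5):
    """Return up to n runs of the active experiment, best first, as a DataFrame."""
    import mlflow  # imported lazily so order_clause can be used without MLflow
    return mlflow.search_runs(
        order_by=[order_clause(metric, higher_is_better)],
        max_results=n,
    )
```

For the script above you might call `best_runs("r2_score")`; for an error metric such as RMSE, where lower is better, pass `higher_is_better=False`. Logged parameters and metrics appear as columns prefixed with `params.` and `metrics.`.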
Install mlflow and modify your training script from the previous lesson to log at least three hyperparameters and your final RMSE metric.
Run mlflow ui and verify that both runs appear in your dashboard.
Avoid hardcoded absolute paths (e.g., C:/Users/Name/Data/file.csv). Use relative paths or environment variables so your code runs on any machine.
[…] data/train_v1.csv).
We’ve moved from manual tracking to a formal system. By combining Git for code versioning and MLflow for experiment tracking, you ensure that every model you build can be audited, compared, and reproduced. This is the difference between a "scripting hobbyist" and an ML engineer.
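As a footnote to the advice on relative paths and environment variables, here is one possible way to resolve data files. The environment variable name `DATA_DIR`, the default folder `data`, and the helper `data_path` are assumptions made for this example.

```python
import os
from pathlib import Path


def data_path(filename: str, env_var: str = "DATA_DIR", default_dir: str = "data") -> Path:
    """Resolve a data file under the directory named by env_var, else default_dir."""
    base = Path(os.environ.get(env_var, default_dir))
    return base / filename
```

With no `DATA_DIR` set, `data_path("train_v1.csv")` points at `data/train_v1.csv` relative to wherever the script is run. On a machine where the data lives elsewhere, setting `DATA_DIR` redirects every lookup without touching the code.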
Up next: We will learn how to save your trained models to disk using joblib so you can move them from your notebook into a production-ready inference script.
Version Control for ML Experiments