Mahamudul Hasan Rubel

Senior Software Engineer crafting high-performance web applications and SaaS platforms.

© 2026 Mahamudul Hasan Rubel. All rights reserved.

Lesson 16 of the AI/ML Foundations: Core Concepts & First Models course
AI/ML · June 25, 2026 · 3 min read

Training the Baseline Linear Model: A Practical Guide

Learn how to instantiate, fit, and generate predictions with your first baseline linear model using Scikit-Learn to establish a performance benchmark.

Scikit-Learn · Linear Regression · Machine Learning · Python · Data Science · ai · machine-learning

Previously in this course, we covered the mechanics of linear regression and the importance of training and testing data splits. Now that your data is cleaned and partitioned, it's time to build your first baseline model.

Establishing a baseline is one of the most important early steps in a machine learning project. It provides a "floor" for performance: a simple, interpretable model against which you can measure whether more complex techniques actually earn their added complexity.

Instantiating and Fitting Your First Model

In Scikit-Learn, the workflow follows a consistent API: you instantiate an estimator object, call .fit() to train it on your data, and call .predict() to generate outputs. Wrapping the estimator in a Scikit-Learn pipeline makes this process more robust, because the pipeline applies the same transformation steps in the same order at both fit and predict time.
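To make the instantiate / fit / predict pattern concrete, here is a minimal sketch on synthetic data. The arrays X and y below are made up for illustration (an exact line y = 3x + 2) and are not the course dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical toy data: y = 3*x + 2 exactly, with no noise
X = np.arange(10, dtype=float).reshape(-1, 1)  # 2D: (10 samples, 1 feature)
y = 3.0 * X.ravel() + 2.0                      # 1D: (10 samples,)

estimator = LinearRegression()  # 1. instantiate
estimator.fit(X, y)             # 2. fit on the data
preds = estimator.predict(np.array([[10.0], [11.0]]))  # 3. predict on new inputs

print(preds)  # approximately [32. 35.]
```

Other Scikit-Learn estimators expose the same three steps, which is what makes it easy to swap the baseline for a different model later.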

The Baseline Linear Model in Practice

For our running project, we will use a LinearRegression model. Since we have already preprocessed our features—handling missing data and feature selection—we can feed our training set directly into the pipeline.

PYTHON
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline

# 1. Instantiate the model
model = LinearRegression()

# 2. Create the pipeline (assuming you have a preprocessor defined)
# If you haven't defined a preprocessor yet, use a simple identity or
# just the model itself for the absolute baseline.
baseline_pipeline = Pipeline([
    ('regressor', model)
])

# 3. Fit the model
# X_train and y_train come from your previous train-test split step
baseline_pipeline.fit(X_train, y_train)

print("Model training complete.")

Generating Initial Predictions

Once the model is fitted, it has "learned" the coefficients (weights) that minimize the error on your training data. To see how it performs on unseen data, we pass the test set to the .predict() method.
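If you want to look at what was learned, the fitted coefficients can be read back from the pipeline step. The sketch below uses made-up data (X_demo, y_demo) generated from known weights, so the recovered values can be checked. The step name "regressor" matches the pipeline defined earlier.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline

# Hypothetical stand-in for X_train / y_train:
# y = 2*x0 - 1*x1 + 5, generated without noise
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 2))
y_demo = 2.0 * X_demo[:, 0] - 1.0 * X_demo[:, 1] + 5.0

demo_pipeline = Pipeline([("regressor", LinearRegression())])
demo_pipeline.fit(X_demo, y_demo)

# Access the fitted estimator by its step name
fitted = demo_pipeline.named_steps["regressor"]
print("coefficients:", fitted.coef_)     # close to [ 2. -1.]
print("intercept:", fitted.intercept_)   # close to 5.0
```

On real data the coefficients will not match any known "true" values, but inspecting them is a quick sanity check that the model picked up sensible directions for each feature.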

PYTHON
# 4. Generate predictions on the test set
y_pred = baseline_pipeline.predict(X_test)

# Compare the first 5 predictions to actual values
import pandas as pd
comparison = pd.DataFrame({'Actual': y_test, 'Predicted': y_pred})
print(comparison.head())

These initial predictions are your first real look at how well your features capture the underlying patterns in the target variable.

Hands-on Exercise: Run Your Baseline

Using the dataset you cleaned in the project dataset initialization lesson:

  1. Import LinearRegression from sklearn.linear_model.
  2. Instantiate the model and wrap it in a Pipeline.
  3. Fit the pipeline using your X_train and y_train variables.
  4. Generate predictions for X_test and store them in a variable called y_pred.
  5. Calculate the difference (residuals) between y_test and y_pred.
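For step 5, a residual is simply the actual value minus the predicted value. The sketch below uses small made-up arrays (y_test_demo, y_pred_demo) in place of your real y_test and y_pred, so the arithmetic is easy to follow.

```python
import numpy as np

# Hypothetical values standing in for y_test and y_pred
y_test_demo = np.array([10.0, 12.0, 15.0, 20.0])
y_pred_demo = np.array([11.0, 11.5, 14.0, 22.0])

# Residual = actual - predicted (negative means the model over-predicted)
residuals = y_test_demo - y_pred_demo
mean_abs_residual = np.abs(residuals).mean()

print(residuals)          # [-1.   0.5  1.  -2. ]
print(mean_abs_residual)  # 1.125
```

With your own data, replace the two demo arrays with y_test and y_pred from the earlier steps.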

Common Pitfalls

  • Data Leakage: Ensure your training set does not contain information from the future or the test set. If you use a pipeline, ensure that any scaling or imputation is fitted only on the training set.
  • Dimensionality Mismatch: Always double-check the shape of your input arrays. Scikit-Learn expects X to be a 2D array of shape (n_samples, n_features) and, for single-target regression, y to be a 1D array of shape (n_samples,).
  • Ignoring the Baseline: Don't be tempted to jump straight into complex models like Gradient Boosting or Neural Networks. If your complex model performs similarly to your simple linear baseline, you've likely over-engineered the solution.
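The first two pitfalls can be illustrated in a short sketch. It assumes a single made-up feature and uses StandardScaler as an example preprocessing step. The variable names (feature, target, leak_safe) are hypothetical and not part of the course project.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical single-feature dataset
rng = np.random.default_rng(42)
feature = rng.normal(loc=50.0, scale=10.0, size=100)  # 1D, shape (100,)
target = 0.5 * feature + rng.normal(scale=1.0, size=100)

# Dimensionality: a single feature must be reshaped to 2D before fitting
X_all = feature.reshape(-1, 1)  # shape (100, 1)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_all, target, test_size=0.2, random_state=0
)

# Leakage: split first, then let the pipeline fit the scaler on training data only
leak_safe = Pipeline([
    ("scaler", StandardScaler()),
    ("regressor", LinearRegression()),
])
leak_safe.fit(X_tr, y_tr)
test_preds = leak_safe.predict(X_te)  # the scaler reuses training statistics here

scaler = leak_safe.named_steps["scaler"]
print(X_tr.shape, y_tr.shape)                      # (80, 1) (80,)
print(np.isclose(scaler.mean_[0], X_tr.mean()))    # True: mean comes from the training split
```

The key design choice is the order of operations: the scaler never sees the test rows during fitting, so the test-set predictions reflect how the model would behave on genuinely unseen data.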

Recap

In this lesson, we moved from theory to application by instantiating a LinearRegression model, fitting it via a pipeline, and generating predictions on the test set. This baseline acts as your primary performance benchmark: it is the reference point, not a metric in itself. With it in place, you have a clear target to beat as you experiment with feature engineering and more advanced algorithms in the coming lessons.

Up next: We will examine the gap between your training results and test results to discuss training error vs generalization error.
