Mahamudul Hasan Rubel
Lesson 3 of the AI/ML Foundations: Core Concepts & First Models course
AI/ML · June 24, 2026 · 4 min read

Introduction to NumPy for Data Handling: Arrays and Vectorization

Master NumPy arrays to handle numerical data efficiently. Learn how to perform fast element-wise operations and indexing for your ML projects.

NumPy, Data Science, Python, Machine Learning, Arrays, Vectorization, ai, machine-learning

Previously in this course, we covered the Machine Learning Workflow and ensured your Python ML Environment was ready for heavy lifting. Now, we move from theory to the engine room: NumPy.

In machine learning, you don't process data one record at a time using Python lists. That is slow, memory-intensive, and impractical. Instead, we use NumPy, the library that provides high-performance, multidimensional arrays. When you understand NumPy, you understand how data actually flows through your models.

Why NumPy? First Principles of Vectorization

Python lists are flexible: they can hold integers, strings, and arbitrary objects in the same container. This flexibility comes at a cost. A list stores references to separate Python objects, so every operation on an element has to follow a reference and check the object's type at runtime.

NumPy arrays are different. They are homogeneous, meaning every element must be the same type (usually a 64-bit float or integer). Because the type is fixed, NumPy stores these elements in contiguous memory blocks. This allows for vectorization: the ability to perform operations on entire arrays at once without writing explicit for loops.
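As a small illustration of the fixed-type, contiguous layout described above, the sketch below converts a mixed list and inspects the result. The variable names are illustrative only.

```python
import numpy as np

# A list can mix types; converting it forces NumPy to pick one common dtype.
mixed = [1, 2.5, 3]
arr = np.array(mixed)

print(arr.dtype)                  # float64: the integers were upcast to floats
print(arr.itemsize)               # 8 bytes per element
print(arr.nbytes)                 # 24 bytes of element data (3 elements x 8 bytes)
print(arr.flags["C_CONTIGUOUS"])  # True: elements sit in one contiguous block
```

Checking dtype right after creating an array is a cheap habit that catches unintended type conversions early.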

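When you multiply an array by 2, NumPy runs the loop in optimized, compiled C code instead of the Python interpreter. On large arrays this is often many times faster than an equivalent Python loop, and in a training pipeline that gap can be the difference between a run that finishes in seconds and one that takes hours.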
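A minimal sketch of the same doubling operation written both ways. Both versions give the same values. How much faster the vectorized form runs depends on array size and hardware, so no timing is claimed here.

```python
import numpy as np

values = list(range(5))

# Pure-Python approach: the interpreter handles one element at a time.
doubled_list = [v * 2 for v in values]

# Vectorized approach: one expression, with the loop run inside NumPy's compiled code.
arr = np.array(values)
doubled_arr = arr * 2

print(doubled_list)          # [0, 2, 4, 6, 8]
print(doubled_arr.tolist())  # [0, 2, 4, 6, 8]
```

To measure the difference on your own machine, time both versions with the standard library's timeit module on a much larger array.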

Creating and Manipulating Arrays

To get started, you'll need to import the library. The convention is import numpy as np.

1. Creating Arrays

You can create arrays from standard Python lists or use built-in functions for common patterns.

PYTHON
import numpy as np

# From a list
data = np.array([1, 2, 3, 4])

# A 2x3 matrix of zeros (common for initializing weights)
zeros = np.zeros((2, 3))

# An array of evenly spaced numbers
range_arr = np.arange(0, 10, 2)  # Output: [0, 2, 4, 6, 8]

2. Element-wise Arithmetic

Vectorization means you treat the array as a single entity. If you have a dataset of feature values, you can normalize them or scale them in one line.

PYTHON
prices = np.array([100, 200, 300])

# Add 50 to every element simultaneously
taxed_prices = prices + 50 

# Multiply by a scalar
discounted = prices * 0.9

# Element-wise multiplication of two arrays of the same shape
base = np.array([10, 20, 30])
multiplier = np.array([1, 2, 3])
result = base * multiplier  # Output: [10, 40, 90]
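The one-line normalization mentioned above could look like the following min-max scaling sketch. The feature values are hypothetical, and min-max scaling is only one of several possible normalization schemes.

```python
import numpy as np

# Hypothetical feature values, used only for illustration.
sqft = np.array([1000.0, 1500.0, 2000.0])

# Min-max scaling: map the smallest value to 0 and the largest to 1.
scaled = (sqft - sqft.min()) / (sqft.max() - sqft.min())

print(scaled.tolist())  # [0.0, 0.5, 1.0]
```

If every value in the array is identical, the denominator is zero, so real code should guard against that case.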

3. Indexing and Slicing

Indexing in NumPy follows the [row, column] syntax. For a 2D array, arr[0, :] selects the entire first row, while arr[:, 1] selects the second column.

PYTHON
matrix = np.array([[1, 2, 3], 
                   [4, 5, 6]])

print(matrix[0, 1])    # Output: 2 (row 0, col 1)
print(matrix[1, :])    # Output: [4 5 6] (all columns in row 1)
print(matrix[:, 0:2])  # Output: [[1 2] [4 5]], printed on two lines (first two columns)
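The arr[0, :] and arr[:, 1] selections described above can be sketched as follows. The example also shows how the result's shape changes with the kind of index, a detail that is easy to miss.

```python
import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6]])

first_row = arr[0, :]   # every column of row 0
second_col = arr[:, 1]  # every row of column 1

print(arr.shape)            # (2, 3)
print(first_row.tolist())   # [1, 2, 3]
print(second_col.tolist())  # [2, 5]
print(second_col.shape)     # (2,): indexing with a single integer drops that axis
print(arr[:, 1:2].shape)    # (2, 1): a slice keeps the axis
```

Use the slice form when a later step expects a 2D column rather than a flat 1D array.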

Hands-on Exercise: Preparing Feature Data

Imagine you have a small dataset representing the square footage and number of rooms for three houses.

  1. Create a 3x2 NumPy array called house_data where the first column is square footage [1000, 1500, 2000] and the second column is the number of rooms [2, 3, 4].
  2. Multiply the square footage column by 0.0929 to convert it to square meters (do this using slicing).
  3. Print the resulting array.

Solution:

PYTHON
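# Use a float dtype: an integer array would silently truncate 92.9 to 92 on assignment
house_data = np.array([[1000, 2], [1500, 3], [2000, 4]], dtype=float)
house_data[:, 0] = house_data[:, 0] * 0.0929  # square feet -> square meters
print(house_data)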

Common Pitfalls

  • Shape Mismatches: If you try to add a (3,) array to a (2,) array, NumPy will raise a ValueError ("operands could not be broadcast together"). Always check array.shape if you're unsure.
  • Broadcasting Confusion: NumPy can "broadcast" a smaller array across a larger one (e.g., adding a scalar to a matrix), but this can lead to logic errors if you don't understand the dimensions. When in doubt, check your dimensions.
  • Modifying Views: Basic slicing does not create a copy; it creates a view. If you modify a slice, you modify the original array. If you need a separate copy, call .copy() on the slice, for example arr[0:2].copy().
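The three pitfalls above can be reproduced in a few lines. This is a minimal sketch with illustrative variable names.

```python
import numpy as np

# 1. Shape mismatch: (3,) and (2,) cannot be broadcast together.
try:
    np.array([1, 2, 3]) + np.array([1, 2])
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
print(mismatch_raised)  # True

# 2. Broadcasting: a (3,) row is stretched across both rows of a (2, 3) matrix.
matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])
offsets = np.array([10, 20, 30])
shifted = matrix + offsets
print(shifted.tolist())  # [[11, 22, 33], [14, 25, 36]]

# 3. Views: writing through a basic slice changes the original array.
original = np.array([1, 2, 3, 4])
view = original[:2]
view[0] = 99
print(original.tolist())  # [99, 2, 3, 4]

# An explicit copy is independent of its source.
independent = original[:2].copy()
independent[0] = -1
print(original.tolist())  # [99, 2, 3, 4] (unchanged by the write to the copy)
```

Printing shapes before combining arrays, and calling .copy() whenever a slice will be modified independently, avoids most of these surprises.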

Recap

NumPy is the backbone of efficient numerical computation in Python. By using arrays instead of lists, you gain access to vectorization, which makes your code faster and more concise. We've mastered creating arrays, performing element-wise arithmetic, and using slicing to isolate specific data points.

Up next: Loading and Inspecting Datasets with Pandas.
