More array operations

Beyond basic creation and arithmetic, NumPy provides operations for expanding dimensions, conditional filtering, slicing, stacking, and splitting arrays. These operations are the building blocks of data preprocessing pipelines, and the same ideas carry over to deep learning frameworks like TensorFlow and PyTorch, whose tensor APIs follow similar indexing and broadcasting conventions.

import numpy as np

Expanding arrays

The np.newaxis indexer adds a new dimension to an array, which is essential for broadcasting and for meeting the shape requirements of mathematical operations. For example, converting a 1D array of shape (4,) to (1, 4) allows it to participate in matrix operations that expect two-dimensional inputs.

# create a single dimension array
array = np.ones((4,))
print(array)
print(array.shape)

# add a leading axis: shape (4,) becomes (1, 4)
expanded = array[np.newaxis, :]
expanded.shape
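
To make the broadcasting use case concrete, here is a minimal sketch that places the new axis in either position and then combines the two results. The variable names (row, as_row, as_col, pairwise) are illustrative and not part of the original example.

```python
import numpy as np

row = np.arange(4)              # shape (4,)
as_row = row[np.newaxis, :]     # shape (1, 4): one row, four columns
as_col = row[:, np.newaxis]     # shape (4, 1): four rows, one column

# Broadcasting a (4, 1) array against a (1, 4) array yields every pairwise sum.
pairwise = as_col + as_row      # shape (4, 4)

print(as_row.shape, as_col.shape, pairwise.shape)
print(pairwise)
```

The position of np.newaxis in the index decides where the length-1 axis is inserted, which in turn decides how the array lines up with others during broadcasting.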

Conditional new arrays

NumPy supports boolean indexing: a comparison produces a boolean mask with the same shape as the array, and indexing the array with that mask selects only the matching elements. The result is always a one-dimensional copy, regardless of the shape of the original array. This is the NumPy equivalent of Pandas boolean filtering and is fundamental to operations like thresholding, outlier removal, and feature masking.

array = np.array([[3,12,11],[45,22,11],[56,15,22]])
# boolean mask: True wherever the element is below 20
print(array < 20)
# select the matching values (returned as a 1D array)
array[array < 20]
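
As a possible extension of the thresholding idea, the sketch below combines two conditions, assigns through a mask, and uses np.where to keep the original shape. The names values, in_range, capped, and flagged are illustrative, and the cap of 20 and the placeholder -1 are arbitrary choices for the example.

```python
import numpy as np

values = np.array([[3, 12, 11], [45, 22, 11], [56, 15, 22]])

# Combine conditions with & (and), | (or), ~ (not); the parentheses are required.
in_range = (values > 10) & (values < 20)
selected = values[in_range]          # 1D array of matches in row-major order

# Assigning through a mask changes the matching elements in place
# (done on a copy here so that `values` stays intact).
capped = values.copy()
capped[capped > 20] = 20

# np.where keeps the original shape and picks between two alternatives.
flagged = np.where(values < 20, values, -1)

print(selected)
print(capped)
print(flagged)
```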

Slicing

Like Python lists, NumPy arrays support slice notation with start:stop:step, and multi-dimensional arrays accept one slice per axis, separated by commas. Basic slicing returns a view (not a copy) of the original array, which is memory-efficient but means that modifying a slice also modifies the original data. For an independent copy, call .copy() on the slice.

a = np.arange(6)
a
# elements at indices 1, 2, and 3 (the stop index is excluded)
a[1:4]
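
The view behavior is easiest to see by writing through a slice. The following sketch, with illustrative variable names, contrasts a view with an explicit copy and also shows a slice along two axes at once.

```python
import numpy as np

base = np.arange(6)
view = base[1:4]                 # basic slice: shares memory with base
view[0] = 99                     # also changes base[1]

independent = base[1:4].copy()   # explicit copy: owns its own data
independent[1] = -1              # base is unaffected

grid = np.arange(12).reshape(3, 4)
block = grid[:2, ::2]            # first two rows, every second column

print(base)
print(independent)
print(np.shares_memory(base, view), np.shares_memory(base, independent))
print(block)
```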

Stacking

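np.hstack() joins arrays horizontally (side by side, adding columns) and np.vstack() joins them vertically (one on top of the other, adding rows). The arrays may differ in size along the axis being joined, but their shapes must match along every other axis: the same number of rows for np.hstack() and the same number of columns for np.vstack() when stacking 2D arrays. Stacking is commonly used to combine feature matrices, assemble batch data for neural networks, or merge results from parallel computations.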

# stacking horizontally
# both arrays must have the same number of rows!
first = np.full((2, 12), 2)
print(first)
second = np.full((2, 7), 2)
print(second)
np.hstack((first, second))
# stacking vertically
first = np.full((3,3), 1)
print(first)
second = np.full((3,3), 2)
print(second)
np.vstack((first, second))
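
The shape requirement can be checked directly. This sketch uses two hypothetical arrays, left and right, that share a row count but not a column count, so they stack horizontally but not vertically. It also shows that np.concatenate with axis=1 gives the same result as np.hstack for 2D inputs.

```python
import numpy as np

left = np.zeros((2, 3))
right = np.ones((2, 2))

# Row counts match, so the arrays can sit side by side: result is (2, 5).
side_by_side = np.hstack((left, right))
same_result = np.concatenate((left, right), axis=1)

# Column counts differ (3 vs 2), so vertical stacking is rejected.
try:
    np.vstack((left, right))
    vstack_failed = False
except ValueError as err:
    vstack_failed = True
    print("vstack error:", err)

print(side_by_side.shape)
```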

Splitting arrays

np.hsplit() divides an array along the horizontal axis (the columns of a 2D array, or the only axis of a 1D array). When given an integer, it produces that many equal parts, so the integer must evenly divide the array length; otherwise NumPy raises a ValueError. It returns a list of arrays that can be unpacked into separate variables, which is useful for creating train/validation/test splits or partitioning data for parallel processing.

array = np.arange(1, 11)
array
# split into two equal parts
# warning: the array length must be evenly divisible by the number of parts!
np.hsplit(array, 2)
# the returned list can be unpacked into separate variables
first, second = np.hsplit(array, 2)
print(first)
print(second)
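
When the parts should not be equal, for example in a train/validation/test split, np.hsplit() also accepts a list of split indices, and np.array_split() tolerates a count that does not divide evenly. The sketch below is one way to do this; the split points 6 and 8 are arbitrary choices for illustration, as are the variable names.

```python
import numpy as np

data = np.arange(1, 11)              # 10 elements: 1 through 10

# A list of indices cuts before positions 6 and 8: data[:6], data[6:8], data[8:]
train, val, test = np.hsplit(data, [6, 8])

# np.array_split accepts a count that does not divide the length evenly.
chunks = np.array_split(data, 3)     # part lengths 4, 3, 3

# An integer count that does not divide evenly makes hsplit raise ValueError.
try:
    np.hsplit(data, 3)
    uneven_ok = True
except ValueError:
    uneven_ok = False

print(train, val, test)
print([len(c) for c in chunks])
```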