
NumPy Indexing and Selection¶

In this lecture we will discuss how to select elements or groups of elements from an array.

# Import NumPy under its conventional alias
import numpy as np

#Creating sample array from 0 to 10
arr = np.arange(0,11)

#Show
arr

Bracket Indexing and Selection¶

The simplest way to pick one or more elements of an array looks very similar to indexing a Python list:

#Get a value at an index
arr[8]

#Get values in a range
arr[1:5]

#Get values in a range starting at index 0
arr[0:5]

Slice Notation¶

NumPy inherits Python’s slice notation (start:stop:step). Omitting start defaults to the beginning of the array; omitting stop defaults to the end. Slicing returns elements from the start index up to (but not including) the stop index, exactly like Python lists. This consistency across Python and NumPy means the slicing skills you build here transfer directly to Pandas Series and DataFrames.

arr[5:]
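The start:stop:step rules described above can be sketched with a few more slices. The name `demo` is hypothetical and used only for this illustration, so the lecture's `arr` is left untouched.

```python
import numpy as np

# `demo` is an illustrative array, separate from the lecture's `arr`
demo = np.arange(0, 11)

first_five = demo[:5]        # start omitted: begins at index 0
every_other = demo[::2]      # start and stop omitted, step of 2
reversed_demo = demo[::-1]   # negative step walks the array backwards
stepped = demo[1:8:3]        # indices 1, 4 and 7

print(first_five)
print(every_other)
print(reversed_demo)
print(stepped)
```

The same expressions work on a plain Python list, which is the consistency the paragraph above refers to.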

Broadcasting¶

NumPy arrays differ from normal Python lists in their ability to broadcast: a single value assigned to a slice is applied to every element of that slice.

#Setting a value with index range (Broadcasting)
arr[0:5]=100

#Show
arr
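To see how this differs from a list, here is a minimal comparison. The names `py_list`, `np_arr` and `list_error` are hypothetical and exist only for this sketch.

```python
import numpy as np

py_list = list(range(11))
np_arr = np.arange(0, 11)

# NumPy stretches the scalar across every position in the slice
np_arr[0:5] = 100

# A list slice assignment expects an iterable, so a bare scalar raises TypeError
try:
    py_list[0:5] = 100
    list_error = None
except TypeError as exc:
    list_error = exc

# Given an iterable, a list replaces the slice instead of filling it,
# so the list's length can change
py_list[0:5] = [100]

print(np_arr)
print(type(list_error).__name__)
print(py_list)
```

The array keeps its length of 11 with the first five values overwritten, while the list shrinks because five elements were replaced by one.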

# Reset the array; we'll see why I had to reset it in a moment
arr = np.arange(0,11)

#Show
arr

#Important notes on Slices
slice_of_arr = arr[0:6]

#Show slice
slice_of_arr

#Change Slice
slice_of_arr[:]=99

#Show Slice again
slice_of_arr

Views, Not Copies¶

A critical difference between NumPy slices and Python list slices: NumPy slices are views of the original array, not independent copies. Changes to a slice propagate back to the original array. This design choice avoids unnecessary memory allocation when working with large datasets, but it means you must be deliberate about when to modify slices. If you see unexpected mutations in your data, a missing .copy() call is often the culprit.

arr
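One way to confirm that a slice is a view is `np.shares_memory`. The sketch below uses hypothetical names (`original`, `view`, `py_list`, `list_slice`) and contrasts the array behavior with a Python list slice, which is an independent copy.

```python
import numpy as np

original = np.arange(0, 11)
view = original[0:6]

# The slice and the original array point at the same underlying buffer
shares = np.shares_memory(original, view)

# Writing through the view changes the original array
view[:] = 99

# A Python list slice is a new list, so the source list is unaffected
py_list = list(range(11))
list_slice = py_list[0:6]
list_slice[:] = [99] * 6

print(shares)
print(original)
print(py_list)
```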

Why Views Matter for Memory¶

NumPy returns views instead of copies to avoid duplicating large arrays in memory. If copies were the default, slicing most of a 1 GB array would allocate close to another 1 GB each time. When you genuinely need an independent copy, for example to store a preprocessed version alongside the original, use arr.copy() explicitly. This view-based memory model is a deliberate performance optimization that data scientists must understand to avoid both bugs and memory bloat.

#To get a copy, need to be explicit
arr_copy = arr.copy()

arr_copy
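A short sketch of what the explicit copy buys you, assuming you only want to modify part of the data. The names `source` and `independent` are hypothetical.

```python
import numpy as np

source = np.arange(0, 11)

# Copy just the slice you intend to modify
independent = source[0:6].copy()
independent[:] = 99

print(np.shares_memory(source, independent))  # the copy has its own buffer
print(source)       # unchanged
print(independent)  # all 99s
```

Copying only the slice, rather than the whole array, keeps the extra memory proportional to the part you actually change.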

Indexing a 2D array (matrices)¶

The general format is arr_2d[row][col] or arr_2d[row,col]. I usually recommend the comma notation for clarity.

arr_2d = np.array(([5,10,15],[20,25,30],[35,40,45]))

#Show
arr_2d

Double Bracket vs Comma Notation¶

There are two ways to index a 2D array: double bracket arr_2d[row][col] and comma notation arr_2d[row, col]. Both return the same result, but comma notation is preferred because it is more readable, more efficient (single indexing operation vs. two), and required for advanced slicing like arr_2d[:2, 1:]. The comma notation reads naturally as “row, column” and mirrors mathematical matrix notation.

#Indexing row
arr_2d[1]

# Format is arr_2d[row][col] or arr_2d[row,col]

# Getting individual element value
arr_2d[1][0]

Comma Notation (Recommended)¶

The comma notation arr[row, col] is the idiomatic way to access elements in multi-dimensional NumPy arrays. It performs a single indexing operation and supports all slicing features, including selecting sub-matrices with arr[:2, 1:] (first 2 rows, columns from index 1 onward). This syntax is what you will encounter in virtually all NumPy, Pandas, and ML library documentation.

# Getting individual element value
arr_2d[1,0]
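The two notations agree for a single element but diverge once slices are involved, because chained brackets apply the second index to the result of the first. A minimal sketch, using a hypothetical array named `grid` with the same values as arr_2d:

```python
import numpy as np

grid = np.array([[5, 10, 15],
                 [20, 25, 30],
                 [35, 40, 45]])

# Single element: both notations return the same value
same_element = grid[1][0] == grid[1, 0]

# Comma notation slices rows and columns in one operation
comma = grid[:2, 1:]

# Chained brackets slice rows twice: grid[:2] keeps rows 0-1,
# then [1:] drops the first of those rows
chained = grid[:2][1:]

print(same_element)
print(comma)
print(chained)
```

Here `comma` is the 2x2 top-right block, while `chained` is a single full row, which is why the comma form is the one to use for sub-matrices.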

Slicing Sub-matrices¶

You can extract rectangular sub-sections of a 2D array by combining row and column slices: arr_2d[:2, 1:] selects the first 2 rows and columns from index 1 to the end. This is exactly how you extract feature subsets from a data matrix or crop regions from image arrays in computer vision tasks.

# 2D array slicing

#Shape (2,2) from top right corner
arr_2d[:2,1:]

#Bottom row
arr_2d[2]

#Bottom row again, with an explicit slice over all columns
arr_2d[2,:]
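Row slices have a column counterpart. The sketch below, again on a hypothetical `grid` with the same values as arr_2d, selects a column and shows how an integer index and a one-element slice differ in the shape they return.

```python
import numpy as np

grid = np.array([[5, 10, 15],
                 [20, 25, 30],
                 [35, 40, 45]])

middle_col = grid[:, 1]       # integer index drops the axis: shape (3,)
middle_col_2d = grid[:, 1:2]  # slice keeps the axis: shape (3, 1)
corners = grid[::2, ::2]      # step of 2 on both axes picks the four corners

print(middle_col, middle_col.shape)
print(middle_col_2d.shape)
print(corners)
```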

Fancy Indexing¶

Fancy indexing allows you to select entire rows or columns in any order. To show this, let’s quickly build a NumPy array:

#Set up matrix
arr2d = np.zeros((10,10))

#Number of rows (shape[0] is rows, shape[1] is columns)
arr_length = arr2d.shape[0]

#Fill each row with its own row index
for i in range(arr_length):
    arr2d[i] = i
    
arr2d

Fancy Indexing in Action¶

By passing a list of row indices like arr2d[[2, 4, 6, 8]], you retrieve those specific rows in exactly the order specified. This lets you shuffle rows, sample specific data points, or reorder a dataset without loops. Fancy indexing always returns a copy (not a view), which differs from basic slicing behavior.

arr2d[[2,4,6,8]]

#Allows in any order
arr2d[[6,4,2,7]]
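Two points about fancy indexing can be checked directly: the result is a copy, and the same technique works on columns. This is a small sketch on a hypothetical 4x3 array named `table`.

```python
import numpy as np

table = np.arange(12).reshape(4, 3)

# Select rows 3 and 0, in that order
picked = table[[3, 0]]
picked_before = picked.tolist()

# Overwriting the result leaves `table` alone, because `picked` is a copy
picked[:] = -1

# An index list on the second axis reorders columns instead
cols = table[:, [2, 0]]

print(picked_before)
print(np.shares_memory(table, picked))
print(table)
print(cols)
```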

More Indexing Help¶

Indexing a 2D matrix can be a bit confusing at first, especially when you start to add in step sizes. Try a Google image search for "NumPy indexing" to find useful images, like this one:

http://memory.osu.edu/classes/python/_images/numpy_indexing.png

Selection¶

Let’s briefly go over how to use brackets for selection based on comparison operators.

arr = np.arange(1,11)
arr

arr > 4

bool_arr = arr>4

bool_arr

Boolean Selection in Practice¶

The most common pattern combines comparison and selection in one step: arr[arr > 4] returns only elements greater than 4. Under the hood, arr > 4 creates a boolean mask, then the bracket indexing filters using that mask. You can assign to boolean-selected elements too – arr[arr < 0] = 0 clamps negative values to zero. This pattern is the NumPy equivalent of a SQL WHERE clause and is fundamental to data cleaning, outlier removal, and feature engineering.

arr[bool_arr]

Usually you would do this all in one step like this:

arr[arr>2]

x = 2
arr[arr>x]
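Boolean masks can also be assigned to and combined. The sketch below uses a hypothetical array `values`; note that conditions are combined with `&` and `|` and each comparison needs its own parentheses, since Python's `and`/`or` do not work element-wise on arrays.

```python
import numpy as np

values = np.array([-3, 7, 0, 12, -1, 5])

# Assignment through a mask: clamp negatives to zero on a copy
clamped = values.copy()
clamped[clamped < 0] = 0

# Combine conditions element-wise with & (and) and | (or)
in_range = values[(values > 0) & (values < 10)]
outside = values[(values < 0) | (values > 10)]

print(clamped)
print(in_range)
print(outside)
```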