
Sorting, Searching, and Counting

These operations help you organize and query array data. Sorting arranges elements in order (np.sort(), np.argsort(), np.lexsort()). Searching locates elements by condition or value (np.argmax(), np.argmin(), np.where(), np.searchsorted()). Counting tallies elements that meet criteria (np.count_nonzero()). Together, they support ranking, finding top-k elements, binary search on sorted data, partitioning arrays, and computing order statistics – all common tasks in data analysis and ML model evaluation.

import numpy as np

np.__version__

author = 'kyubyong. longinglove@nate.com'

Sorting

np.sort() returns a sorted copy, while np.argsort() returns the indices that would sort the array (useful for sorting one array based on another). np.lexsort() performs an indirect stable sort using multiple keys – like sorting a table by last name, then by first name. np.partition() partially sorts an array so that the k-th element is in its final sorted position, which is faster than full sorting when you only need the top-k or bottom-k values.
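As a rough illustration of the four functions described above, here is a minimal sketch. The arrays (scores, names, last, first, values) are made-up sample data, not part of the exercises. Note that np.lexsort() treats the last key in the tuple as the primary key.

```python
import numpy as np

# Hypothetical sample data, chosen only for illustration
scores = np.array([30, 10, 20])
names = np.array(["c", "a", "b"])

sorted_scores = np.sort(scores)   # sorted copy; scores itself is unchanged
order = np.argsort(scores)        # indices that would sort scores -> [1, 2, 0]
names_by_score = names[order]     # reorder one array by another

# lexsort: the LAST key in the tuple is the primary sort key
last = np.array(["Smith", "Jones", "Smith"])
first = np.array(["Zoe", "Amy", "Bob"])
table_order = np.lexsort((first, last))  # by last name, then first name -> [1, 2, 0]

# partition: index 2 holds the value it would have in a full sort;
# smaller values come before it and larger ones after it, in no guaranteed order
values = np.array([7, 2, 9, 4, 1])
part = np.partition(values, 2)

# argpartition gives the indices instead; here it picks the two largest values
top2 = values[np.argpartition(values, -2)[-2:]]
```

The order of elements on either side of the partition point is not specified, so sort the slice afterwards if you need the top-k values ranked.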

Q1. Sort x along the second axis.

x = np.array([[1,4],[3,1]])

Q2. Sort the pairs of surnames and first names (first by surname, then by first name) and return their indices.

surnames =    ('Hertz',    'Galilei', 'Hertz')
first_names = ('Heinrich', 'Galileo', 'Gustav')

Q3. Get the indices that would sort x along the second axis.

x = np.array([[1,4],[3,1]])

Q4. Rearrange x so that its fifth element is the value it would have in sorted x, with all smaller elements placed before it and all equal or larger elements after it.

x = np.random.permutation(10)
print("x =", x)

Q5. Get the indices that would rearrange x so that its third element is the value it would have in sorted x, with all smaller elements placed before it and all equal or larger elements after it.

x = np.random.permutation(10)
print("x =", x)

Searching

np.argmax() and np.argmin() find the indices of extreme values. np.nanargmax() and np.nanargmin() do the same while ignoring NaN values. np.where() returns indices where a condition is true. np.nonzero() returns indices of non-zero elements. np.searchsorted() finds insertion points to maintain sorted order, which is useful for binning continuous values into discrete categories. np.extract() returns elements matching a condition.
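The search functions above can be sketched roughly as follows. The arrays a, b, edges, and samples are invented for illustration and are not taken from the exercises.

```python
import numpy as np

# Hypothetical data containing a NaN
a = np.array([[3.0, np.nan, 1.0],
              [2.0, 8.0, 5.0]])

col_of_max = np.nanargmax(a, axis=1)  # per-row index of the max, NaN ignored -> [0, 1]
col_of_min = np.nanargmin(a, axis=1)  # per-row index of the min, NaN ignored -> [2, 0]

b = np.array([4, 0, 7, 0, 2])
big_idx = np.where(b > 3)[0]      # indices where the condition holds -> [0, 2]
nz_idx = np.nonzero(b)[0]         # indices of non-zero elements -> [0, 2, 4]
big_vals = np.extract(b > 3, b)   # the matching elements themselves -> [4, 7]

# Binning: edges must already be sorted. With the default side='left',
# a sample equal to an edge gets that edge's index.
edges = np.array([0, 10, 20, 30])
samples = np.array([5, 10, 25])
bins = np.searchsorted(edges, samples)  # -> [1, 1, 3]
```

np.where() and np.nonzero() return a tuple with one index array per dimension, which is why the 1-D examples take element [0].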

Q6. Get the maximum and minimum values and their indices of x along the second axis.

x = np.random.permutation(10).reshape(2, 5)
print("x =", x)

Q7. Get the maximum and minimum values and their indices of x along the second axis, ignoring NaNs.

x = np.array([[np.nan, 4], [3, 2]])

Q8. Get the values and indices of the elements that are bigger than 2 in x.

x = np.array([[1, 2, 3], [1, 3, 5]])

Q9. Get the indices of the elements that are bigger than 2 in the flattened x.

x = np.array([[1, 2, 3], [1, 3, 5]])

Q10. Replace each element of x that is less than 0 with 0, and keep every other element unchanged.

x = np.arange(-5, 4).reshape(3, 3)

Q11. Get the indices at which the elements of y should be inserted into x to maintain order.

x = [1, 3, 5, 7, 9]
y = [0, 4, 2, 6]

Counting

np.count_nonzero() counts elements that are not zero (or equivalently, elements that evaluate to True in a boolean array). This is useful for counting how many elements meet a condition (since True is non-zero): for example, np.count_nonzero(arr > threshold) tells you how many values exceed a threshold. In sparse data, counting non-zeros reveals the density of the data – important for choosing between dense and sparse representations.
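A small sketch of the counting patterns mentioned above. The array arr and the threshold value are assumptions made up for this example.

```python
import numpy as np

# Hypothetical data and threshold, for illustration only
arr = np.array([[0, 5, 0],
                [3, 0, 8]])
threshold = 4

total_nonzero = np.count_nonzero(arr)      # non-zero entries overall -> 3
above = np.count_nonzero(arr > threshold)  # True counts as non-zero -> 2
per_row = np.count_nonzero(arr, axis=1)    # count along each row -> [1, 2]

# Fraction of non-zero entries, a simple density measure -> 0.5
density = total_nonzero / arr.size
```

A low density suggests that a sparse representation may be worth considering, though where to draw the line depends on the workload.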

Q12. Get the number of nonzero elements in x.

x = [[0,1,7,0,0],[3,0,0,2,19]]