Sorting, Searching, and Counting – Solutions

Solutions demonstrating NumPy’s functions for reordering, locating, and tallying array elements. Sorting with np.sort(), np.argsort(), np.lexsort(), and np.partition() underpins everything from ranking predictions to implementing efficient k-nearest-neighbor lookups. Searching with np.argmax(), np.where(), np.nonzero(), and np.searchsorted() enables fast element location, while np.count_nonzero() provides efficient tallying of non-zero (or condition-matching) entries.

import numpy as np
print(np.__version__)
author = 'kyubyong. longinglove@nate.com'

Sorting

Solutions using np.sort(), np.lexsort(), np.argsort(), np.partition(), and np.argpartition() for ordering array elements. Note two distinctions. First, np.sort() and np.partition() return new arrays, while the ndarray methods x.sort() and x.partition() modify x in place. Second, a full sort costs O(n log n), whereas a partial sort with np.partition() or np.argpartition() runs in O(n). A partial sort is faster when you only need the k-th smallest element or the k smallest elements.
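Here is a minimal sketch of the copy-versus-in-place distinction, using a small made-up array. np.sort() returns a new array. The ndarray.sort() method rearranges x itself and returns None.

```python
import numpy as np

x = np.array([3, 1, 2])

# np.sort() returns a new sorted array and leaves x unchanged
y = np.sort(x)
print(x, y)  # [3 1 2] [1 2 3]

# ndarray.sort() sorts x in place and returns None
result = x.sort()
print(x, result)  # [1 2 3] None
```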

Q1. Sort x along the second axis.

x = np.array([[1,4],[3,1]])
out = np.sort(x, axis=1)  # returns a sorted copy
x.sort(axis=1)            # sorts x in place
assert np.array_equal(out, x)
print(out)
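The axis argument determines which slices are sorted independently. The sketch below reuses the Q1 array and contrasts three choices: columns (axis=0), rows (the default, axis=-1), and the flattened array (axis=None).

```python
import numpy as np

x = np.array([[1, 4], [3, 1]])

# axis=0 sorts each column independently
by_col = np.sort(x, axis=0)   # [[1 1] [3 4]]
# axis=-1 (the default) sorts each row; same as axis=1 for a 2-D array
by_row = np.sort(x)           # [[1 4] [1 3]]
# axis=None flattens first, then sorts
flat = np.sort(x, axis=None)  # [1 1 3 4]
print(by_col, by_row, flat, sep="\n")
```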

Q2. Sort pairs of surnames and first names and return their indices (first by surname, then by first name).

surnames =    ('Hertz',    'Galilei', 'Hertz')
first_names = ('Heinrich', 'Galileo', 'Gustav')
print(np.lexsort((first_names, surnames)))  # the last key (surnames) is the primary key
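np.lexsort treats the last key as the primary one, so the key tuple is written in reverse priority order. As a sanity check, this sketch compares the result with Python's built-in sorted() on a (surname, first name) key. The comparison is an illustration only and is not part of the original solution.

```python
import numpy as np

surnames = ('Hertz', 'Galilei', 'Hertz')
first_names = ('Heinrich', 'Galileo', 'Gustav')

# np.lexsort uses the LAST key as the primary sort key
idx = np.lexsort((first_names, surnames))
print([surnames[i] + ', ' + first_names[i] for i in idx])
# ['Galilei, Galileo', 'Hertz, Gustav', 'Hertz, Heinrich']

# The same order from sorted() with a (surname, first name) tuple key
expected = sorted(range(len(surnames)), key=lambda i: (surnames[i], first_names[i]))
assert list(idx) == expected
```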

Q3. Get the indices that would sort x along the second axis.

x = np.array([[1,4],[3,1]])
out = np.argsort(x, axis=1)
print(out)
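argsort is also the usual building block for ranking. The sketch below uses a hypothetical scores array. Reversing the ascending argsort lists indices from highest to lowest score, and applying argsort twice gives each element's rank. With tied scores, reversing also reverses the order of the ties. For numeric arrays, np.argsort(-scores) is an alternative.

```python
import numpy as np

scores = np.array([0.1, 0.7, 0.4, 0.9])  # hypothetical prediction scores

# Indices from highest to lowest score: reverse the ascending argsort
order = np.argsort(scores)[::-1]
print(order)  # [3 1 2 0]

# Rank of each element (0 = smallest): argsort of argsort
ranks = np.argsort(np.argsort(scores))
print(ranks)  # [0 2 1 3]
```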

Q4. Create a copy of x in which the element at index 5 has the value it would have in sorted x, all smaller elements come before it, and all equal or larger elements come after it.

x = np.random.permutation(10)
print("x =", x)
out = np.partition(x, 5)
x.partition(5)  # in-place equivalent
assert np.array_equal(x, out)
print(out)
# Check: the element at index 5 is 5, the five elements before it are smaller, and the rest are larger
assert out[5] == 5 and np.all(out[:5] < 5) and np.all(out[6:] > 5)
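A common use of np.partition() is extracting the k smallest values without a full sort. This sketch assumes k = 3 and uses a seeded generator (np.random.default_rng) so the run is reproducible. Only the k selected values are sorted afterward.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.permutation(10)
k = 3

# np.partition(x, k - 1) puts the k smallest values in the first k slots (in no particular order)
smallest = np.partition(x, k - 1)[:k]
# Sorting only those k values costs O(k log k) instead of O(n log n)
smallest_sorted = np.sort(smallest)
assert np.array_equal(smallest_sorted, np.sort(x)[:k])
print(smallest_sorted)  # [0 1 2]
```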

Q5. Get the indices that would partition x so that the element at index 3 has the value it would have in sorted x, with all smaller elements before it and all equal or larger elements after it.

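x = np.random.permutation(10)
print("x =", x)
partitioned = np.partition(x, 3)
indices = np.argpartition(x, 3)
print("partitioned =", partitioned)
print("indices =", indices)
# The order within each partition is undefined, so check only what partitioning guarantees
assert x[indices[3]] == partitioned[3] == 3
assert set(x[indices[:3]]) == set(partitioned[:3])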
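The introduction mentions k-nearest-neighbor lookups. Below is a minimal sketch of one, built on np.argpartition(). The function name k_nearest, the toy points, and the use of squared Euclidean distance are illustrative assumptions. The function requires 1 <= k <= len(points).

```python
import numpy as np

def k_nearest(points, query, k):
    """Hypothetical helper: indices of the k points closest to query, nearest first."""
    # Squared Euclidean distance from query to every point
    d2 = np.sum((points - query) ** 2, axis=1)
    # argpartition moves the indices of the k smallest distances to the front in O(n)
    candidates = np.argpartition(d2, k - 1)[:k]
    # Order just those k candidates by distance
    return candidates[np.argsort(d2[candidates])]

points = np.array([[0.0, 0.0], [5.0, 5.0], [1.0, 1.0], [2.0, 0.0], [9.0, 9.0]])
print(k_nearest(points, np.array([0.0, 0.0]), 3))  # [0 2 3]
```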

Searching

Solutions using np.argmax(), np.argmin(), np.nanargmax(), np.nanargmin(), np.nonzero(), np.flatnonzero(), np.where(), np.extract(), and np.searchsorted() for locating elements by value or condition. The nan-aware variants are critical for real-world data that contains missing values, while np.searchsorted() enables O(log n) insertion-point lookups in sorted arrays.

Q6. Get the maximum and minimum values and their indices of x along the second axis.

x = np.random.permutation(10).reshape(2, 5)
print("x =", x)
print("maximum values =", np.max(x, 1))
print("max indices =", np.argmax(x, 1))
print("minimum values =", np.min(x, 1))
print("min indices =", np.argmin(x, 1))
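When no axis is given, np.argmax() returns an index into the flattened array. This short sketch uses a fixed example array and shows how np.unravel_index() maps that flat index back to a (row, column) position.

```python
import numpy as np

x = np.array([[3, 7, 1], [9, 0, 4]])

# Without an axis, argmax indexes the flattened array
flat_idx = np.argmax(x)                          # 3
# np.unravel_index converts it back to (row, column)
row, col = np.unravel_index(flat_idx, x.shape)   # (1, 0)
assert x[row, col] == x.max() == 9
```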

Q7. Get the maximum and minimum values and their indices of x along the second axis, ignoring NaNs.

x = np.array([[np.nan, 4], [3, 2]])
print("maximum values ignoring NaNs =", np.nanmax(x, 1))
print("max indices =", np.nanargmax(x, 1))
print("minimum values ignoring NaNs =", np.nanmin(x, 1))
print("min indices =", np.nanargmin(x, 1))
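The nan-aware functions still need at least one real value per slice. For an all-NaN row, np.nanargmax() raises ValueError. The sketch below triggers that case and then shows one possible workaround. Using -1 as the index for all-NaN rows is a hypothetical convention, not NumPy behavior.

```python
import numpy as np

x = np.array([[np.nan, np.nan], [3.0, 2.0]])

# nanargmax has no valid index for an all-NaN row, so it raises ValueError
raised = False
try:
    np.nanargmax(x, axis=1)
except ValueError as err:
    raised = True
    print("ValueError:", err)

# Hypothetical workaround: give all-NaN rows a sentinel index of -1
all_nan = np.all(np.isnan(x), axis=1)          # [ True False]
safe = np.where(all_nan[:, None], -np.inf, x)  # replace all-NaN rows with -inf
idx = np.where(all_nan, -1, np.nanargmax(safe, axis=1))
print(idx)
```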

Q8. Get the values and indices of the elements that are bigger than 2 in x.

x = np.array([[1, 2, 3], [1, 3, 5]])
print("Values bigger than 2 =", x[x>2])
print("Their indices are", np.nonzero(x > 2))
assert np.array_equiv(x[x>2], x[np.nonzero(x > 2)])
assert np.array_equiv(x[x>2], np.extract(x > 2, x))
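np.nonzero() returns one index array per dimension. When (row, column) pairs are more convenient, np.argwhere() returns the same indices transposed, as this short sketch checks.

```python
import numpy as np

x = np.array([[1, 2, 3], [1, 3, 5]])

# np.nonzero returns one index array per dimension: (rows, cols)
rows, cols = np.nonzero(x > 2)   # rows=[0 1 1], cols=[2 1 2]
# np.argwhere returns the same indices as (row, col) pairs, one per match
pairs = np.argwhere(x > 2)       # [[0 2] [1 1] [1 2]]
assert np.array_equal(pairs, np.transpose(np.nonzero(x > 2)))
```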

Q9. Get the indices of the elements that are bigger than 2 in the flattened x.

x = np.array([[1, 2, 3], [1, 3, 5]])
print(np.flatnonzero(x > 2))
assert np.array_equal(np.flatnonzero(x > 2), (x > 2).ravel().nonzero()[0])
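Flat indices and per-axis indices are interchangeable. This sketch converts the output of np.nonzero() with np.ravel_multi_index() and confirms that it matches np.flatnonzero().

```python
import numpy as np

x = np.array([[1, 2, 3], [1, 3, 5]])
mask = x > 2

flat = np.flatnonzero(mask)  # [2 4 5]
# Converting the per-axis indices from np.nonzero to flat indices gives the same result
flat_again = np.ravel_multi_index(np.nonzero(mask), x.shape)
assert np.array_equal(flat, flat_again)
# The flat indices select the same values from the flattened array
assert np.array_equal(x.ravel()[flat], x[mask])
```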

Q10. For each element of x, return 0 if it is less than 0, and the element itself otherwise.

x = np.arange(-5, 4).reshape(3, 3)
print(np.where(x < 0, 0, x))
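For the specific rule in Q10 (negatives become 0), np.maximum() and np.clip() give the same result as np.where(). np.where() remains the general tool when the replacement depends on an arbitrary condition. A quick check:

```python
import numpy as np

x = np.arange(-5, 4).reshape(3, 3)

a = np.where(x < 0, 0, x)
# Two shortcuts for this particular rule
b = np.maximum(x, 0)     # elementwise maximum with 0
c = np.clip(x, 0, None)  # clip below at 0, no upper bound
assert np.array_equal(a, b) and np.array_equal(a, c)
```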

Q11. Get the indices where the elements of y should be inserted into x to maintain order.

x = [1, 3, 5, 7, 9]
y = [0, 4, 2, 6]
print(np.searchsorted(x, y))  # [0 2 1 3]
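When x contains duplicate values, the side parameter of np.searchsorted() decides whether the insertion point falls before ('left', the default) or after ('right') the existing equal elements. The sketch uses a small made-up x with a repeated 3. It then inserts y with np.insert() to confirm the result stays sorted.

```python
import numpy as np

x = np.array([1, 3, 3, 5])

# side='left' (default): the slot before the existing 3s
print(np.searchsorted(x, 3))                # 1
# side='right': the slot just after the existing 3s
print(np.searchsorted(x, 3, side='right'))  # 3

# Inserting at the returned positions keeps the array sorted
y = np.array([0, 4, 2, 6])
merged = np.insert(x, np.searchsorted(x, y), y)
assert np.all(np.diff(merged) >= 0)
```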

Counting

Solutions using np.count_nonzero() to tally elements efficiently. Because True counts as nonzero, passing a boolean expression to np.count_nonzero() counts how many elements satisfy an arbitrary condition. This is faster than len(x[condition]), which first builds a filtered copy of the matching elements. The boolean mask itself is created either way.
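This small sketch compares three counting idioms on the Q12 data, using the condition x > 1 (chosen for illustration). All three agree, but only len(x[cond]) builds the filtered copy.

```python
import numpy as np

x = np.array([[0, 1, 7, 0, 0], [3, 0, 0, 2, 19]])
cond = x > 1

n1 = np.count_nonzero(cond)  # counts True entries in the mask
n2 = len(x[cond])            # builds a filtered copy first, then measures it
n3 = cond.sum()              # True sums as 1
assert n1 == n2 == n3 == 4
```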

Q12. Get the number of nonzero elements in x.

x = np.array([[0,1,7,0,0],[3,0,0,2,19]])  # an ndarray, so that x != 0 is elementwise
print(np.count_nonzero(x))
assert np.count_nonzero(x) == len(x[x!=0])
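np.count_nonzero() also accepts an axis argument, which tallies per row or per column. Here is a sketch on the same data:

```python
import numpy as np

x = np.array([[0, 1, 7, 0, 0], [3, 0, 0, 2, 19]])

# Count nonzero entries per column (axis=0) and per row (axis=1)
per_col = np.count_nonzero(x, axis=0)  # [1 1 1 1 1]
per_row = np.count_nonzero(x, axis=1)  # [2 3]
assert per_row.sum() == np.count_nonzero(x)
```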