Set Routines: Solutions
Solutions demonstrating NumPy's set operations, which treat arrays as mathematical sets. The functions np.unique, np.in1d, np.intersect1d, np.setdiff1d, np.setxor1d, and np.union1d cover data deduplication, membership testing, and combining datasets in ML preprocessing pipelines. All of them except np.in1d return sorted arrays of unique values. np.in1d instead returns a boolean membership mask, and it is deprecated as of NumPy 2.0 in favor of np.isin.
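A minimal sketch of that shared behavior, assuming two small hypothetical arrays `a` and `b` that contain duplicates and are not sorted. The set functions come back sorted and deduplicated, while the membership test returns a mask aligned with its first argument:

```python
import numpy as np

# Small hypothetical inputs with duplicates and no particular order
a = np.array([3, 1, 3, 2])
b = np.array([2, 5, 2])

print(np.unique(a).tolist())          # [1, 2, 3]
print(np.intersect1d(a, b).tolist())  # [2]
print(np.union1d(a, b).tolist())      # [1, 2, 3, 5]
print(np.setdiff1d(a, b).tolist())    # [1, 3]
print(np.setxor1d(a, b).tolist())     # [1, 3, 5]

# Membership testing returns a mask aligned with a, not a set
print(np.isin(a, b).tolist())         # [False, False, False, True]
```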
import numpy as np
np.__version__
author = 'kyubyong. longinglove@nate.com'
Making Proper Sets
Solutions using np.unique() to extract the distinct elements of an array together with the indices needed to rebuild it. The return_inverse parameter is particularly useful for label encoding in machine learning: it maps each original value to its position in the sorted unique array, giving a compact integer encoding from which the original data can be reconstructed exactly by indexing the unique array with those positions.
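A sketch of the label-encoding use case, assuming a small hypothetical column of string categories (`colors` is illustrative, not part of the original exercises). The inverse indices act as integer codes into the sorted category list:

```python
import numpy as np

# Hypothetical categorical column, e.g. one feature of a dataset
colors = np.array(["red", "green", "blue", "green", "red", "red"])

# categories: sorted distinct labels; codes: position of each label in categories
categories, codes = np.unique(colors, return_inverse=True)

print(categories.tolist())  # ['blue', 'green', 'red']
print(codes.tolist())       # [2, 1, 0, 1, 2, 2]

# Indexing the categories with the codes restores the original column exactly
restored = categories[codes]
assert np.array_equal(restored, colors)
```

Because the categories are sorted, the code assigned to each label depends only on the set of labels present, not on the order in which they appear.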
Q1. Get the unique elements of x and the indices needed to reconstruct it, then reconstruct x.
x = np.array([1, 2, 6, 4, 2, 3, 2])
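# return_inverse=True also returns, for each element of x, its index in out
out, indices = np.unique(x, return_inverse=True)
print("unique elements =", out)
print("reconstruction indices =", indices)
# Indexing the unique values with the inverse indices rebuilds x
print("reconstructed =", out[indices])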
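np.unique can also report where each distinct value first appears (return_index) and how many times it occurs (return_counts). A minimal sketch on the same x as Q1:

```python
import numpy as np

x = np.array([1, 2, 6, 4, 2, 3, 2])

# With both flags set, np.unique returns (values, first indices, counts)
values, first_idx, counts = np.unique(x, return_index=True, return_counts=True)

print(values.tolist())     # [1, 2, 3, 4, 6]
print(first_idx.tolist())  # [0, 1, 5, 3, 2]
print(counts.tolist())     # [1, 3, 1, 1, 1]

# Each first index points at an occurrence of the matching unique value
assert np.array_equal(x[first_idx], values)
```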
Boolean Operations
Solutions using np.in1d(), np.intersect1d(), np.setdiff1d(), np.setxor1d(), and np.union1d() for set-theoretic operations on arrays. These mirror mathematical set operations (membership, intersection, difference, symmetric difference, union) and are commonly used to compare training and test splits, find overlapping features between datasets, or identify missing categories.
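A sketch of the train/test comparison mentioned above, assuming hypothetical label arrays (`train_labels` and `test_labels` are illustrative names, not from the original). Each set function answers a different question about the two splits:

```python
import numpy as np

# Hypothetical category labels seen in two data splits
train_labels = np.array(["cat", "dog", "dog", "bird"])
test_labels = np.array(["dog", "fish", "cat", "fish"])

shared = np.intersect1d(train_labels, test_labels)  # seen in both splits
unseen = np.setdiff1d(test_labels, train_labels)    # only in the test split
all_labels = np.union1d(train_labels, test_labels)  # full label vocabulary
mask = np.isin(test_labels, train_labels)           # per-row membership

print(shared.tolist())      # ['cat', 'dog']
print(unseen.tolist())      # ['fish']
print(all_labels.tolist())  # ['bird', 'cat', 'dog', 'fish']
print(mask.tolist())        # [True, False, True, False]
```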
Q2. Create a boolean array of the same shape as x whose elements are True where the corresponding element of x is present in y, and False otherwise.
x = np.array([0, 1, 2, 5, 0])
y = np.array([0, 1])
# np.in1d is deprecated as of NumPy 2.0; np.isin gives the same result for 1-D x
print(np.isin(x, y))
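np.isin, which NumPy recommends over np.in1d, keeps the shape of its first argument (np.in1d always returns a flattened 1-D result), and its invert=True flag returns the complement mask. A short sketch with a hypothetical 2-D array `x2d`:

```python
import numpy as np

# Hypothetical 2-D input; y is the same as in Q2
x2d = np.array([[0, 1, 2],
                [5, 0, 3]])
y = np.array([0, 1])

present = np.isin(x2d, y)               # same shape as x2d
absent = np.isin(x2d, y, invert=True)   # equivalent to ~present

print(present.tolist())  # [[True, True, False], [False, True, False]]
print(absent.tolist())   # [[False, False, True], [True, False, True]]
```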
Q3. Find the unique intersection of x and y.
x = np.array([0, 1, 2, 5, 0])
y = np.array([0, 1, 4])
print(np.intersect1d(x, y))
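np.intersect1d also accepts return_indices=True, which additionally returns, for each common value, the index of its first occurrence in each input. A sketch with hypothetical arrays chosen so the indices differ between the two inputs:

```python
import numpy as np

# Hypothetical inputs; 3 is duplicated in a, and the common values sit at different positions
a = np.array([7, 3, 9, 3])
b = np.array([9, 8, 7])

common, a_idx, b_idx = np.intersect1d(a, b, return_indices=True)

print(common.tolist())  # [7, 9]
print(a_idx.tolist())   # [0, 2]
print(b_idx.tolist())   # [2, 0]

# Both index arrays select the common values from their respective inputs
assert np.array_equal(a[a_idx], common)
assert np.array_equal(b[b_idx], common)
```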
Q4. Find the unique elements of x that are not present in y.
x = np.array([0, 1, 2, 5, 0])
y = np.array([0, 1, 4])
print(np.setdiff1d(x, y))
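np.setdiff1d is not symmetric, and its result is always sorted and deduplicated. When the original order or duplicates of the first array matter, a boolean mask built with np.isin is a common alternative. A sketch, where `z` is a hypothetical array:

```python
import numpy as np

x = np.array([0, 1, 2, 5, 0])
y = np.array([0, 1, 4])

print(np.setdiff1d(x, y).tolist())  # [2, 5]
print(np.setdiff1d(y, x).tolist())  # [4]  (argument order matters)

# Mask-based filtering keeps z's original order and its duplicates
z = np.array([5, 2, 0, 2])
kept = z[~np.isin(z, y)]
print(kept.tolist())                # [5, 2, 2]
print(np.setdiff1d(z, y).tolist())  # [2, 5]
```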
Q5. Find the symmetric difference (XOR) of x and y, i.e. the elements that are in exactly one of them.
x = np.array([0, 1, 2, 5, 0])
y = np.array([0, 1, 4])
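out1 = np.setxor1d(x, y)
# Cross-check: elements only in x plus elements only in y, sorted
out2 = np.sort(np.concatenate((np.setdiff1d(x, y), np.setdiff1d(y, x))))
# Exact comparison is appropriate for integer arrays
assert np.array_equal(out1, out2)
print(out1)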
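The symmetric difference can also be read as the union minus the intersection. A quick sketch checking that identity on the Q5 arrays:

```python
import numpy as np

x = np.array([0, 1, 2, 5, 0])
y = np.array([0, 1, 4])

# Everything in either array, minus what the two arrays share
via_union = np.setdiff1d(np.union1d(x, y), np.intersect1d(x, y))

print(via_union.tolist())  # [2, 4, 5]
assert np.array_equal(via_union, np.setxor1d(x, y))
```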
Q6. Find the union of x and y.
x = np.array([0, 1, 2, 5, 0])
y = np.array([0, 1, 4])
out1 = np.union1d(x, y)
# Cross-check: np.unique already returns sorted values, so no extra np.sort is needed
out2 = np.unique(np.concatenate((x, y)))
assert np.array_equal(out1, out2)
print(out1)
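To combine more than two arrays, np.union1d (or np.intersect1d) can be folded over a list with functools.reduce, a pattern the NumPy documentation also shows. A sketch with hypothetical batches of IDs:

```python
from functools import reduce

import numpy as np

# Hypothetical batches of integer IDs
batches = [np.array([0, 1, 2, 3]), np.array([2, 3, 4]), np.array([3, 2, 9])]

all_ids = reduce(np.union1d, batches)         # IDs seen in any batch
common_ids = reduce(np.intersect1d, batches)  # IDs seen in every batch

print(all_ids.tolist())     # [0, 1, 2, 3, 4, 9]
print(common_ids.tolist())  # [2, 3]

# For the union, concatenating once and calling np.unique gives the same result
assert np.array_equal(all_ids, np.unique(np.concatenate(batches)))
```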