Set Routines

NumPy’s set routines perform operations analogous to mathematical set theory: finding unique elements, testing membership, and computing intersections, differences, unions, and symmetric differences. These operations are useful for deduplicating data, comparing datasets, finding features shared between groups, and validating that categorical values belong to an expected set. Unlike Python’s built-in set type, which is unordered, NumPy’s set functions operate on arrays, and those that return elements (rather than a boolean mask) return them sorted and without duplicates.

import numpy as np
np.__version__
author = 'kyubyong. longinglove@nate.com'

Making Proper Sets

np.unique() returns sorted unique elements from an array. With return_index=True, it also returns the indices of the first occurrence of each unique value. With return_inverse=True, it provides indices to reconstruct the original array from the unique values – useful for encoding categorical variables as integer labels and then reversing the encoding.
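The encode-and-restore round trip described above can be sketched as follows. The `colors` array is hypothetical data chosen only for illustration; it is not part of the exercises.

```python
import numpy as np

# Hypothetical categorical data (an assumption for this sketch).
colors = np.array(["red", "blue", "red", "green", "blue"])

uniques, first_idx, inverse = np.unique(
    colors, return_index=True, return_inverse=True
)

# uniques   -> sorted unique values
# first_idx -> index in `colors` where each unique value first occurs
# inverse   -> integer label of every element, pointing into `uniques`
print(uniques)    # ['blue' 'green' 'red']
print(first_idx)  # [1 3 0]
print(inverse)    # [2 0 2 1 0]

# Indexing the unique values with the labels restores the original array.
restored = uniques[inverse]
print(np.array_equal(restored, colors))  # True
```

Here `inverse` plays the role of the integer encoding, and `uniques[inverse]` reverses it.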

Q1. Get the unique elements of x and the indices needed to reconstruct it, then reconstruct x.

x = np.array([1, 2, 6, 4, 2, 3, 2])

Boolean Operations

np.in1d() tests whether each element of one array is present in another, returning a 1-D boolean array (it is deprecated as of NumPy 2.0 in favor of np.isin(), which also preserves the shape of its input). np.intersect1d() finds the elements common to both arrays, np.setdiff1d() finds the elements of the first array that are not in the second, np.setxor1d() finds the elements that are in exactly one of the two arrays, and np.union1d() combines all unique elements of both. These mirror the mathematical operations of intersection, difference, symmetric difference, and union, all widely used in data analysis for comparing cohorts, finding overlapping features, and reconciling datasets.
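As a minimal sketch of these five operations side by side, the example below uses two hypothetical arrays, `a` and `b`, that are not taken from the exercises. It calls np.isin() for the membership test, which newer NumPy releases recommend over np.in1d(). For 1-D inputs the two give the same result.

```python
import numpy as np

# Hypothetical inputs (assumptions for this sketch); `a` contains a duplicate.
a = np.array([3, 1, 3, 7, 9])
b = np.array([9, 3, 4])

mask = np.isin(a, b)          # membership test, same shape as `a`
common = np.intersect1d(a, b) # elements present in both arrays
only_a = np.setdiff1d(a, b)   # elements of `a` that are absent from `b`
either = np.setxor1d(a, b)    # elements in exactly one of the two arrays
merged = np.union1d(a, b)     # all unique elements from both arrays

print(mask)    # [ True False  True False  True]
print(common)  # [3 9]
print(only_a)  # [1 7]
print(either)  # [1 4 7]
print(merged)  # [1 3 4 7 9]
```

Note that np.setdiff1d() is not symmetric: swapping the arguments here would give the elements of `b` that are absent from `a`.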

Q2. Create a boolean array of the same shape as x, where each entry is True if the corresponding element of x is present in y and False otherwise.

x = np.array([0, 1, 2, 5, 0])
y = np.array([0, 1])

Q3. Find the unique intersection of x and y.

x = np.array([0, 1, 2, 5, 0])
y = np.array([0, 1, 4])

Q4. Find the unique elements of x that are not present in y.

x = np.array([0, 1, 2, 5, 0])
y = np.array([0, 1, 4])

Q5. Find the symmetric difference (xor) of x and y, i.e. the elements that appear in exactly one of the two arrays.

x = np.array([0, 1, 2, 5, 0])
y = np.array([0, 1, 4])

Q6. Find the union of x and y, i.e. the sorted unique elements that appear in either array.

x = np.array([0, 1, 2, 5, 0])
y = np.array([0, 1, 4])