String Operations

NumPy’s np.char module provides vectorized string operations that work element-wise on arrays of strings. Functions like np.char.add(), np.char.multiply(), np.char.capitalize(), np.char.lower(), np.char.strip(), and np.char.replace() mirror Python’s built-in string methods but operate on entire arrays at once. While Pandas is more commonly used for string manipulation in data science, these NumPy functions are useful for lightweight text processing without the overhead of a DataFrame – for example, cleaning label arrays or formatting output strings.
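As a rough illustration of the functions named above, the sketch below cleans and formats a small label array. The arrays `labels` and `suffixes` are hypothetical and used only for this example; they are not part of the exercises.

```python
import numpy as np

# Hypothetical label data, made up for illustration only.
labels = np.array(["  cat ", "DOG", " Bird"])
suffixes = np.array(["_01", "_02", "_03"])

# Strip surrounding whitespace, then lowercase, element-wise.
cleaned = np.char.lower(np.char.strip(labels))

# Element-wise concatenation of two string arrays of the same shape.
tagged = np.char.add(cleaned, suffixes)

# Repeat each element twice.
repeated = np.char.multiply(cleaned, 2)

# Uppercase the first character of each element.
capitalized = np.char.capitalize(cleaned)

# Replace a substring wherever it occurs.
replaced = np.char.replace(cleaned, "dog", "wolf")

print("cleaned =", cleaned)
print("tagged =", tagged)
print("repeated =", repeated)
print("capitalized =", capitalized)
print("replaced =", replaced)
```

Each call returns a new array; the input array is left unchanged.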

from __future__ import print_function
import numpy as np

author = "kyubyong. https://github.com/Kyubyong/numpy_exercises"

np.__version__

Q1. Concatenate x1 and x2.

# np.str was removed in NumPy 1.24; np.str_ works on old and new versions.
x1 = np.array(['Hello', 'Say'], dtype=np.str_)
x2 = np.array([' world', ' something'], dtype=np.str_)

Q2. Repeat x three times element-wise.

x = np.array(['Hello ', 'Say '], dtype=np.str_)

Q3-1. Capitalize the first letter of x element-wise.
Q3-2. Lowercase x element-wise.
Q3-3. Uppercase x element-wise.
Q3-4. Swapcase x element-wise.
Q3-5. Title-case x element-wise.

x = np.array(['heLLo woRLd', 'Say sOmething'], dtype=np.str_)
capitalized = ...
lowered = ...
uppered = ...
swapcased = ...
titlecased = ...
print("capitalized =", capitalized)
print("lowered =", lowered)
print("uppered =", uppered)
print("swapcased =", swapcased)
print("titlecased =", titlecased)

Q4. Pad each element to a length of 20 with _ so that the string is centered / left-justified / right-justified.

x = np.array(['hello world', 'say something'], dtype=np.str_)
centered = ...
left = ...
right = ...

print("centered =", centered)
print("left =", left)
print("right =", right)

Q5. Encode x in cp500 and decode it again.

x = np.array(['hello world', 'say something'], dtype=np.str_)
encoded = ...
decoded = ...
print("encoded =", encoded)
print("decoded =", decoded)

Q6. Insert a space between characters of x.

x = np.array(['hello world', 'say something'], dtype=np.str_)

Q7-1. Remove the leading and trailing whitespace of x element-wise.
Q7-2. Remove the leading whitespace of x element-wise.
Q7-3. Remove the trailing whitespace of x element-wise.

x = np.array(['   hello world   ', '\tsay something\n'], dtype=np.str_)
stripped = ...
lstripped = ...
rstripped = ...
print("stripped =", stripped)
print("lstripped =", lstripped)
print("rstripped =", rstripped)

Q8. Split the element of x on spaces.

x = np.array(['Hello my name is John'], dtype=np.str_)

Q9. Split the element of x into separate lines at the line break.

x = np.array(['Hello\nmy name is John'], dtype=np.str_)

Q10. Pad x with zeros on the left to make it a 4-digit numeric string.

x = np.array(['34'], dtype=np.str_)

Q11. Replace “John” with “Jim” in x.

x = np.array(['Hello my name is John'], dtype=np.str_)

Comparison

NumPy provides element-wise string comparison functions in np.char.equal() and np.char.not_equal(). These return boolean arrays indicating where strings match or differ, which is useful for validating data, finding mismatches between predicted and expected labels, or filtering arrays of categorical data.
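A minimal sketch of the label-checking use case mentioned above. The `expected` and `predicted` arrays are hypothetical data invented for this example.

```python
import numpy as np

# Hypothetical expected and predicted class labels.
expected = np.array(["cat", "dog", "bird", "dog"])
predicted = np.array(["cat", "dog", "dog", "dog"])

# Boolean array: True where the two strings are equal.
matches = np.char.equal(expected, predicted)

# Boolean array: True where the two strings differ.
mismatches = np.char.not_equal(expected, predicted)

# A boolean mask can index the original arrays to pull out the mismatched entries.
wrong_predictions = predicted[mismatches]

print("matches =", matches)
print("mismatches =", mismatches)
print("wrong predictions =", wrong_predictions)
```

Because the results are ordinary boolean arrays, they can be passed to `np.sum()` or `np.where()` to count or locate the mismatches.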

Q12. Return x1 == x2, element-wise.

x1 = np.array(['Hello', 'my', 'name', 'is', 'John'], dtype=np.str_)
x2 = np.array(['Hello', 'my', 'name', 'is', 'Jim'], dtype=np.str_)

Q13. Return x1 != x2, element-wise.

x1 = np.array(['Hello', 'my', 'name', 'is', 'John'], dtype=np.str_)
x2 = np.array(['Hello', 'my', 'name', 'is', 'Jim'], dtype=np.str_)

String Information

These functions inspect string content element-wise: np.char.count() counts substring occurrences, np.char.find() locates substrings, and functions like np.char.isdigit(), np.char.islower(), and np.char.isupper() test character properties. These are useful for data validation – checking whether a column of strings contains only numeric data, or verifying formatting consistency across a dataset.
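The inspection functions described above can be sketched on a small array. The `codes` array is hypothetical, chosen to show a few edge cases such as mixed letters and digits.

```python
import numpy as np

# Hypothetical values: a word, a number, upper case, lower case, mixed.
codes = np.array(["banana", "2024", "ABC", "abc", "a1"])

# Number of non-overlapping occurrences of "a" in each element (case-sensitive).
a_counts = np.char.count(codes, "a")

# Lowest index of "a" in each element, or -1 if it does not occur.
a_first = np.char.find(codes, "a")

# True only when every character is a digit.
digits_only = np.char.isdigit(codes)

# True when the element has at least one cased character and all cased
# characters are lower case, so "a1" counts as lower but "2024" does not.
lower_only = np.char.islower(codes)

# Same rule as islower, applied to upper case.
upper_only = np.char.isupper(codes)

print("a_counts =", a_counts)
print("a_first =", a_first)
print("digits_only =", digits_only)
print("lower_only =", lower_only)
print("upper_only =", upper_only)
```

Note that `np.char.find()` signals "not found" with -1 rather than raising an error, so the result should be checked before it is used as an index.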

Q14. Count the number of “l” in x, element-wise.

x = np.array(['Hello', 'my', 'name', 'is', 'Lily'], dtype=np.str_)

Q15. Find the lowest index of “l” in x, element-wise.

x = np.array(['Hello', 'my', 'name', 'is', 'Lily'], dtype=np.str_)

Q16-1. Check if each element of x is composed of digits only.
Q16-2. Check if each element of x is composed of lower case letters only.
Q16-3. Check if each element of x is composed of upper case letters only.

x = np.array(['Hello', 'I', 'am', '20', 'years', 'old'], dtype=np.str_)
out1 = ...
out2 = ...
out3 = ...
print("Digits only =", out1)
print("Lower cases only =", out2)
print("Upper cases only =", out3)

Q17. Check if each element of x starts with “hi”.

x = np.array(['he', 'his', 'him', 'his'], dtype=np.str_)