
Random Sampling

NumPy’s np.random module generates pseudo-random numbers from various probability distributions. Random sampling is fundamental to machine learning: initializing neural network weights, splitting data into train/test sets, Monte Carlo simulations, data augmentation, and stochastic gradient descent all depend on random number generation. Understanding how to draw from uniform, normal, and discrete distributions – and how to control reproducibility with seeds – is essential for any data scientist.

import numpy as np

np.__version__

__author__ = 'kyubyong. longinglove@nate.com'

Simple Random Data

np.random.rand() draws from a uniform distribution over [0, 1). np.random.randn() draws from a standard normal distribution (mean=0, std=1). np.random.randint() generates random integers. np.random.choice() samples from a given array with optional probability weights. These are the workhorses for generating synthetic data, initializing parameters, and implementing randomized algorithms.
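As a rough illustration of the four functions described above, the sketch below draws a small sample from each. The shapes, the seed value 0, and the `colors` list are arbitrary choices made for this example and are not part of the exercises.

```python
import numpy as np

np.random.seed(0)  # arbitrary seed, only so the sketch is repeatable

# Uniform samples over [0, 1); the arguments are the output dimensions.
u = np.random.rand(2, 3)

# Standard normal samples (mean 0, standard deviation 1).
z = np.random.randn(500, 500)

# Random integers in [0, 5): the upper bound is exclusive.
k = np.random.randint(0, 5, size=(2, 3))

# Weighted sampling from a hypothetical list of labels.
colors = ['red', 'green', 'blue']
picks = np.random.choice(colors, size=5, p=[0.2, 0.5, 0.3])

print(u.shape, k.min() >= 0, k.max() < 5)
print(round(z.mean(), 2), round(z.std(), 2))
print(picks)
```

Two details are easy to miss: `np.random.randint` excludes its upper bound, so an inclusive range needs the bound raised by one, and `np.random.choice` samples with replacement unless `replace=False` is passed.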

Q1. Create an array of shape (3, 2) and populate it with random samples from a uniform distribution over [0, 1).

Q2. Create an array of shape (1000, 1000) and populate it with random samples from a standard normal distribution. Then verify that the mean and standard deviation are close to 0 and 1, respectively.

Q3. Create an array of shape (3, 2) and populate it with random integers ranging from 0 to 3 (inclusive) from a discrete uniform distribution.

Q4. Randomly extract one element from x, where the three elements are chosen with probabilities .3, .5, and .2, respectively. Repeat the draw 10 times and print each result.

x = [b'3 out of 10', b'5 out of 10', b'2 out of 10']

Q5. Randomly extract 3 distinct integers from 0 to 9, each with equal probability.

Permutations

np.random.shuffle() randomly reorders an array in place, while np.random.permutation() returns a new shuffled array (or a permuted range of integers). Shuffling is critical for ML training – feeding data in random order prevents the model from learning spurious patterns related to data ordering, and is a prerequisite for creating unbiased mini-batches in stochastic gradient descent.
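The difference between the two functions can be sketched as follows. The arrays `data`, `X`, and `y` are hypothetical stand-ins invented for this example, and the seed is arbitrary.

```python
import numpy as np

np.random.seed(0)  # arbitrary seed for a repeatable sketch

data = np.arange(5)

# permutation() returns a shuffled copy and leaves its input untouched.
shuffled_copy = np.random.permutation(data)
print(data, shuffled_copy)

# shuffle() reorders the array in place and returns None.
result = np.random.shuffle(data)
print(result, data)

# Hypothetical features X and labels y: indexing both with the same
# permuted index keeps each row paired with its label.
X = np.arange(10).reshape(5, 2)
y = np.array([0, 1, 2, 3, 4])
idx = np.random.permutation(len(X))
X_shuffled, y_shuffled = X[idx], y[idx]
print(X_shuffled[:, 0] // 2, y_shuffled)
```

Permuting an index array, as in the last step, is one common way to shuffle several aligned arrays consistently; shuffling each array separately would break the row-to-label correspondence.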

Q6. Shuffle the integers from 0 to 9 (inclusive).


Random Generator

np.random.seed() sets the state of the pseudo-random number generator, ensuring that subsequent random calls produce the same sequence. Reproducibility is critical in scientific computing and ML research – setting a seed allows others to replicate your exact results. Without a fixed seed, random weight initialization, data splits, and augmentation will differ between runs, making it impossible to isolate the effect of code changes from random variation.
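A minimal sketch of the effect of seeding, assuming an arbitrary seed value of 42: resetting the seed restarts the sequence, while continuing to draw without resetting moves on to new values.

```python
import numpy as np

np.random.seed(42)  # 42 is an arbitrary seed value
first = np.random.rand(3)

np.random.seed(42)  # resetting the same seed restarts the sequence
second = np.random.rand(3)

third = np.random.rand(3)  # continues the sequence without a reset

print(np.array_equal(first, second))  # True: same seed, same draws
print(np.array_equal(first, third))
```

Because the seed fixes the whole sequence, the order and number of random calls still matter: inserting an extra draw earlier in a script changes every value that follows.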

Q7. Set the seed of the random generator to 10 so that you get the same values on every run.