Comparing randomized search and grid search for hyperparameter estimation
=========================================================================
Compare randomized search and grid search for optimizing hyperparameters of a linear SVM with SGD training. All parameters that influence the learning are searched simultaneously (except for the number of estimators, which poses a time / quality tradeoff).
The randomized search and the grid search explore exactly the same space of parameters. The result in parameter settings is quite similar, while the run time for randomized search is drastically lower.
The performance may be slightly worse for the randomized search; this is likely a noise effect and would not carry over to a held-out test set.
Note that in practice, one would not search over this many different parameters simultaneously using grid search, but pick only the ones deemed most important.
Imports for Randomized Search vs Grid Search Comparison
Grid search exhaustiveness vs randomized search efficiency: GridSearchCV evaluates every point on a predefined parameter grid, which grows combinatorially: with 10 values per parameter and 3 parameters, that is 1000 combinations. RandomizedSearchCV instead samples n_iter random configurations from parameter distributions, achieving comparable results in a fraction of the time. For continuous parameters like alpha and l1_ratio, randomized search can sample from continuous distributions (stats.loguniform, stats.uniform) rather than being restricted to a discrete grid, often finding better configurations in unexplored regions between grid points.
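The combinatorial growth described above can be made concrete with scikit-learn's ParameterGrid and ParameterSampler utilities. This is a minimal sketch (the parameter names `a`, `b`, `c` and the distributions are placeholders, not taken from the example below): a 10×10×10 grid yields 1000 candidates, while a sampler draws a fixed budget of candidates regardless of dimensionality.

```python
import numpy as np
import scipy.stats as stats
from sklearn.model_selection import ParameterGrid, ParameterSampler

# A discrete grid: 10 values per parameter, 3 parameters -> 10**3 candidates
grid = {
    "a": np.linspace(0, 1, 10),
    "b": np.linspace(0, 1, 10),
    "c": np.linspace(0, 1, 10),
}
print(len(ParameterGrid(grid)))  # 1000

# Randomized search draws a fixed number of candidates from distributions,
# including continuous ones that a discrete grid cannot cover
dist = {
    "alpha": stats.loguniform(1e-2, 1e0),
    "l1_ratio": stats.uniform(0, 1),
}
samples = list(ParameterSampler(dist, n_iter=15, random_state=0))
print(len(samples))  # 15
```

These are the same utilities that GridSearchCV and RandomizedSearchCV use internally to enumerate their candidates.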
SGD classifier with elastic net regularization: The SGDClassifier with loss="hinge" implements a linear SVM trained via stochastic gradient descent, scaling to datasets too large for the standard SVM solver. The penalty="elasticnet" combines L1 and L2 regularization controlled by l1_ratio (0 = pure L2, 1 = pure L1), and alpha controls the overall regularization strength. The report helper function extracts the top-ranked configurations from cv_results_, displaying mean validation scores with standard deviations. Both search strategies explore the same parameter space but randomized search with n_iter=15 evaluates far fewer candidates than the full grid of 60 combinations, demonstrating the practical speedup with minimal quality loss.
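The effect of l1_ratio described above (0 = pure L2, 1 = pure L1) can be seen directly in the learned weights: stronger L1 drives coefficients to exactly zero. A minimal sketch on the same digits subset, with alpha=0.1 chosen arbitrarily for illustration (it is not a value from the searches below):

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import SGDClassifier

X, y = load_digits(return_X_y=True, n_class=3)

# Higher l1_ratio (more L1) produces sparser coefficient vectors
sparsity = {}
for l1_ratio in (0.0, 1.0):
    clf = SGDClassifier(
        loss="hinge",
        penalty="elasticnet",
        alpha=0.1,  # arbitrary illustrative value, not tuned
        l1_ratio=l1_ratio,
        random_state=0,
    ).fit(X, y)
    # Fraction of weights that are exactly zero
    sparsity[l1_ratio] = np.mean(clf.coef_ == 0)
    print(f"l1_ratio={l1_ratio}: zero-weight fraction = {sparsity[l1_ratio]:.2f}")
```

Because alpha and l1_ratio interact in this way, searching them jointly, as both strategies below do, matters more than tuning each in isolation.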
# Authors: The scikit-learn developers
# SPDX-License-Identifier: BSD-3-Clause
from time import time
import numpy as np
import scipy.stats as stats
from sklearn.datasets import load_digits
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
# get some data
X, y = load_digits(return_X_y=True, n_class=3)
# build a classifier
clf = SGDClassifier(loss="hinge", penalty="elasticnet", fit_intercept=True)
# Utility function to report best scores
def report(results, n_top=3):
    for i in range(1, n_top + 1):
        candidates = np.flatnonzero(results["rank_test_score"] == i)
        for candidate in candidates:
            print("Model with rank: {0}".format(i))
            print(
                "Mean validation score: {0:.3f} (std: {1:.3f})".format(
                    results["mean_test_score"][candidate],
                    results["std_test_score"][candidate],
                )
            )
            print("Parameters: {0}".format(results["params"][candidate]))
            print("")
# specify parameters and distributions to sample from
param_dist = {
    "average": [True, False],
    "l1_ratio": stats.uniform(0, 1),
    "alpha": stats.loguniform(1e-2, 1e0),
}
# run randomized search
n_iter_search = 15
random_search = RandomizedSearchCV(
    clf, param_distributions=param_dist, n_iter=n_iter_search
)
start = time()
random_search.fit(X, y)
print(
    "RandomizedSearchCV took %.2f seconds for %d candidate parameter settings."
    % ((time() - start), n_iter_search)
)
report(random_search.cv_results_)
# use a full grid over all parameters
param_grid = {
    "average": [True, False],
    "l1_ratio": np.linspace(0, 1, num=10),
    "alpha": np.power(10, np.arange(-2, 1, dtype=float)),
}
# run grid search
grid_search = GridSearchCV(clf, param_grid=param_grid)
start = time()
grid_search.fit(X, y)
print(
    "GridSearchCV took %.2f seconds for %d candidate parameter settings."
    % (time() - start, len(grid_search.cv_results_["params"]))
)
report(grid_search.cv_results_)