Support Vector Machines (SVM) with scikit-learn
Support Vector Machines are powerful supervised learning algorithms that find the optimal boundary (hyperplane) separating different classes in feature space. What makes SVMs special is the maximum margin principle: among all possible boundaries, SVMs choose the one that maximizes the distance to the nearest data points from each class. This geometric intuition makes SVMs robust to small perturbations in the data and effective even in high-dimensional spaces where other classifiers struggle.
Credits: Forked from PyCon 2015 Scikit-learn Tutorial by Jake VanderPlas
Support Vector Machine Classifier
Support Vector Machine with Kernels Classifier
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import seaborn
seaborn.set()  # apply seaborn's default plot styling
Linear SVM Classifier
The core idea behind SVMs is that many different decision boundaries can separate two classes, but the optimal boundary is the one with the widest margin. The margin is defined as the perpendicular distance from the boundary to the nearest data points on either side. The visualization below shows three possible separating lines with their margins (gray bands); the SVM algorithm will select the one with the widest band.
Support Vector Machines (SVMs) are a powerful supervised learning algorithm used for classification or regression. SVMs draw a boundary between clusters of data, attempting to maximize the margin between the two sets of points. Many different lines can be drawn to separate the points generated below:
from sklearn.datasets import make_blobs
X, y = make_blobs(n_samples=50, centers=2,
                  random_state=0, cluster_std=0.60)
xfit = np.linspace(-1, 3.5)
plt.scatter(X[:, 0], X[:, 1], c=y, s=50, cmap='spring')
# Draw three lines that could separate the data, with their margins
for m, b, d in [(1, 0.65, 0.33), (0.5, 1.6, 0.55), (-0.2, 2.9, 0.2)]:
    yfit = m * xfit + b
    plt.plot(xfit, yfit, '-k')
    plt.fill_between(xfit, yfit - d, yfit + d, edgecolor='none',
                     color='#AAAAAA', alpha=0.4)
plt.xlim(-1, 3.5);
Using SVC(kernel='linear') creates a linear SVM. The fit() method finds the maximum-margin hyperplane by solving a constrained optimization problem. Under the hood, only the data points closest to the boundary (the support vectors) actually determine the position of the boundary: all other points could be removed without changing the result.
from sklearn.svm import SVC
clf = SVC(kernel='linear')
clf.fit(X, y)
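Since the kernel is linear, the fitted model exposes the hyperplane directly. As a quick illustration (not part of the original tutorial), the sketch below reads the weight vector w from coef_ and the offset b from intercept_, and computes the margin width 2/||w||:
# Inspect the fitted maximum-margin hyperplane w . x + b = 0
# (coef_ and intercept_ are only available when kernel='linear')
w = clf.coef_[0]
b = clf.intercept_[0]
print("w =", w, " b =", b)
print("margin width =", 2 / np.linalg.norm(w))
print("support vectors per class:", clf.n_support_)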
The helper function below plots the decision boundary (solid line) and the margins (dashed lines) by evaluating the decision_function over a 2-D grid. The decision function returns the signed distance from each point to the boundary β positive on one side, negative on the other, and zero on the boundary itself.
def plot_svc_decision_function(clf, ax=None):
    """Plot the decision function for a 2D SVC"""
    if ax is None:
        ax = plt.gca()
    x = np.linspace(plt.xlim()[0], plt.xlim()[1], 30)
    y = np.linspace(plt.ylim()[0], plt.ylim()[1], 30)
    Y, X = np.meshgrid(y, x)
    P = np.zeros_like(X)
    for i, xi in enumerate(x):
        for j, yj in enumerate(y):
            # decision_function expects a 2D array of samples
            P[i, j] = clf.decision_function([[xi, yj]])
    # plot the boundary (solid) and the margins (dashed)
    ax.contour(X, Y, P, colors='k',
               levels=[-1, 0, 1], alpha=0.5,
               linestyles=['--', '-', '--'])
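For a linear kernel, the decision function is just the affine map w . x + b, so we can sanity-check the helper against a manual computation. This is an added illustrative check, not part of the original tutorial:
# decision_function should match w . x + b for the linear kernel
manual = X @ clf.coef_[0] + clf.intercept_[0]
print(np.allclose(manual, clf.decision_function(X)))  # True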
In the following plot the dashed lines touch a couple of the points known as support vectors, which are stored in the support_vectors_ attribute of the classifier:
plt.scatter(X[:, 0], X[:, 1], c=y, s=50, cmap='spring')
plot_svc_decision_function(clf)
plt.scatter(clf.support_vectors_[:, 0], clf.support_vectors_[:, 1],
            s=200, facecolors='none', edgecolors='k');
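A consequence worth verifying: because only the support vectors pin down the boundary, refitting on just those points should recover the same hyperplane. The sketch below (an added check, assuming the separable toy data above) uses the support_ attribute, which holds the indices of the support vectors:
# Refit using only the support vectors; the hyperplane is unchanged
clf_sv = SVC(kernel='linear').fit(X[clf.support_], y[clf.support_])
print(np.allclose(clf.coef_, clf_sv.coef_),
      np.allclose(clf.intercept_, clf_sv.intercept_))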
Use IPython's interact functionality to explore how the distribution of points affects the support vectors and the discriminative fit:
from ipywidgets import interact

def plot_svm(N=100):
    X, y = make_blobs(n_samples=200, centers=2,
                      random_state=0, cluster_std=0.60)
    X = X[:N]
    y = y[:N]
    clf = SVC(kernel='linear')
    clf.fit(X, y)
    plt.scatter(X[:, 0], X[:, 1], c=y, s=50, cmap='spring')
    plt.xlim(-1, 4)
    plt.ylim(-1, 6)
    plot_svc_decision_function(clf, plt.gca())
    plt.scatter(clf.support_vectors_[:, 0], clf.support_vectors_[:, 1],
                s=200, facecolors='none', edgecolors='k')

interact(plot_svm, N=(10, 200));
Support Vector Machine with Kernels Classifier
Kernels are useful when the decision boundary is not linear. A kernel corresponds to a functional transformation of the input data, and the "kernel trick" lets SVMs work with inner products in the transformed space efficiently, without ever computing the transformation explicitly. In the example below, a linear boundary is not useful in separating the groups of points:
from sklearn.datasets import make_circles
X, y = make_circles(100, factor=.1, noise=.1)
clf = SVC(kernel='linear').fit(X, y)
plt.scatter(X[:, 0], X[:, 1], c=y, s=50, cmap='spring')
plot_svc_decision_function(clf);
A radial basis function (RBF) kernel computes a similarity measure based on the Euclidean distance between points. Applying it effectively adds a new dimension to the data. Below, we compute r = exp(-(x^2 + y^2)) to lift the 2-D circular data into 3-D, where a linear plane can separate the two classes. The interact widget lets you rotate the 3-D view to see how the classes become linearly separable in the higher-dimensional space.
r = np.exp(-(X[:, 0] ** 2 + X[:, 1] ** 2))

from mpl_toolkits import mplot3d

def plot_3D(elev=30, azim=30):
    ax = plt.subplot(projection='3d')
    ax.scatter3D(X[:, 0], X[:, 1], r, c=y, s=50, cmap='spring')
    ax.view_init(elev=elev, azim=azim)
    ax.set_xlabel('x')
    ax.set_ylabel('y')
    ax.set_zlabel('r')

interact(plot_3D, elev=(-90, 90), azim=(-180, 180));
Using SVC(kernel='rbf') lets the SVM automatically perform this kernel trick without you having to manually compute the transformation. The result is a non-linear decision boundary in the original feature space that correctly separates the concentric circles. The RBF kernel is the default in scikit-learn's SVC and works well for a wide range of problems.
clf = SVC(kernel='rbf')
clf.fit(X, y)
plt.scatter(X[:, 0], X[:, 1], c=y, s=50, cmap='spring')
plot_svc_decision_function(clf)
plt.scatter(clf.support_vectors_[:, 0], clf.support_vectors_[:, 1],
            s=200, facecolors='none', edgecolors='k');
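To quantify the improvement described above, a short comparison (added here for illustration) fits both kernels on the circles data and reports training accuracy; the linear kernel performs poorly while the RBF kernel separates the classes almost perfectly:
# Compare linear vs. RBF kernels on the concentric-circles data
from sklearn.metrics import accuracy_score
for kernel in ['linear', 'rbf']:
    model = SVC(kernel=kernel).fit(X, y)
    print(kernel, accuracy_score(y, model.predict(X)))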
Additional notes on SVMs:
When using an SVM you need to choose good values for the hyperparameters C and gamma. Model validation (for example, cross-validation over a grid of candidate values) can determine these; a minimal grid-search sketch follows these notes.
Training a kernel SVM scales poorly, roughly between O(n^2) and O(n^3) in the number of samples, so SVC becomes impractical on large datasets, while LinearSVC scales much better. For large datasets, a common workaround is to approximate the kernel with an explicit feature map and train LinearSVC on the transformed data; see the second sketch below.
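A minimal grid-search sketch using scikit-learn's GridSearchCV, applied here to the circles data; the candidate values for C and gamma are illustrative choices, not prescribed by this tutorial:
# Cross-validated grid search over C and gamma for an RBF SVM
from sklearn.model_selection import GridSearchCV
param_grid = {'C': [0.1, 1, 10, 100],
              'gamma': [0.01, 0.1, 1, 10]}
grid = GridSearchCV(SVC(kernel='rbf'), param_grid, cv=5)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)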
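And for the large-data strategy, one concrete option is scikit-learn's kernel_approximation module: RBFSampler builds an explicit randomized approximation of the RBF feature map, after which the scalable LinearSVC can be trained on the transformed features. A minimal sketch (the gamma and n_components values are illustrative):
# Approximate the RBF kernel explicitly, then train a linear SVM
from sklearn.kernel_approximation import RBFSampler
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC
approx_svm = make_pipeline(
    RBFSampler(gamma=1.0, n_components=100, random_state=0),
    LinearSVC())
approx_svm.fit(X, y)
print(approx_svm.score(X, y))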