Run this notebook: Open in Colab Open in Kaggle

Plot Feature Selection¶

============================ Univariate Feature Selection¶

This notebook is an example of using univariate feature selection to improve classification accuracy on a noisy dataset.

In this example, some noisy (non informative) features are added to the iris dataset. Support vector machine (SVM) is used to classify the dataset both before and after applying univariate feature selection. For each feature, we plot the p-values for the univariate feature selection and the corresponding weights of SVMs. With this, we will compare model accuracy and examine the impact of univariate feature selection on model weights.

Imports for Univariate Feature Selection with SVM Classification¶

SelectKBest with f_classif uses ANOVA F-statistics to filter features before model training: For classification tasks, f_classif computes a one-way ANOVA F-value for each feature, testing whether the feature’s mean differs significantly across target classes. Features with high F-values (low p-values) have strong linear discriminative power. By selecting the top k=4 features from the augmented Iris dataset (4 original + 20 random noise features), SelectKBest eliminates the uninformative dimensions before the SVM sees the data. The negative log10 of p-values provides a visually intuitive scoring where higher bars indicate more statistically significant features.

Comparing SVM weights with and without feature selection reveals how noise features degrade model performance: Without selection, LinearSVC must learn weights for all 24 features, and even with MinMaxScaler normalizing the input range, the 20 noise features dilute the classifier’s capacity and may cause overfitting. After selection, the SVM only receives the 4 most discriminative features, concentrating its weight budget on truly informative dimensions. The inverse_transform capability of SelectKBest allows mapping the reduced-space SVM weights back to the original feature space, showing exactly which features received non-zero importance. This pipeline approach – feature selection inside make_pipeline – is critical because it ensures that feature selection statistics are computed only on training data, preventing information leakage from the test set.

# Authors: The scikit-learn developers
# SPDX-License-Identifier: BSD-3-Clause

# %%
# Generate sample data
# --------------------
#
import numpy as np

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# The iris dataset
X, y = load_iris(return_X_y=True)

# Some noisy data not correlated
E = np.random.RandomState(42).uniform(0, 0.1, size=(X.shape[0], 20))

# Add the noisy data to the informative features
X = np.hstack((X, E))

# Split dataset to select feature and evaluate the classifier
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

# %%
# Univariate feature selection
# ----------------------------
#
# Univariate feature selection with F-test for feature scoring.
# We use the default selection function to select
# the four most significant features.
from sklearn.feature_selection import SelectKBest, f_classif

selector = SelectKBest(f_classif, k=4)
selector.fit(X_train, y_train)
scores = -np.log10(selector.pvalues_)
scores /= scores.max()

# %%
import matplotlib.pyplot as plt

X_indices = np.arange(X.shape[-1])
plt.figure(1)
plt.clf()
plt.bar(X_indices - 0.05, scores, width=0.2)
plt.title("Feature univariate score")
plt.xlabel("Feature number")
plt.ylabel(r"Univariate score ($-Log(p_{value})$)")
plt.show()

# %%
# In the total set of features, only the 4 of the original features are significant.
# We can see that they have the highest score with univariate feature
# selection.

# %%
# Compare with SVMs
# -----------------
#
# Without univariate feature selection
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import LinearSVC

clf = make_pipeline(MinMaxScaler(), LinearSVC())
clf.fit(X_train, y_train)
print(
    "Classification accuracy without selecting features: {:.3f}".format(
        clf.score(X_test, y_test)
    )
)

svm_weights = np.abs(clf[-1].coef_).sum(axis=0)
svm_weights /= svm_weights.sum()

# %%
# After univariate feature selection
clf_selected = make_pipeline(SelectKBest(f_classif, k=4), MinMaxScaler(), LinearSVC())
clf_selected.fit(X_train, y_train)
print(
    "Classification accuracy after univariate feature selection: {:.3f}".format(
        clf_selected.score(X_test, y_test)
    )
)

svm_weights_selected = np.abs(clf_selected[-1].coef_).sum(axis=0)
svm_weights_selected /= svm_weights_selected.sum()

# %%
plt.bar(
    X_indices - 0.45, scores, width=0.2, label=r"Univariate score ($-Log(p_{value})$)"
)

plt.bar(X_indices - 0.25, svm_weights, width=0.2, label="SVM weight")

plt.bar(
    X_indices[selector.get_support()] - 0.05,
    svm_weights_selected,
    width=0.2,
    label="SVM weights after selection",
)

plt.title("Comparing feature selection")
plt.xlabel("Feature number")
plt.yticks(())
plt.axis("tight")
plt.legend(loc="upper right")
plt.show()

# %%
# Without univariate feature selection, the SVM assigns a large weight
# to the first 4 original significant features, but also selects many of the
# non-informative features. Applying univariate feature selection before
# the SVM increases the SVM weight attributed to the significant features,
# and will thus improve classification.