
Master Feature Selection with Recursive Elimination (RFE) in Python

Recursive Feature Elimination (RFE) is a powerful feature‑selection technique that iteratively trains a model, discards the weakest features, and repeats until the desired number of features remains. It helps prevent overfitting and improve model performance, and is illustrated below with a complete Python example using scikit‑learn.


Recursive Feature Elimination

Recursive Feature Elimination (RFE) is a feature‑selection algorithm that repeatedly trains a model and removes the weakest features until the desired number of features is reached.

The algorithm proceeds as follows:

Input all features into the model and obtain a performance metric (e.g., accuracy, F1 score).

Rank the features by importance (e.g., coefficient magnitude or feature importance) and remove the lowest‑ranked feature from the feature set.

Retrain the model and recompute the performance metric.

Repeat steps 2 and 3 until the feature count meets the preset value or no more features can be removed.
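The loop described above can be sketched directly, using a logistic regression's coefficient magnitudes as the ranking criterion (a minimal illustration of the idea, not scikit‑learn's internal implementation; names such as `n_keep` and `remaining` are ours):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Toy data: 10 features, of which we keep the best 5
X, y = make_classification(n_samples=500, n_features=10,
                           n_informative=5, random_state=0)
remaining = list(range(X.shape[1]))  # indices of features still in play
n_keep = 5

while len(remaining) > n_keep:
    # Step 1/3: retrain on the surviving features
    model = LogisticRegression(max_iter=1000).fit(X[:, remaining], y)
    # Step 2: drop the feature with the smallest absolute coefficient
    weakest = int(np.argmin(np.abs(model.coef_[0])))
    remaining.pop(weakest)

print(remaining)  # indices of the 5 surviving features
```

Each pass through the loop performs one train-rank-remove cycle, so the feature set shrinks by exactly one feature per iteration.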

RFE helps avoid overfitting and improves model generalization. By keeping only the most important features it also enhances efficiency and accuracy, but the repeated retraining makes it computationally expensive, especially when there are many features.

For example, in a binary‑classification problem with 100 features, RFE can be used to select the optimal 20 features.

Feed all 100 features into a model (e.g., logistic regression, SVM) and obtain a performance metric such as 0.85 accuracy.

Based on feature weights or importance, remove the lowest‑ranked feature, reducing the set to 99 features.

Retrain the model and compute the new metric, e.g., 0.86 accuracy.

Repeat steps 2 and 3, discarding one feature each time until only 20 features remain.

During this process, record the performance metric after each selection to identify the best feature subset, ultimately yielding a set of 20 optimal features and the corresponding best model performance.
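scikit‑learn automates this record-and-compare loop with `RFECV`, which cross-validates the model at every feature count and keeps the best-scoring subset (a sketch under illustrative settings; the `step`, `cv`, and data dimensions are our choices, not from the example above):

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression

# Smaller synthetic problem so the cross-validated search stays fast
X, y = make_classification(n_samples=500, n_features=30,
                           n_informative=10, random_state=1)

# Drop one feature per step; score each subset with 5-fold CV accuracy
selector = RFECV(LogisticRegression(max_iter=1000),
                 step=1, cv=5, scoring="accuracy")
selector.fit(X, y)

print(selector.n_features_)                # feature count with the best CV score
print(selector.get_support(indices=True))  # indices of the chosen features
```

Unlike plain `RFE`, `RFECV` decides the final feature count for you from the cross-validation scores rather than requiring it up front.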

Python Implementation

Assume we have a dataset with 100 features and a binary target variable, and we use a logistic regression model together with RFE to select the best 20 features.

<code>from sklearn.linear_model import LogisticRegression
from sklearn.feature_selection import RFE
from sklearn.datasets import make_classification

# Generate sample data with 100 features and a binary target
X, y = make_classification(n_samples=1000, n_features=100, n_informative=20, n_redundant=0, random_state=1)

# Create logistic regression model
model = LogisticRegression(max_iter=1000)  # raise max_iter so the solver converges on 100 features

# Create RFE object to select the best 20 features
rfe = RFE(model, n_features_to_select=20)

# Train model and select the best 20 features
X_selected = rfe.fit_transform(X, y)

# Print indices of the selected features
print(rfe.get_support(indices=True))

# Print the dataset after feature selection
print(X_selected)
</code>

In the example above, make_classification() generates a synthetic dataset, fit_transform() trains the model and selects the top 20 features, and get_support(indices=True) returns the indices of those features.

<code>[ 7  9 19 23 30 33 42 43 44 49 62 66 68 70 74 75 79 84 92 93]
[[ 2.10214605  0.95832137 -0.13046364 ... -4.84124111 -2.05522712
  -0.73465979]
 [-2.32648214 -0.53958974  1.85796597 ...  1.5400122   0.83695367
  -5.14693185]
 [ 1.02728537  0.23901911 -0.41383436 ... -0.28077503 -0.02212711
  -0.70009921]
 ...
 [ 3.37189209  0.52963901 -0.36913823 ... -4.05453548  2.5709366
   4.07060606]
 [-1.38319684  1.65007044  2.42354167 ... -0.25148219 -1.23954323
   2.37080765]
 [ 0.13845329 -0.28192572 -3.96853172 ... -4.67964015  2.46770024
   1.39891579]]
</code>
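Once fitted, the RFE object can also transform new data, and it slots into a `Pipeline` so that selection and classification are trained together on the training split only (a sketch building on the same synthetic data as above; the train/test split is our addition):

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

# Same synthetic setup as the example: 100 features, binary target
X, y = make_classification(n_samples=1000, n_features=100,
                           n_informative=20, n_redundant=0, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# Fitting selection inside the pipeline keeps the test set out of the ranking
pipe = Pipeline([
    ("rfe", RFE(LogisticRegression(max_iter=1000), n_features_to_select=20)),
    ("clf", LogisticRegression(max_iter=1000)),
])
pipe.fit(X_train, y_train)
print(pipe.score(X_test, y_test))  # accuracy on held-out data
```

Running the selection inside the pipeline avoids leaking information from the test set into the feature ranking.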

That concludes the introduction to Recursive Feature Elimination.

machine learning · Python · feature selection · scikit-learn · recursive elimination
Written by

Model Perspective

Insights, knowledge, and enjoyment from a mathematical modeling researcher and educator. Hosted by Haihua Wang, a modeling instructor and author of "Clever Use of Chat for Mathematical Modeling", "Modeling: The Mathematics of Thinking", "Mathematical Modeling Practice: A Hands‑On Guide to Competitions", and co‑author of "Mathematical Modeling: Teaching Design and Cases".
