A Survey of Python Libraries for Hyperparameter Optimization, Feature Selection, Model Explainability, and Rapid Machine Learning Development
This article introduces several Python libraries—including Optuna, ITMO_FS, shap‑hypertune, PyCaret, floWeaver, Gradio, Terality, and torch‑handle—that simplify hyperparameter tuning, feature selection, model explainability, visualization, and low‑code ML workflows, providing code examples and key advantages for each tool.
Optuna is an open-source hyperparameter optimization framework that automatically searches for the best hyperparameters of a machine-learning model using a Bayesian optimization algorithm called the Tree-structured Parzen Estimator (TPE). It offers a more efficient alternative to sklearn's GridSearchCV and works with any ML library, such as TensorFlow, Keras, or PyTorch.
ITMO_FS is a feature‑selection library for ML models that provides six categories of algorithms (supervised filters, unsupervised filters, wrappers, hybrid, embedded, and ensemble) and helps avoid over‑fitting by reducing the number of features; a typical usage example is shown below.
<code>>>> from sklearn.datasets import make_classification
>>> from sklearn.linear_model import SGDClassifier
>>> from ITMO_FS.embedded import MOS
>>> X, y = make_classification(n_samples=300, n_features=10, random_state=0, n_informative=2)
>>> sel = MOS()
>>> trX = sel.fit_transform(X, y, smote=False)
>>> cl1 = SGDClassifier()
>>> cl1.fit(X, y)
>>> cl1.score(X, y)
0.9033333333333333
>>> cl2 = SGDClassifier()
>>> cl2.fit(trX, y)
>>> cl2.score(trX, y)
0.9433333333333334</code>
shap-hypertune combines SHAP (SHapley Additive exPlanations) model explainability with hyperparameter tuning, allowing simultaneous selection of informative features and optimal hyperparameters via grid, random, or Bayesian search, though it currently supports only gradient-boosting models.
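To illustrate the idea of searching over features and hyperparameters at the same time, here is a conceptual sketch built with scikit-learn rather than shap-hypertune's own API (which uses SHAP importances and gradient-boosting models); the pipeline, parameter grid, and dataset are all illustrative.

```python
# Conceptual sketch: joint feature selection + hyperparameter search.
# This uses plain scikit-learn, NOT shap-hypertune's actual API.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

X, y = make_classification(n_samples=300, n_features=10,
                           n_informative=2, random_state=0)

# Step 1 drops weak features by importance; step 2 fits the final model.
pipe = Pipeline([
    ("select", SelectFromModel(GradientBoostingClassifier(random_state=0))),
    ("model", GradientBoostingClassifier(random_state=0)),
])

# The grid searches selection threshold and model size together.
grid = {"select__threshold": ["mean", "median"],
        "model__n_estimators": [50, 100]}
search = GridSearchCV(pipe, grid, cv=3).fit(X, y)
print(search.best_params_)
```

shap-hypertune packages this same pattern, but ranks features by SHAP values instead of impurity-based importances.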
PyCaret is a low‑code, open‑source Python library that automates the entire ML workflow—including data loading, preprocessing, model comparison, creation of interactive apps, API generation and Docker packaging—with just a few lines of code, as illustrated in the examples.
<code># load dataset
from pycaret.datasets import get_data
diabetes = get_data('diabetes')
# init setup
from pycaret.classification import *
clf1 = setup(data = diabetes, target = 'Class variable')
# compare models
best = compare_models()
</code>
floWeaver generates Sankey diagrams from flow-type datasets, useful for visualising conversion funnels, marketing journeys, or budget allocations; the input format is "source × target × value", and a single line of code creates the diagram.
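The "source × target × value" flow table can be sketched with pandas (the column names and funnel numbers below are illustrative); floweaver would then render this table as a Sankey diagram.

```python
# Sketch of the flow-format table floWeaver consumes: one row per flow.
import pandas as pd

flows = pd.DataFrame({
    "source": ["visit",  "visit",  "signup"],
    "target": ["signup", "bounce", "purchase"],
    "value":  [120,      380,      45],
})

# Each row is one flow: 120 visitors converted to signup, 380 bounced,
# and 45 of the signups went on to purchase.
total_out_of_visit = flows.loc[flows["source"] == "visit", "value"].sum()
print(total_out_of_visit)
```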
Gradio provides an easy way to build front‑end interfaces for ML models by specifying input types, functions and outputs, enabling rapid prototyping and free hosting on Hugging Face, making model interaction far simpler than building a Flask app.
Terality offers a Pandas‑compatible API that compiles operations to Spark, delivering 10‑100× speed‑ups, parallel execution, and off‑loading of computation to the cloud, though the free tier limits usage to 1 TB per month.
torch‑handle abstracts repetitive PyTorch training code, allowing users to define a model, dataset, optimizer and scheduler in a few lines and run training sessions automatically, with built‑in reporting and TensorBoard integration.
<code>from collections import OrderedDict

import torch
from torchhandle.workflow import BaseContext


class Net(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Sequential(OrderedDict([
            ('l1', torch.nn.Linear(10, 20)),
            ('a1', torch.nn.ReLU()),
            ('l2', torch.nn.Linear(20, 10)),
            ('a2', torch.nn.ReLU()),
            ('l3', torch.nn.Linear(10, 1))
        ]))

    def forward(self, x):
        return self.layer(x)


num_samples, num_features = int(1e4), int(1e1)
# Target shaped (N, 1) to match the model output and avoid MSELoss broadcasting.
X, Y = torch.rand(num_samples, num_features), torch.rand(num_samples, 1)
dataset = torch.utils.data.TensorDataset(X, Y)
trn_loader = torch.utils.data.DataLoader(dataset, batch_size=64, num_workers=0, shuffle=True)
loaders = {"train": trn_loader, "valid": trn_loader}
device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Model, loss, optimizer, and scheduler are declared as config dicts;
# per-parameter learning rates override the optimizer default.
model = {"fn": Net}
criterion = {"fn": torch.nn.MSELoss}
optimizer = {"fn": torch.optim.Adam,
             "args": {"lr": 0.1},
             "params": {"layer.l1.weight": {"lr": 0.01},
                        "layer.l1.bias": {"lr": 0.02}}}
scheduler = {"fn": torch.optim.lr_scheduler.StepLR,
             "args": {"step_size": 2, "gamma": 0.9}}

c = BaseContext(model=model,
                criterion=criterion,
                optimizer=optimizer,
                scheduler=scheduler,
                context_tag="ex01")
train = c.make_train_session(device, dataloader=loaders)
train.train(epochs=10)</code>
Python Programming Learning Circle
A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.