
Top 7 Python Libraries and Packages of the Year for Data Science and AI

This article reviews the seven most notable Python libraries and packages of 2018 for data scientists and AI practitioners, including AdaNet, TPOT, SHAP, Optimus, spaCy, Jupytext, and Chartify, with descriptions, installation commands, and usage examples.

Python Programming Learning Circle

Mar 10, 2022


The author, Favio Vázquez, compiled a list of the seven best Python libraries for data science and artificial intelligence in 2018, summarizing their purpose, installation steps, and example usage.

AdaNet – Fast and Flexible AutoML Framework

AdaNet is a lightweight, scalable TensorFlow AutoML framework that trains and deploys adaptive neural networks by combining multiple sub‑networks to reduce the complexity of designing effective models. It requires TensorFlow ≥1.7:

$ pip install "tensorflow>=1.7.0"

To install from source, clone the repository:

$ git clone https://github.com/tensorflow/adanet && cd adanet

Run the tests with Bazel from the adanet subdirectory:

$ cd adanet
$ bazel test -c opt //...

Once AdaNet is installed as a pip package, it can be imported in Python:

import adanet
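To make the idea of combining sub‑networks more concrete, here is a minimal pure‑Python sketch of growing an ensemble greedily under a complexity penalty. It does not use the AdaNet API and is not AdaNet's actual algorithm; the candidate functions, the `grow_ensemble` helper, and the penalty value are hypothetical stand‑ins for trained sub‑networks.

```python
def mse(preds, targets):
    """Mean squared error between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets)


def ensemble_predict(members, xs):
    """Uniformly average the predictions of all member functions."""
    return [sum(m(x) for m in members) / len(members) for x in xs]


def grow_ensemble(candidates, xs, ys, complexity_penalty=0.01):
    """Greedily add the candidate that most lowers loss + complexity penalty.

    `candidates` maps a name to (predict_fn, complexity). This is an
    illustrative assumption, not the AdaNet objective or API.
    """
    ensemble = []  # list of (name, fn, complexity)
    best_objective = float("inf")
    remaining = dict(candidates)
    while remaining:
        scored = []
        for name, (fn, complexity) in remaining.items():
            trial = ensemble + [(name, fn, complexity)]
            loss = mse(ensemble_predict([f for _, f, _ in trial], xs), ys)
            penalty = complexity_penalty * sum(c for _, _, c in trial)
            scored.append((loss + penalty, name))
        objective, name = min(scored)
        if objective >= best_objective:
            break  # no candidate improves the penalized objective
        fn, complexity = remaining.pop(name)
        ensemble.append((name, fn, complexity))
        best_objective = objective
    return [name for name, _, _ in ensemble], best_objective


# Toy data from y = 2x + 1 and three hypothetical "sub-networks".
xs = [0, 1, 2, 3]
ys = [2 * x + 1 for x in xs]
candidates = {
    "low": (lambda x: 2 * x, 1),
    "high": (lambda x: 2 * x + 2, 1),
    "constant": (lambda x: 4, 1),
}

selected, objective = grow_ensemble(candidates, xs, ys)
print(selected, objective)
```

In this toy setup the two biased candidates cancel each other out when averaged, so the search keeps both and stops before adding the constant predictor, whose extra complexity is not paid for by lower error.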

TPOT – Automated Machine Learning Tool

TPOT (Tree‑based Pipeline Optimization Tool) uses genetic programming to automatically discover optimal ML pipelines. It builds on scikit‑learn and can be installed via pip:

pip install tpot

Example with the Iris dataset:

from tpot import TPOTClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, train_size=0.75, test_size=0.25)

tpot = TPOTClassifier(verbosity=2, max_time_mins=2)
tpot.fit(X_train, y_train)
tpot.export('tpot_iris_pipeline.py')
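The evolutionary search that TPOT automates can be illustrated with a much simpler stand‑in. The sketch below is a plain genetic algorithm over fixed‑length configurations, not TPOT's tree‑based genetic programming, and it does not call TPOT or scikit‑learn. The search space, the `STEP_SCORES` table (a made‑up substitute for cross‑validated accuracy), and all function names are assumptions for illustration.

```python
import random

# Hypothetical search space: one choice per pipeline step.
SEARCH_SPACE = {
    "scaler": ["none", "standard", "minmax"],
    "selector": ["all", "top5", "top10"],
    "model": ["tree", "forest", "logreg"],
}

# Made-up per-choice scores standing in for a real cross-validation score.
STEP_SCORES = {
    "none": 0.0, "standard": 0.2, "minmax": 0.1,
    "all": 0.1, "top5": 0.0, "top10": 0.3,
    "tree": 0.1, "forest": 0.4, "logreg": 0.2,
}


def fitness(pipeline):
    """Score a pipeline; a real tool would train and validate it instead."""
    return sum(STEP_SCORES[choice] for choice in pipeline.values())


def random_pipeline(rng):
    return {step: rng.choice(options) for step, options in SEARCH_SPACE.items()}


def mutate(pipeline, rng):
    """Re-draw the choice for one randomly picked step."""
    child = dict(pipeline)
    step = rng.choice(list(SEARCH_SPACE))
    child[step] = rng.choice(SEARCH_SPACE[step])
    return child


def crossover(a, b, rng):
    """Take each step's choice from one of the two parents."""
    return {step: rng.choice([a[step], b[step]]) for step in SEARCH_SPACE}


def evolve(generations=20, population_size=8, seed=0):
    rng = random.Random(seed)
    population = [random_pipeline(rng) for _ in range(population_size)]
    history = []  # best fitness at the start of each generation
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        parents = population[: population_size // 2]  # keep the best half
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(population_size - len(parents))
        ]
        population = parents + children
    return max(population, key=fitness), history


best_pipeline, best_per_generation = evolve()
print(best_pipeline, fitness(best_pipeline))
```

Because the best half of each generation is carried over unchanged, the best score can never get worse from one generation to the next, which is the property that makes this kind of search safe to run under a time budget such as `max_time_mins`.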

SHAP – Unified Explanation Method for Machine Learning Models

SHAP (SHapley Additive exPlanations) provides a unified framework to explain any model’s output using concepts from game theory.

Install SHAP from PyPI:

pip install shap

or from conda‑forge:

conda install -c conda-forge shap

The original example applies DeepExplainer to a Keras model trained on MNIST; only the opening lines of that example are reproduced here:

# code from https://github.com/keras-team/keras/blob/master/examples/mnist_cnn.py
import keras
from keras.datasets import mnist
# ... (model definition and training) ...
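The game‑theoretic idea behind SHAP can be shown without the library. The sketch below computes exact Shapley values for a tiny hand‑written model by enumerating every feature coalition. It is not the SHAP package's implementation, which relies on faster approximations; `toy_model`, the instance, and the all‑zero baseline are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, n_features):
    """Exact Shapley value per feature for a coalition value function."""
    players = range(n_features)
    phi = [0.0] * n_features
    for i in players:
        others = [p for p in players if p != i]
        for size in range(len(others) + 1):
            # Shapley weight for coalitions of this size.
            weight = (
                factorial(size) * factorial(n_features - size - 1)
                / factorial(n_features)
            )
            for coalition in combinations(others, size):
                members = frozenset(coalition)
                # Marginal contribution of feature i to this coalition.
                phi[i] += weight * (value_fn(members | {i}) - value_fn(members))
    return phi


# Hypothetical model: linear terms plus an interaction between features 0 and 1.
def toy_model(x):
    return 2.0 * x[0] + 1.0 * x[1] + 3.0 * x[0] * x[1] - 0.5 * x[2]


instance = [1.0, 2.0, 4.0]
baseline = [0.0, 0.0, 0.0]


def value_fn(coalition):
    # Features in the coalition take the instance's value; the rest stay at baseline.
    x = [instance[j] if j in coalition else baseline[j] for j in range(len(instance))]
    return toy_model(x)


phi = shapley_values(value_fn, len(instance))
print(phi)  # per-feature attributions
print(sum(phi), toy_model(instance) - toy_model(baseline))
```

The attributions sum to the difference between the model's output on the instance and on the baseline, and the interaction term between features 0 and 1 is split evenly between them. Exact enumeration grows exponentially with the number of features, so it is only practical for toy cases like this one.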

Optimus – Agile Data‑Science Workflow

Optimus extends Spark DataFrames with a pandas‑like API, enabling distributed data cleaning, preparation, analysis, and machine‑learning pipelines.

pip install optimuspyspark

Basic usage example:

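from optimus import Optimus

# Create an Optimus session and load a sample CSV into a Spark DataFrame
op = Optimus()
df = op.load.url("https://raw.githubusercontent.com/ironmussa/Optimus/master/examples/foo.csv")

# Transformations return new DataFrames, so chain them and keep the result
df = (
    df.rows.sort("product", "desc")
    .cols.lower(["firstName", "lastName"])
    .cols.date_transform("birth", "new_date", "yyyy/MM/dd", "dd-MM-YYYY")
)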

spaCy – Industrial‑Strength Natural Language Processing

spaCy offers a fast, production‑ready NLP library with tokenization, tagging, parsing, NER, and word vectors, integrating smoothly with TensorFlow, PyTorch, scikit‑learn, and Gensim.

pip3 install spacy
python3 -m spacy download en_core_web_sm

# Example usage
import spacy
nlp = spacy.load('en_core_web_sm')
text = "When Sebastian Thrun started working on self‑driving cars at Google in 2007..."
doc = nlp(text)
for ent in doc.ents:
    print(ent.text, ent.label_)

Jupytext – Sync Scripts and Notebooks

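Jupytext lets you edit notebooks as plain scripts in your favorite IDE and keeps them in sync with Jupyter notebooks.

pip install jupytext --upgrade

Configure Jupyter by adding this line to your Jupyter configuration file (typically ~/.jupyter/jupyter_notebook_config.py):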

c.NotebookApp.contents_manager_class = "jupytext.TextFileContentsManager"
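To show what pairing a script with a notebook involves, the sketch below converts a script that uses `# %%` cell markers into a minimal notebook‑style dictionary. It is a simplified illustration written with the standard library only, not Jupytext's own parser; the helper names and the sample script are hypothetical, and real notebooks carry more metadata than is produced here.

```python
import json


def _uncomment(line):
    """Drop the leading '# ' that marks Markdown text inside a script."""
    if line.startswith("# "):
        return line[2:]
    return "" if line == "#" else line


def percent_script_to_notebook(script):
    """Split a '# %%'-delimited script into a minimal nbformat-4-style dict."""
    raw_cells = []  # list of (cell_type, lines)
    for line in script.splitlines():
        if line.startswith("# %%"):
            kind = "markdown" if "[markdown]" in line else "code"
            raw_cells.append((kind, []))
        elif raw_cells:
            raw_cells[-1][1].append(line)
        elif line.strip():
            # Code before the first marker becomes its own code cell.
            raw_cells.append(("code", [line]))

    cells = []
    for kind, lines in raw_cells:
        if kind == "markdown":
            lines = [_uncomment(text) for text in lines]
        source = "\n".join(lines).strip("\n")
        cell = {"cell_type": kind, "metadata": {}, "source": source}
        if kind == "code":
            cell.update({"execution_count": None, "outputs": []})
        cells.append(cell)
    return {"nbformat": 4, "nbformat_minor": 4, "metadata": {}, "cells": cells}


script = """\
# %% [markdown]
# # Demo
# A paired notebook.

# %%
x = 1 + 1
print(x)
"""

notebook = percent_script_to_notebook(script)
notebook_json = json.dumps(notebook, indent=1)
print(notebook_json)
```

The script stays a valid Python file that can be diffed and reviewed like any other source, while the generated structure holds the same content split into one Markdown cell and one code cell.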

Chartify – Simple Plotting Library Built on Bokeh

Chartify simplifies the creation of interactive, aesthetically pleasing charts with a consistent API on top of Bokeh.

pip3 install chartify

Example to create a stacked area chart:

import pandas as pd
import chartify

data = chartify.examples.example_data()
# Transform data (group by month and fruit)
# ... (data manipulation code) ...
ch = chartify.Chart(blank_labels=True, x_axis_type='datetime')
ch.set_title("Stacked area")
ch.plot.area(data_frame=total_quantity_by_month_and_fruit,
             x_column='month', y_column='quantity',
             color_column='fruit', stacked=True)
ch.show('png')

These libraries collectively cover AutoML, model explanation, data cleaning, NLP, notebook synchronization, and visualization, providing a comprehensive toolbox for modern data‑science and AI workflows.
