Top 7 Python Libraries and Packages of the Year for Data Science and AI
This article reviews the seven most notable Python libraries and packages of 2018 for data scientists and AI practitioners, including AdaNet, TPOT, SHAP, Optimus, spaCy, Jupytext, and Chartify, with descriptions, installation commands, and usage examples.
The author, Favio Vázquez, compiled a list of the seven best Python libraries for data science and artificial intelligence in 2018, summarizing their purpose, installation steps, and example usage.
AdaNet – Fast and Flexible AutoML Framework
AdaNet is a lightweight, scalable TensorFlow AutoML framework that trains and deploys adaptive neural networks by combining multiple sub‑networks to reduce the complexity of designing effective models. It requires TensorFlow ≥1.7.
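The idea of building a model by combining sub-networks can be illustrated with a toy sketch in plain Python. This is not the AdaNet API and not its actual algorithm: it only shows a greedy loop that adds one candidate at a time and stops when the loss plus a complexity penalty no longer improves. The candidate names, the data, and the `PENALTY` value are all assumptions made for illustration.

```python
# Toy regression data generated from y = 2x + 1 (illustrative only).
XS = [0.0, 1.0, 2.0, 3.0]
YS = [1.0, 3.0, 5.0, 7.0]

# Hypothetical candidate sub-models: name -> (complexity, predict function).
CANDIDATES = {
    "constant": (1, lambda x: 4.0),
    "linear": (2, lambda x: 2.0 * x),
    "offset": (1, lambda x: 1.0),
}

PENALTY = 0.1  # assumed weight of the complexity term


def objective(members):
    """Mean squared error of the summed predictions plus a complexity penalty."""
    mse = sum(
        (sum(CANDIDATES[m][1](x) for m in members) - y) ** 2
        for x, y in zip(XS, YS)
    ) / len(XS)
    complexity = sum(CANDIDATES[m][0] for m in members)
    return mse + PENALTY * complexity


def grow_ensemble():
    """Greedily add the candidate that lowers the objective most; stop when none does."""
    ensemble = []
    best = objective(ensemble)
    while True:
        remaining = [name for name in CANDIDATES if name not in ensemble]
        if not remaining:
            return ensemble, best
        score, name = min((objective(ensemble + [n]), n) for n in remaining)
        if score >= best:
            return ensemble, best
        ensemble.append(name)
        best = score


ensemble, score = grow_ensemble()
print(ensemble, score)
```

In this toy setup the loop keeps the two candidates that together reproduce the data and rejects the third, because adding it raises both the error and the complexity term.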
<code>$ pip install "tensorflow>=1.7.0"</code>
To install from source:
<code>$ git clone https://github.com/tensorflow/adanet && cd adanet</code>
Run tests and install as a pip package:
<code>$ cd adanet
$ bazel test -c opt //...</code>
<code>import adanet</code>
TPOT – Automated Machine Learning Tool
TPOT (Tree‑based Pipeline Optimization Tool) uses genetic programming to automatically discover optimal ML pipelines. It builds on scikit‑learn and can be installed via pip.
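To give a feel for evolutionary pipeline search, here is a deliberately simplified sketch using only the standard library. It is a plain genetic algorithm over a fixed three-step pipeline, not the tree-based genetic programming that TPOT implements, and it does not call TPOT. The search space, the `TOY_SCORES` table (a stand-in for a cross-validation score), and all function names are hypothetical.

```python
import random

# Hypothetical search space: one choice per pipeline step.
SEARCH_SPACE = {
    "scaler": ["none", "standard", "minmax"],
    "selector": ["all", "top_k"],
    "model": ["tree", "forest", "logistic"],
}

# Made-up scores standing in for a real cross-validation result.
TOY_SCORES = {
    "none": 0.0, "standard": 0.2, "minmax": 0.1,
    "all": 0.1, "top_k": 0.2,
    "tree": 0.3, "forest": 0.5, "logistic": 0.4,
}


def fitness(pipeline):
    return sum(TOY_SCORES[choice] for choice in pipeline.values())


def random_pipeline(rng):
    return {step: rng.choice(options) for step, options in SEARCH_SPACE.items()}


def crossover(a, b, rng):
    # Take each step from one of the two parents at random.
    return {step: rng.choice([a[step], b[step]]) for step in SEARCH_SPACE}


def mutate(pipeline, rng):
    # Re-draw a single randomly chosen step.
    child = dict(pipeline)
    step = rng.choice(list(SEARCH_SPACE))
    child[step] = rng.choice(SEARCH_SPACE[step])
    return child


def evolve(generations=10, population_size=8, seed=0):
    rng = random.Random(seed)
    population = [random_pipeline(rng) for _ in range(population_size)]
    history = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        parents = population[: population_size // 2]  # fittest half survives
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(population_size - len(parents))
        ]
        population = parents + children
    return max(population, key=fitness), history


best_pipeline, best_per_generation = evolve()
print(best_pipeline, best_per_generation)
```

Because the fittest half of each generation is carried over unchanged, the best score recorded per generation can never decrease.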
<code>pip install tpot</code>
Example with the Iris dataset:
<code>from tpot import TPOTClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, train_size=0.75, test_size=0.25)
tpot = TPOTClassifier(verbosity=2, max_time_mins=2)
tpot.fit(X_train, y_train)
# Evaluate the best pipeline found on the held-out test split
print(tpot.score(X_test, y_test))
tpot.export('tpot_iris_pipeline.py')</code>
SHAP – Unified Explanation Method for Machine Learning Models
SHAP (SHapley Additive exPlanations) provides a unified framework to explain any model’s output using concepts from game theory.
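The game-theory concept behind SHAP is the Shapley value: a feature's contribution is its average marginal effect over every order in which features could be revealed. The sketch below computes exact Shapley values for a tiny made-up model using only the standard library. It does not use the `shap` package, which relies on far more efficient approximations; the `toy_model` function and its feature names are hypothetical.

```python
from itertools import permutations


def shapley_values(value_fn, features):
    """Exact Shapley values: average marginal contribution over all orderings."""
    totals = {f: 0.0 for f in features}
    orderings = list(permutations(features))
    for order in orderings:
        coalition = set()
        for f in order:
            before = value_fn(frozenset(coalition))
            coalition.add(f)
            after = value_fn(frozenset(coalition))
            totals[f] += after - before
    return {f: total / len(orderings) for f, total in totals.items()}


def toy_model(known):
    """Hypothetical model output when only the features in `known` are available."""
    score = 0.0
    if "age" in known:
        score += 2.0
    if "income" in known:
        score += 1.0
    if "age" in known and "income" in known:
        score += 1.0  # interaction effect shared between the two features
    return score


phi = shapley_values(toy_model, ["age", "income", "city"])
print(phi)
```

The attributions sum to the difference between the model output with all features and with none, which is the additivity property that gives SHAP its name. The unused `city` feature receives zero.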
<code>pip install shap</code>
<code>conda install -c conda-forge shap</code>
Example using DeepExplainer on a Keras model (MNIST):
<code># code from https://github.com/keras-team/keras/blob/master/examples/mnist_cnn.py
import keras
from keras.datasets import mnist
# ... (model definition and training) ...
</code>
Optimus – Agile Data‑Science Workflow
Optimus extends Spark DataFrames with a pandas‑like API, enabling distributed data cleaning, preparation, analysis, and machine‑learning pipelines.
<code>pip install optimuspyspark</code>
Basic usage example:
<code>from optimus import Optimus
op = Optimus()
df = op.load.url("https://raw.githubusercontent.com/ironmussa/Optimus/master/examples/foo.csv")
df.rows.sort("product", "desc")
df.cols.lower(["firstName", "lastName"]).date_transform("birth", "new_date", "yyyy/MM/dd", "dd-MM-YYYY")
</code>
spaCy – Industrial‑Strength Natural Language Processing
spaCy offers a fast, production‑ready NLP library with tokenization, tagging, parsing, NER, and word vectors, integrating smoothly with TensorFlow, PyTorch, scikit‑learn, and Gensim.
<code>pip3 install spacy
python3 -m spacy download en_core_web_sm</code>
<code># Example usage
import spacy
nlp = spacy.load('en_core_web_sm')
text = "When Sebastian Thrun started working on self‑driving cars at Google in 2007..."
doc = nlp(text)
for ent in doc.ents:
    print(ent.text, ent.label_)
</code>
Jupytext – Sync Scripts and Notebooks
Jupytext lets you edit notebooks as plain scripts in your favorite IDE and sync them with Jupyter notebooks.
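One of the script layouts Jupytext can pair with a notebook is the "percent" format, where a `# %%` comment starts a code cell and `# %% [markdown]` starts a markdown cell. The sketch below shows such a script and a naive standard-library splitter that recovers the cells. It is a simplified illustration, not Jupytext's own parser; the `split_cells` helper and the sample script are hypothetical.

```python
# A small script in the percent format (sample content is made up).
SCRIPT = '''# %% [markdown]
# # Demo notebook

# %%
x = 1 + 1
print(x)
'''


def split_cells(source):
    """Split a percent-format script into cells at each '# %%' marker."""
    cells = []
    for line in source.splitlines():
        if line.startswith("# %%"):
            kind = "markdown" if "[markdown]" in line else "code"
            cells.append({"type": kind, "lines": []})
        elif cells:
            cells[-1]["lines"].append(line)
    # Drop trailing blank lines inside each cell.
    for cell in cells:
        while cell["lines"] and not cell["lines"][-1].strip():
            cell["lines"].pop()
    return cells


cells = split_cells(SCRIPT)
for cell in cells:
    print(cell["type"], cell["lines"])
```

Because the script stays valid Python, it can be edited, diffed, and version-controlled like any other source file.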
<code>pip install jupytext --upgrade</code>
Then configure Jupyter by adding this line to your jupyter_notebook_config.py file:
<code>c.NotebookApp.contents_manager_class = "jupytext.TextFileContentsManager"
</code>
Chartify – Simple Plotting Library Built on Bokeh
Chartify simplifies the creation of interactive, aesthetically pleasing charts with a consistent API on top of Bokeh.
<code>pip3 install chartify</code>
Example to create a stacked area chart:
<code>import pandas as pd
import chartify
data = chartify.examples.example_data()
# Transform data (group by month and fruit)
# ... (data manipulation code) ...
ch = chartify.Chart(blank_labels=True, x_axis_type='datetime')
ch.set_title("Stacked area")
ch.plot.area(data_frame=total_quantity_by_month_and_fruit,
             x_column='month', y_column='quantity',
             color_column='fruit', stacked=True)
ch.show('png')
</code>
These libraries collectively cover AutoML, model explanation, data cleaning, NLP, notebook synchronization, and visualization, providing a comprehensive toolbox for modern data‑science and AI workflows.