Getting Started with petl: Installation, Basic Operations, and Practical Examples
This article introduces the Python petl library for lightweight ETL tasks and explains how to install it with pip. It then walks through the core operations with code examples: loading CSV data, viewing, filtering, sorting, converting, aggregating, joining, deduplicating, and computing basic statistics.
petl Overview
petl (Python ETL) is a library designed to simplify data extraction, transformation, and loading. It offers spreadsheet-like functions for working with data from CSV files, Excel workbooks, databases, and other sources.
Installation
Install petl using pip:
pip install petl
Basic Usage Examples
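1. Load data from a CSV file:
from petl import convert, fromcsv
table = fromcsv('example.csv')  # every field is read as a string
table = convert(table, 'age', int)  # make age numeric for the filtering, sorting, and arithmetic below
2. View the first few rows: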
from petl import head, look
print(look(head(table, 5)))  # head() returns the first 5 rows as a new table; look() renders them as text
3. Filter rows where a column meets a condition:
from petl import select
filtered_table = select(table, 'age', lambda v: int(v) > 30)  # rows with age > 30
4. Sort data:
from petl import sort
sorted_table = sort(table, 'age') # ascending
sorted_table_desc = sort(table, 'age', reverse=True)  # descending
5. Convert or map column values:
from petl import convert
converted_table = convert(table, 'age', lambda v: int(v) + 1)  # increment age by 1 (int() also handles string values from CSV)
6. Output data to a new CSV file:
from petl import tocsv
tocsv(converted_table, 'output.csv')
Group Aggregation
Aggregate sales data by product and sum the amount column:
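from petl import aggregate, convert
sales = convert(sales_data, 'amount', float)  # CSV amounts are strings; sum() needs numbers
grouped_data = aggregate(sales, key='product', aggregation=sum, value='amount')  # output fields: product, value
Joining Tables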
Join two tables (e.g., orders and products) on a common column:
from petl import join
joined_table = join(orders, products, key='product_id')  # inner join: keeps only product_id values present in both tables
Deduplication
Remove duplicate rows from a table:
from petl import distinct
unique_data = distinct(data_with_duplicates)  # rows count as duplicates only when every field matches; output is sorted
Statistical Analysis
Compute basic statistics such as mean, max, and min on a numeric column:
from petl import stats
summary = stats(numbers, 'value')  # named tuple with count, errors, sum, min, max, mean, pvariance, pstdev
mean_value = summary.mean
max_value = summary.max
min_value = summary.min
Test Development Learning Exchange