
Getting Started with petl: Installation, Basic Operations, and Practical Examples

This article introduces petl, a Python library for lightweight ETL (extract, transform, load) tasks. It explains how to install petl with pip and walks through core operations: loading CSV data, viewing, filtering, sorting, converting, aggregating, joining, and deduplicating rows, plus basic statistical analysis, each with a short code example.

Test Development Learning Exchange

petl Overview

petl (Python ETL) is a library for extracting, transforming, and loading tabular data. It offers spreadsheet-like functions for working with CSV and Excel files, databases, and other sources.

Installation

Install petl using pip:

pip install petl

Basic Usage Examples

1. Load data from a CSV file:

from petl import fromcsv
table = fromcsv('example.csv')

2. View the first few rows:

from petl import head, look
first_rows = head(table, 5)  # lazy table with at most the first 5 data rows
print(look(first_rows))  # print them as a text grid

3. Filter rows where a column meets a condition:

from petl import select
# fromcsv returns strings, so convert age to int inside the predicate
filtered_table = select(table, 'age', lambda v: int(v) > 30)  # rows with age > 30

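4. Sort data:

from petl import convert, sort
numeric_table = convert(table, 'age', int)  # sort numerically; as strings, '10' sorts before '9'
sorted_table = sort(numeric_table, 'age')  # ascending
sorted_table_desc = sort(numeric_table, 'age', reverse=True)  # descending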

5. Convert or map column values:

from petl import convert
converted_table = convert(table, 'age', lambda v: int(v) + 1)  # parse the string, then increment age by 1

6. Output data to a new CSV file:

from petl import tocsv
tocsv(converted_table, 'output.csv')

Group Aggregation

Aggregate sales data by product and sum the amount column:

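from petl import aggregate
# sales_data needs a 'product' field and a numeric 'amount' field
# (convert 'amount' first if the table came from fromcsv)
grouped_data = aggregate(sales_data, 'product', sum, 'amount')  # output fields: product, value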

Joining Tables

Join two tables (e.g., orders and products) on a common column:

from petl import join
joined_table = join(orders, products, key='product_id')

Deduplication

Remove duplicate rows from a table:

from petl import distinct
unique_data = distinct(data_with_duplicates)

Statistical Analysis

Compute basic statistics such as mean, max, and min on a numeric column:

from petl import stats
summary = stats(numbers, 'value')  # named tuple: count, errors, sum, min, max, mean, pvariance, pstdev
mean_value = summary.mean
max_value = summary.max
min_value = summary.min