
Discretizing Numerical Variables with Pandas: between, cut, qcut, and value_counts

This article demonstrates four Pandas techniques—between with loc, cut, qcut, and value_counts—to discretize numeric variables into bins, assigning grades A, B, C to exam scores, and shows how to generate synthetic data, define bin boundaries, and count records per bin.

Python Programming Learning Circle

Jul 4, 2022


Discretization, also known as binning, is a common data preprocessing technique that groups continuous values into intervals or "bins". This tutorial explains four methods using the Python Pandas library to bin numeric variables.

Creating Synthetic Data

import pandas as pd  # version 1.3.5
import numpy as np

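def create_df():
    # 1,000 random integer scores; randint excludes the upper bound, so 101 allows a score of 100
    df = pd.DataFrame({'score': np.random.randint(0, 101, 1000)})
    return df

df = create_df()  # keep the returned DataFrame; the later snippets use df
df.head()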

The dataset contains 1,000 students' exam scores ranging from 0 to 100. The goal is to categorize these scores into grades "A", "B", and "C", where "A" is the best and "C" the worst.
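Because create_df draws scores without a fixed seed, the counts printed in this article will differ from run to run. A minimal sketch of a reproducible variant follows. The names create_seeded_df, seed, and n are hypothetical and not part of the original, and the sketch uses NumPy's Generator API in place of np.random.randint.

```python
import numpy as np
import pandas as pd

def create_seeded_df(seed=0, n=1000):
    # seed and n are illustrative parameters, not part of the original create_df
    rng = np.random.default_rng(seed)
    # integers() excludes the upper bound, so 101 allows a score of 100
    return pd.DataFrame({'score': rng.integers(0, 101, size=n)})

df_a = create_seeded_df(seed=0)
df_b = create_seeded_df(seed=0)

# The same seed yields the same scores on every run
print(df_a['score'].equals(df_b['score']))
```

With a fixed seed, the grade counts from each method can be compared across sessions.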

1. between & loc

The between method returns a boolean Series indicating whether each element lies between the specified left and right boundaries. Combined with loc, it can assign grades based on custom intervals.

left: left boundary

right: right boundary

inclusive: which boundaries to include ("both", "neither", "left", "right")
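To see how the four inclusive options treat values that sit exactly on a boundary, a small illustrative Series (not from the original data) is enough:

```python
import pandas as pd

s = pd.Series([50, 65, 80])

# String values for `inclusive` require pandas 1.3 or newer
print(s.between(50, 80, inclusive='both').tolist())     # [True, True, True]
print(s.between(50, 80, inclusive='neither').tolist())  # [False, True, False]
print(s.between(50, 80, inclusive='left').tolist())     # [True, True, False]
print(s.between(50, 80, inclusive='right').tolist())    # [False, True, True]
```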

Grade intervals:

A: (80, 100]

B: (50, 80]

C: [0, 50]

# [0, 50] -> C, (50, 80] -> B, (80, 100] -> A
df.loc[df['score'].between(0, 50, inclusive='both'), 'grade'] = 'C'
df.loc[df['score'].between(50, 80, inclusive='right'), 'grade'] = 'B'
df.loc[df['score'].between(80, 100, inclusive='right'), 'grade'] = 'A'

Counting the number of records per grade:

df.grade.value_counts()

C    488
B    310
A    202
Name: grade, dtype: int64

This approach requires explicit handling for each bin, making it suitable only when the number of bins is small.
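One way to reduce the repetition is to keep the intervals in a list and loop over it. The sketch below is an illustration, not the article's method; grade_rules and df_demo are hypothetical names, and the six scores are chosen to sit on the interval boundaries.

```python
import pandas as pd

df_demo = pd.DataFrame({'score': [0, 50, 51, 80, 81, 100]})

# (left, right, inclusive, grade) for each interval; a hypothetical helper structure
grade_rules = [
    (0, 50, 'both', 'C'),
    (50, 80, 'right', 'B'),
    (80, 100, 'right', 'A'),
]

for left, right, inclusive, grade in grade_rules:
    mask = df_demo['score'].between(left, right, inclusive=inclusive)
    df_demo.loc[mask, 'grade'] = grade

print(df_demo['grade'].tolist())      # ['C', 'C', 'B', 'B', 'A', 'A']
# A row that matched no rule would still be missing, so this doubles as a coverage check
print(df_demo['grade'].isna().sum())  # 0
```

Even in this form every interval still has to be written out by hand, which is why cut is usually more convenient once there are more than a few bins.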

2. cut

The cut function bins values into discrete intervals, useful for converting continuous variables into categorical ones.

x: the array to bin (must be 1‑D)

bins: sequence defining bin edges (allows non‑uniform widths)

labels: labels for the resulting bins

include_lowest: whether the first interval should be left‑inclusive

bins = [0, 50, 80, 100]
labels = ['C', 'B', 'A']
df['grade'] = pd.cut(x=df['score'], bins=bins, labels=labels, include_lowest=True)

The resulting grade distribution matches the previous method:

df.grade.value_counts()

C    488
B    310
A    202
Name: grade, dtype: int64
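The include_lowest argument matters here because the scores can be exactly 0, which is also the first bin edge. A short sketch on four illustrative scores shows what happens with and without it:

```python
import pandas as pd

scores = pd.Series([0, 50, 51, 100])
bins = [0, 50, 80, 100]
labels = ['C', 'B', 'A']

# Default: intervals are (0, 50], (50, 80], (80, 100], so a score of 0 falls outside
without_lowest = pd.cut(scores, bins=bins, labels=labels)
# include_lowest=True closes the first interval on the left: [0, 50]
with_lowest = pd.cut(scores, bins=bins, labels=labels, include_lowest=True)

print(without_lowest.isna().tolist())  # [True, False, False, False]
print(with_lowest.tolist())            # ['C', 'C', 'B', 'A']
```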

3. qcut

The qcut function creates bins based on quantiles, ensuring (approximately) equal numbers of observations per bin.

x: input array (1‑D)

q: number of quantiles (e.g., 3 for terciles), or a list of quantile cut points such as [0, 0.25, 0.5, 1]

labels: labels for the bins

retbins: whether to return the bin edges

df['grade'], cut_bin = pd.qcut(df['score'], q=3, labels=['C','B','A'], retbins=True)

Resulting bin edges:

print(cut_bin)
>> [  0.   36.   68.  100.]

Grade distribution (≈333 records per grade; the counts are not exactly equal because tied integer scores cannot be split across bins):

df.grade.value_counts()

C    340
A    331
B    329
Name: grade, dtype: int64
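The edges returned by retbins=True can be reused, for example to grade scores that arrive later against the same boundaries. The sketch below assumes this is done with pd.cut. The names train, new_scores, and new_grade are hypothetical, and the data is seeded synthetic data rather than the article's df.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
train = pd.Series(rng.integers(0, 101, size=1000))

labels = ['C', 'B', 'A']
# Learn tercile edges from the first batch of scores
train_grade, edges = pd.qcut(train, q=3, labels=labels, retbins=True)

# Apply the same edges to new scores; here the lowest edge, the first cut point, and the highest edge
new_scores = pd.Series([edges[0], edges[1], edges[3]])
new_grade = pd.cut(new_scores, bins=edges, labels=labels, include_lowest=True)

print(new_grade.tolist())  # ['C', 'C', 'A']
```

A new score outside the learned range would come back as NaN, so the outer edges may need widening before reuse.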

4. value_counts with bins

The value_counts method can also perform binning when the bins argument is supplied. An integer value splits the range of the data into that many equal-width bins.

df['score'].value_counts(bins=3, sort=False)

By default, the result is sorted descending by count; setting sort=False preserves the original bin order.

(-0.101, 33.333]    310
(33.333, 66.667]    340
(66.667, 100.0]     350
Name: score, dtype: int64
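Note that these bins have equal width, not equal counts, which is the difference from qcut. On the roughly uniform scores above the counts come out similar, but on skewed data they can be very uneven. A small sketch with illustrative values (not from the original data):

```python
import pandas as pd

# Nine small values and one large outlier
skewed = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])

# Three equal-width bins over the range 1 to 100
width_counts = skewed.value_counts(bins=3, sort=False)

print(width_counts.tolist())  # [9, 0, 1]
```

Nearly all values land in the first bin, the middle bin is empty, and the outlier sits alone in the last one.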

Custom bin edges can be provided to match the earlier examples:

df['score'].value_counts(bins=[0,50,80,100], sort=False)

(-0.001, 50.0]    488
(50.0, 80.0]      310
(80.0, 100.0]     202
Name: score, dtype: int64

This yields the same grade counts as the between and cut methods.
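The agreement is expected, since passing bins to value_counts behaves like calling cut with include_lowest=True and then counting. The sketch below checks this on seeded synthetic data; the variable names are hypothetical.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
scores = pd.Series(rng.integers(0, 101, size=1000))
edges = [0, 50, 80, 100]

# Binning and counting in one call
via_value_counts = scores.value_counts(bins=edges, sort=False)

# The same result in two explicit steps: bin with cut, then count and order by bin
via_cut = pd.cut(scores, bins=edges, include_lowest=True).value_counts().sort_index()

print(via_value_counts.tolist() == via_cut.tolist())
```

The one-call form is handy for a quick look at a distribution, while cut is the better choice when the bin labels need to be stored as a column.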

Conclusion: Pandas provides several flexible ways to discretize numeric data: between with loc, cut, qcut, and value_counts. Each suits a different scenario: a small fixed number of bins, custom bin edges, equal‑frequency bins, or quick binning via value counts.
