
Understanding Python Garbage Collection: Reference Counting, Mark‑Sweep, and Generational GC

This article explains how Python’s automatic garbage collection works, covering reference counting, the problems of cyclic references, the mark‑and‑sweep algorithm, generational collection, default thresholds, and when and how to manually control the collector with code examples.

Python Programming Learning Circle

In computer science, garbage collection (GC) is an automatic memory‑management mechanism that frees memory when it is no longer needed, reducing programmer burden and error risk.

Python programmers rarely need to worry about memory management because the CPython interpreter implements its own GC. In most cases, the interpreter's built-in reference counting, backed by a cycle detector and generational collection, handles memory automatically.

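Reference counting increments an object's count whenever a new reference to it is created and decrements it whenever a reference is destroyed. When the count reaches zero, the object's memory is released immediately, so reclamation is prompt and the cost is spread across the program's run rather than concentrated in long pauses. Its weakness is cyclic references: objects that refer to each other never reach a count of zero, so reference counting alone cannot reclaim them.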
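The counting behavior described above can be observed from Python itself. The following is a minimal sketch, assuming CPython; the `Tracked` class and its `destroyed` list are hypothetical names introduced only for this illustration. Note that `sys.getrefcount()` reports a value that includes the temporary reference created by the call, so only the difference between two readings is meaningful.

```python
import sys


class Tracked:
    """Hypothetical class whose instances record when they are finalized."""

    destroyed = []  # names of instances that have been finalized

    def __init__(self, name):
        self.name = name

    def __del__(self):
        Tracked.destroyed.append(self.name)


obj = Tracked("a")
before = sys.getrefcount(obj)   # includes the temporary reference held by the call
alias = obj                     # a second name bound to the same object
after = sys.getrefcount(obj)

del alias                       # count drops by one, the object stays alive
still_alive = "a" not in Tracked.destroyed
del obj                         # last reference gone: CPython finalizes the object right away
freed_immediately = "a" in Tracked.destroyed

print(after - before, still_alive, freed_immediately)
```

No call to the `gc` module is involved here: the object is finalized by reference counting alone, at the moment the last reference disappears.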

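To reclaim cycles, CPython adds a cycle detector, often described as mark-and-sweep, that examines only container objects (those able to hold references to other objects). It works on a temporary copy of each tracked object's reference count and subtracts the references that come from other tracked objects; anything still above zero must be referenced from outside and is marked reachable, together with everything it refers to. The remaining objects are unreachable and are swept away. Because the analysis runs on copies, the real reference counts are never modified.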
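A small experiment makes the difference between the two mechanisms visible. This is a sketch assuming CPython; the `Node` class is a hypothetical stand-in for any container object, and a weak reference is used only to observe whether the object is still alive.

```python
import gc
import weakref


class Node:
    """Hypothetical container object that can point at another Node."""

    def __init__(self):
        self.other = None


was_enabled = gc.isenabled()
gc.disable()                 # keep automatic collection from running mid-demo

a, b = Node(), Node()
a.other, b.other = b, a      # a -> b -> a forms a reference cycle
probe = weakref.ref(a)       # weak reference: observes the object without keeping it alive

del a, b                     # no names left, but the cycle keeps both counts above zero
alive_after_del = probe() is not None

found = gc.collect()         # run the cycle detector over all generations
alive_after_collect = probe() is not None

if was_enabled:
    gc.enable()              # restore the collector's previous state

print(alive_after_del, alive_after_collect, found)
```

`gc.collect()` returns the number of unreachable objects it found. The exact figure depends on the interpreter version, because instance dictionaries may or may not be counted separately.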

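CPython also employs generational collection, dividing tracked objects into three generations (0, 1, 2). New objects start in generation 0; objects that survive a collection are promoted to the next older generation, and older generations are collected less frequently. The thresholds are reported by gc.get_threshold(), which returns (700, 10, 10) by default, although the exact defaults can vary between CPython versions. The three values do not mean the same thing. A generation 0 collection starts when the number of allocations minus deallocations of tracked objects exceeds the first value. The second and third values count collections: generation 1 is examined once generation 0 has been collected more than 10 times since the last generation 1 pass, and generation 2 is treated the same way relative to generation 1.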
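The generation 0 counter can be watched with `gc.get_count()`. The sketch below assumes a standard CPython build; the `Item` class is hypothetical and exists only to create tracked objects. Automatic collection is paused during the measurement so that the counter is not reset halfway through.

```python
import gc


class Item:
    """Hypothetical empty class; its instances are tracked by the collector."""


thresholds = gc.get_threshold()
print("thresholds:", thresholds)

was_enabled = gc.isenabled()
gc.collect()                          # a full collection resets the counters
gc.disable()                          # pause automatic collection while measuring
start = gc.get_count()[0]             # generation 0 counter before allocating
keep = [Item() for _ in range(100)]   # 100 new tracked objects, all kept alive
grown = gc.get_count()[0] - start
if was_enabled:
    gc.enable()                       # restore the collector's previous state

print("generation 0 counter grew by", grown)
```

With the collector enabled, a generation 0 collection would start once this counter exceeds the first threshold value.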

While the default generational settings work for most programs, you can inspect them with:

<code>import gc</code>
<code>print(gc.get_threshold())  # a 3-tuple of thresholds, one per generation</code>

Python also allows explicit collection via gc.collect(), but it is rarely needed. In performance-critical sections that create many temporary objects, you may temporarily disable automatic collection so that a GC pause does not add latency:

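<code>import gc</code>
<code>gc.disable()  # turn off automatic cycle collection</code>
<code>try:</code>
<code>    pass  # performance-critical work goes here</code>
<code>finally:</code>
<code>    gc.enable()  # re-enable even if the work raises</code>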
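When this pattern appears in several places, it can be wrapped in a context manager. The following is a minimal sketch, not a standard-library facility: `gc_paused` and `build_rows` are hypothetical names chosen for this example. It restores the collector only if it was enabled beforehand, so nesting it inside code that has already disabled GC does not re-enable collection by accident.

```python
import gc
from contextlib import contextmanager


@contextmanager
def gc_paused():
    """Hypothetical helper: suspend automatic collection for one block."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        yield
    finally:
        if was_enabled:          # restore only what this helper changed
            gc.enable()


def build_rows(n):
    """Stand-in workload that allocates many small containers."""
    return [{"id": i, "tags": [i, i + 1]} for i in range(n)]


initial = gc.isenabled()
with gc_paused():
    inside = gc.isenabled()      # False while the block runs
    rows = build_rows(10_000)
restored = gc.isenabled()        # back to whatever it was before

print(inside, restored == initial, len(rows))
```

Reference counting keeps working while the collector is paused, so only objects caught in cycles accumulate until collection resumes.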

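Some third-party libraries may call gc.enable() themselves, which silently undoes gc.disable(). Setting the first threshold to zero with gc.set_threshold(0) also turns off automatic collection and is not affected by such calls, so it can be a more reliable way to keep GC off until you explicitly restore the thresholds. Explicit calls to gc.collect() still work while the threshold is zero.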
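A sketch of the threshold-based approach follows, assuming CPython. It saves whatever thresholds are currently in effect rather than hard-coding defaults, and restores them in a `finally` clause; the variable names are illustrative.

```python
import gc

saved = gc.get_threshold()            # remember the current values, whatever they are
gc.set_threshold(0)                   # a first threshold of 0 turns automatic collection off
try:
    during = gc.get_threshold()[0]    # reads back as 0 while collection is off
    junk = []
    junk.append(junk)                 # self-referencing list: a one-object cycle
    del junk                          # unreachable now, but not freed by reference counting
    reclaimed = gc.collect()          # a manual collection still works
finally:
    gc.set_threshold(*saved)          # put the original thresholds back

print(during, reclaimed >= 1, gc.get_threshold()[0] == saved[0])
```

Because the thresholds are independent of the enabled flag, a later `gc.enable()` call from other code does not restart automatic collection while the first threshold is zero.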

Written by

Python Programming Learning Circle

A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.
