Fundamentals 13 min read

Why Python Lists Aren’t Pre‑Allocated: Inside CPython’s Memory Management

Python’s list type appears flexible, but its underlying CPython implementation uses a dynamic resizing strategy without a pre‑allocated memory pool, allocating memory on demand via list_resize and PyMem_RESIZE, which this article explains through code analysis, memory size measurements, and practical recommendations.

Efficient Ops
Efficient Ops
Efficient Ops
Why Python Lists Aren’t Pre‑Allocated: Inside CPython’s Memory Management

Introduction

Python's list is a highly flexible array that can change length at will. Because of this convenience we often modify the list using

insert

,

pop

, or the more common

append

operations.

Example usage:

<code>>> test = []
>>> test.append(1)
>>> test.append({2})
>>> test.append([3])
>>> print test
# output
[1, set([2]), [3]]</code>

Another example:

<code>test = []
for i in range(4):
    test.append(i)
print test
# output
[0, 1, 2, 3]</code>

While this feels pleasant, any scenario that dynamically changes the length of a list should immediately raise concerns about memory management.

Stingy Initialization

Inspired by pre‑allocation theory, one might assume that a list reserves a certain amount of memory on creation. In reality, Python lists are "low" on pre‑allocation.

Memory size experiment:

<code>import sys
test = []
test_1 = [1]
print sys.getsizeof(test)
print sys.getsizeof(test_1) - sys.getsizeof(test)
# output
72   # size of empty list (total object size)
8    # size added by one element (pointer size)</code>

The CPython source shows that

PyList_New

allocates memory only for the requested number of elements, without a hidden pre‑allocation pool:

<code>PyObject *
PyList_New(Py_ssize_t size)
{
    PyListObject *op;
    size_t nbytes;
    if (size < 0) {
        PyErr_BadInternalCall();
        return NULL;
    }
    if ((size_t)size > PY_SIZE_MAX / sizeof(PyObject *))
        return PyErr_NoMemory();
    if (numfree) {
        numfree--;
        op = free_list[numfree];
        _Py_NewReference((PyObject *)op);
    } else {
        op = PyObject_GC_New(PyListObject, &PyList_Type);
        if (op == NULL)
            return NULL;
    }
    nbytes = size * sizeof(PyObject *);
    if (size <= 0)
        op->ob_item = NULL;
    else {
        op->ob_item = (PyObject **) PyMem_MALLOC(nbytes);
        if (op->ob_item == NULL) {
            Py_DECREF(op);
            return PyErr_NoMemory();
        }
        memset(op->ob_item, 0, nbytes);
    }
    Py_SIZE(op) = size;
    op->allocated = size;
    _PyObject_GC_TRACK(op);
    return (PyObject *) op;
}</code>

When executing

test = [1]

, two actions occur: (1) build an empty list of the appropriate length, and (2) fill it with the element using

PyList_SET_ITEM

:

<code>#define PyList_SET_ITEM(op, i, v) (((PyListObject *)(op))->ob_item[i] = (v))</code>

Thus, list initialization does not involve a pre‑allocated memory pool; memory is allocated on demand.

Variable‑Length Mechanics

Every mutation—

insert

,

pop

, or

append

—invokes

list_resize

, which adjusts the list's memory footprint.

<code>static int
list_resize(PyListObject *self, Py_ssize_t newsize)
{
    PyObject **items;
    size_t new_allocated;
    Py_ssize_t allocated = self->allocated;
    if (allocated >= newsize && newsize >= (allocated >> 1)) {
        assert(self->ob_item != NULL || newsize == 0);
        Py_SIZE(self) = newsize;
        return 0;
    }
    new_allocated = (newsize >> 3) + (newsize < 9 ? 3 : 6);
    if (new_allocated > PY_SIZE_MAX - newsize) {
        PyErr_NoMemory();
        return -1;
    } else {
        new_allocated += newsize;
    }
    if (newsize == 0)
        new_allocated = 0;
    items = self->ob_item;
    if (new_allocated <= (PY_SIZE_MAX / sizeof(PyObject *)))
        PyMem_RESIZE(items, PyObject *, new_allocated);
    else
        items = NULL;
    if (items == NULL) {
        PyErr_NoMemory();
        return -1;
    }
    self->ob_item = items;
    Py_SIZE(self) = newsize;
    self->allocated = new_allocated;
    return 0;
}</code>

In this function,

newsize

represents the total number of elements after the operation, while

new_allocated

is the new total capacity ("holes"). The list object stores both

ob_size

(actual element count) and

allocated

(capacity), with the invariant

0 <= ob_size <= allocated

.

When

list_resize

runs, the new capacity is calculated as:

Determine a base:

new_allocated = (newsize >> 3) + (newsize < 9 ? 3 : 6)

;

Check whether

new_allocated + newsize

exceeds

PY_SIZE_MAX

and raise an error if so;

Set the final capacity to

new_allocated + newsize

(or 0 if

newsize

is 0).

Empirical test of memory growth during successive appends:

<code>#coding: utf8
import sys

test = []
raw_size = sys.getsizeof(test)

for i in range(1, 7):
    test.append(1)
    print "%d 次 append 减去空列表的内存大小:%s " % (i, sys.getsizeof(test) - raw_size)

# output
1 次 append 减去空列表的内存大小:32
2 次 append 减去空列表的内存大小:32
3 次 append 减去空列表的内存大小:32
4 次 append 减去空列表的内存大小:32
5 次 append 减去空列表的内存大小:64
6 次 append 减去空列表的内存大小:64</code>

The table below shows when the internal

allocated

value changes during appends:

Append #

Original length

Added elements

Original allocated

newsize

new_allocated

1

0

1

0

0+1=1

3+1=4

2

1

1

4

1+1=2

No change

3

2

1

4

2+1=3

No change

4

3

1

4

3+1=4

No change

5

4

1

4

4+1=5

3+5=8

6

5

1

8

5+1=6

No change

Understanding when

allocated

grows explains why memory usage can jump after certain numbers of appends.

Best Practice: List Comprehensions

When the goal is simply to build a list and the loop body is straightforward, prefer a list comprehension over an explicit

for

loop with

append

:

<code>test = [i for i in xrange(4)]</code>

List comprehensions allocate the exact required size up front, avoiding repeated calls to

PyMem_RESIZE

and reducing bytecode overhead such as

SETUP_LOOP

and

CALL_FUNCTION

. Use them only when the logic is simple and the result is needed as a list.

Remember: choose the right tool for the job; indiscriminate use of list comprehensions can lead to less readable code.

PerformanceMemory ManagementPythonlistCPythonlist_resize
Efficient Ops
Written by

Efficient Ops

This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.