Why Python Lists Aren’t Pre‑Allocated: Inside CPython’s Memory Management
Python's list type feels infinitely flexible, but CPython implements it with a dynamic resizing strategy rather than a pre-allocated memory pool: memory is allocated on demand through list_resize and PyMem_RESIZE. This article explains that strategy through source-code analysis, memory-size measurements, and practical recommendations. (The examples use Python 2 syntax, and the C source shown is from the CPython 2 codebase.)
Introduction
Python's list is a highly flexible array whose length can change at will. Thanks to this convenience, we routinely modify lists with insert, pop, or, most commonly, append.
Example usage:
<code>>>> test = []
>>> test.append(1)
>>> test.append({2})
>>> test.append([3])
>>> print test
# output
[1, set([2]), [3]]</code>Another example:
<code>test = []
for i in range(4):
    test.append(i)
print test
# output
[0, 1, 2, 3]</code>While this convenience is pleasant, any scenario that dynamically changes a list's length should immediately raise the question of how its memory is managed.
Stingy Initialization
Pre-allocation is a common optimization, so one might assume that a list reserves some extra memory when it is created. In reality, CPython lists are stingy: creation allocates no spare capacity at all.
Memory size experiment:
<code>import sys
test = []
test_1 = [1]
print sys.getsizeof(test)
print sys.getsizeof(test_1) - sys.getsizeof(test)
# output
72 # size of empty list (total object size)
8 # size added by one element (pointer size)</code>The CPython source shows that
PyList_New allocates memory only for the requested number of elements, with no hidden pre-allocation pool:
<code>PyObject *
PyList_New(Py_ssize_t size)
{
    PyListObject *op;
    size_t nbytes;

    if (size < 0) {
        PyErr_BadInternalCall();
        return NULL;
    }
    /* Guard against overflow in the nbytes computation below. */
    if ((size_t)size > PY_SIZE_MAX / sizeof(PyObject *))
        return PyErr_NoMemory();
    if (numfree) {
        /* Reuse a cached list header from the free list. */
        numfree--;
        op = free_list[numfree];
        _Py_NewReference((PyObject *)op);
    } else {
        op = PyObject_GC_New(PyListObject, &PyList_Type);
        if (op == NULL)
            return NULL;
    }
    /* Allocate exactly `size` pointer slots -- no spare capacity. */
    nbytes = size * sizeof(PyObject *);
    if (size <= 0)
        op->ob_item = NULL;
    else {
        op->ob_item = (PyObject **) PyMem_MALLOC(nbytes);
        if (op->ob_item == NULL) {
            Py_DECREF(op);
            return PyErr_NoMemory();
        }
        memset(op->ob_item, 0, nbytes);
    }
    Py_SIZE(op) = size;
    op->allocated = size;
    _PyObject_GC_TRACK(op);
    return (PyObject *) op;
}</code>When executing test = [1], two actions occur: (1) PyList_New builds an empty list of the appropriate length, and (2) PyList_SET_ITEM fills it with the element:
<code>/* Direct slot assignment: no bounds check, no resize. */
#define PyList_SET_ITEM(op, i, v) (((PyListObject *)(op))->ob_item[i] = (v))</code>Thus, list initialization does not involve a pre-allocated memory pool; memory is allocated on demand.
Variable‑Length Mechanics
Every mutation—
insert,
pop, or
append—invokes
list_resize, which adjusts the list's memory footprint.
<code>static int
list_resize(PyListObject *self, Py_ssize_t newsize)
{
    PyObject **items;
    size_t new_allocated;
    Py_ssize_t allocated = self->allocated;

    /* If there is enough room and we are not shrinking below half
       the current capacity, just update the size. */
    if (allocated >= newsize && newsize >= (allocated >> 1)) {
        assert(self->ob_item != NULL || newsize == 0);
        Py_SIZE(self) = newsize;
        return 0;
    }

    /* Over-allocate proportionally to the list size; the growth
       pattern is 0, 4, 8, 16, 25, 35, 46, 58, 72, 88, ... */
    new_allocated = (newsize >> 3) + (newsize < 9 ? 3 : 6);

    /* Check for integer overflow. */
    if (new_allocated > PY_SIZE_MAX - newsize) {
        PyErr_NoMemory();
        return -1;
    } else {
        new_allocated += newsize;
    }

    if (newsize == 0)
        new_allocated = 0;
    items = self->ob_item;
    if (new_allocated <= (PY_SIZE_MAX / sizeof(PyObject *)))
        PyMem_RESIZE(items, PyObject *, new_allocated);
    else
        items = NULL;
    if (items == NULL) {
        PyErr_NoMemory();
        return -1;
    }
    self->ob_item = items;
    Py_SIZE(self) = newsize;
    self->allocated = new_allocated;
    return 0;
}</code>In this function, newsize is the total number of elements after the operation, while new_allocated is the new total capacity, including spare "holes". The list object stores both ob_size (the actual element count) and allocated (the capacity), maintaining the invariant 0 <= ob_size <= allocated.
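Note that list_resize also shrinks the buffer: once the element count drops below half of allocated, the same formula computes a smaller capacity and PyMem_RESIZE trims the block. A quick check of this from Python (Python 3 syntax; exact byte counts vary by platform and version):

```python
import sys

big = list(range(100))      # buffer sized for 100 elements
before = sys.getsizeof(big)

for _ in range(90):         # shrink to 10 elements: 10 < 100 // 2,
    big.pop()               # so list_resize trims the allocation

after = sys.getsizeof(big)
print(before, after)        # the footprint drops once capacity is trimmed
```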
When list_resize runs, the new capacity is calculated as follows:

1. Determine a base: new_allocated = (newsize >> 3) + (newsize < 9 ? 3 : 6);
2. Check whether new_allocated + newsize exceeds PY_SIZE_MAX, and raise an error if so;
3. Set the final capacity to new_allocated + newsize (or 0 if newsize is 0).
Empirical test of memory growth during successive appends:
<code>import sys
test = []
raw_size = sys.getsizeof(test)
for i in range(1, 7):
    test.append(1)
    print "append #%d, size minus empty list: %s" % (i, sys.getsizeof(test) - raw_size)
# output
append #1, size minus empty list: 32
append #2, size minus empty list: 32
append #3, size minus empty list: 32
append #4, size minus empty list: 32
append #5, size minus empty list: 64
append #6, size minus empty list: 64</code>The table below shows when the internal allocated value changes during appends:
| Append # | Original length | Added elements | Original allocated | newsize | new_allocated |
|----------|-----------------|----------------|--------------------|---------|---------------|
| 1        | 0               | 1              | 0                  | 0+1=1   | 3+1=4         |
| 2        | 1               | 1              | 4                  | 1+1=2   | no change     |
| 3        | 2               | 1              | 4                  | 2+1=3   | no change     |
| 4        | 3               | 1              | 4                  | 3+1=4   | no change     |
| 5        | 4               | 1              | 4                  | 4+1=5   | 3+5=8         |
| 6        | 5               | 1              | 8                  | 5+1=6   | no change     |
Understanding when allocated grows explains why memory usage jumps only after certain numbers of appends.
Best Practice: List Comprehensions
When the goal is simply to build a list and the loop body is straightforward, prefer a list comprehension over an explicit for loop with append:
<code>test = [i for i in xrange(4)]</code>A list comprehension still grows the list element by element through list_resize, but it uses the dedicated LIST_APPEND bytecode, avoiding the per-iteration LOAD_ATTR and CALL_FUNCTION overhead of looking up and calling test.append. Use one only when the logic is simple and the result is actually needed as a list.
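To see the difference for yourself, here is a rough micro-benchmark (Python 3 syntax; the element count and repetition numbers are arbitrary choices for illustration):

```python
import timeit

# Build a 1000-element list two ways and compare wall time.
loop_stmt = """
result = []
for i in range(1000):
    result.append(i)
"""
comp_stmt = "result = [i for i in range(1000)]"

t_loop = timeit.timeit(loop_stmt, number=2000)
t_comp = timeit.timeit(comp_stmt, number=2000)
print("append loop:   %.4fs" % t_loop)
print("comprehension: %.4fs" % t_comp)
```

On CPython the comprehension typically wins, since it skips the attribute lookup and function call on every iteration; the exact ratio depends on the interpreter version and workload.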
Remember: choose the right tool for the job; indiscriminate use of list comprehensions can lead to less readable code.
Efficient Ops
This public account is maintained by Xiaotianguo and friends and regularly publishes original technical articles. We focus on operations transformation and hope to accompany you throughout your operations career.