Why NumPy Arrays Outperform Python Lists: Memory Model, Strides, and Views Explained
This article explores NumPy arrays' internal memory layout, data structures, and design choices—covering contiguous storage, strides, C/F contiguous layouts, views versus copies, and powerful indexing and slicing techniques—to reveal why they are dramatically faster than Python lists.
2. NumPy Array Basics
If you ask a Python developer why NumPy arrays are faster than Python lists, they often say it’s because NumPy is implemented in C. While true, the deeper reason is that NumPy stores homogeneous data in a contiguous memory block, which lets element addresses be computed directly and keeps the data cache-friendly.
2.1 Array Data Structure
Assume an array of eight elements, each 8 bytes (e.g., int64). The array occupies a single continuous block of memory, so the address of element i is calculated as
base_address + i * element_size, giving constant‑time access.
This operation has O(1) time complexity, independent of array size.
If you’re preparing for an interview, this is a common question. 😅
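The address arithmetic above can be observed directly from Python. The following sketch (for illustration only) computes the address of element `i` by hand and reads it back out of the raw buffer via `ctypes`:

```python
import ctypes
import numpy as np

x = np.arange(8, dtype=np.int64)  # 8 elements, 8 bytes each

base = x.__array_interface__['data'][0]  # address of the buffer start
i = 3
addr = base + i * x.itemsize             # base_address + i * element_size

# read the element straight out of memory to confirm the formula
value = ctypes.cast(ctypes.c_void_p(addr),
                    ctypes.POINTER(ctypes.c_int64)).contents.value
print(value)  # 3
```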
2.2 What About Python Lists?
Python lists are also arrays, but they store pointers to objects rather than the raw data. This allows elements of different types, but each pointer has the same size, so the list is essentially an array of object references.
```python
# homogeneous list of integers
my_list = [0, 1, 2, 3, 4, 5, 6]

# heterogeneous list containing various types
my_fancy_list = ["Sascha", True, 42, 3.1415, None, False, lambda x: x**2]
```

Because the list stores pointers, accessing an element still costs O(1), but a second indirection is required to retrieve the actual object.
All pointers have the same size and are stored contiguously, so list element access remains O(1). However, the actual objects may reside anywhere in memory, making cache utilization poorer than for NumPy arrays.
The advantage of contiguous storage lies in CPU cache behavior and vectorization.
2.3 Why Are Arrays Faster Than Python Lists?
CPU caches hold recently accessed memory in cache lines (typically 64 bytes). When data is stored contiguously, one cache line brings many successive elements into the cache, making subsequent accesses fast. Python list elements are scattered across the heap, causing many cache misses.
Thus, iterating over a NumPy array benefits from high cache‑hit rates, while a Python list suffers from frequent cache misses, explaining the performance gap.
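A rough micro-benchmark makes the gap concrete. This is only a sketch; absolute timings vary by machine, but summing a NumPy array reliably beats summing an equivalent Python list:

```python
import timeit
import numpy as np

n = 1_000_000
py_list = list(range(n))
np_array = np.arange(n)

# sum a Python list (pointer chasing per element) vs. a NumPy array
# (tight C loop over a contiguous int64 buffer)
t_list = timeit.timeit(lambda: sum(py_list), number=10)
t_array = timeit.timeit(lambda: np_array.sum(), number=10)

print(f"list sum:  {t_list:.4f}s")
print(f"array sum: {t_array:.4f}s")
```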
3. NumPy ndarray Object
The core of NumPy is the `ndarray`, a homogeneous N-dimensional array that stores data in a contiguous buffer and provides rich metadata.
3.1 What Is an ndarray?
An `ndarray` is a multi-dimensional (N-D) homogeneous array in which every element has the same fixed size (e.g., `int64`, `float32`). The array’s metadata includes `shape`, `strides`, `itemsize`, and `data`, which are crucial for performance.
Consider a simple one‑dimensional array:
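A minimal sketch of such an array and the metadata attributes just mentioned:

```python
import numpy as np

x = np.arange(8, dtype=np.int64)

print(x.shape)     # (8,)
print(x.itemsize)  # 8 -> bytes per element
print(x.strides)   # (8,) -> bytes to skip to reach the next element
print(x.data)      # memoryview pointing into the underlying buffer
```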
The array `x` contains eight `int64` elements (itemsize = 8 bytes) with shape `(8,)`. Elements are stored in consecutive memory locations, and the `strides` attribute tells how many bytes to skip to move to the next element along each axis. The `data` attribute returns a memory-view object pointing to the start of the buffer.
3.2 Multi‑dimensional Arrays, Strides, and Shape
Multi‑dimensional data is also stored in a single contiguous block; the shape and strides metadata tell NumPy how to interpret the linear memory as an N‑dimensional array.
Observe how strides and shape change when moving from 1‑D to 2‑D and 3‑D arrays.
Notice that the underlying linear memory layout is identical in every case; only the shape and strides metadata change.
The lowest‑indexed axis has the largest stride; the highest‑indexed axis has the smallest stride, reflecting NumPy’s default C‑contiguous (row‑major) layout.
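For example, reshaping a single 24-element buffer shows how shape and strides reinterpret the same memory, with strides shrinking from the first axis to the last:

```python
import numpy as np

a1 = np.arange(24, dtype=np.int64)
a2 = a1.reshape(4, 6)
a3 = a1.reshape(2, 3, 4)

print(a1.strides)  # (8,)
print(a2.strides)  # (48, 8)    -> one row = 6 elements * 8 bytes
print(a3.strides)  # (96, 32, 8)

# all three share the same buffer: reshape returned views
print(a2.base is a1, a3.base is a1)  # True True
```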
NumPy supports both C‑contiguous and Fortran‑contiguous (F‑contiguous) memory layouts.
3.3 C‑contiguous vs. F‑contiguous
In a C‑contiguous array the last axis varies fastest; in an F‑contiguous array the first axis varies fastest. Transposing a C‑contiguous array yields an F‑contiguous view.
NumPy tracks layout with the `flags` attribute, indicating whether an array is C-contiguous, F-contiguous, or neither.
```python
import numpy as np

x = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11]])
print(x.flags)
# C_CONTIGUOUS : True  <---
# F_CONTIGUOUS : False <---
# OWNDATA : True

y = x.transpose()
print(y.flags)
# C_CONTIGUOUS : False <---
# F_CONTIGUOUS : True  <---
# OWNDATA : False

print(y.strides)
# (8, 24)

print(y.base)
# array([[ 0,  1,  2],
#        [ 3,  4,  5],
#        [ 6,  7,  8],
#        [ 9, 10, 11]])

print(y.base is x)
# True

# --------------------------
# ---- create a copy ----
# --------------------------
y_copy = x.transpose().copy()
print(y_copy.flags)
# C_CONTIGUOUS : True  <---
# F_CONTIGUOUS : False
# OWNDATA : True  <---

print(y_copy.strides)
# (32, 8)

print(y_copy.base is None)
# True
```

3.4 Why Contiguous Memory Matters
When data is contiguous, the CPU can load whole cache lines that contain many successive elements, reducing memory-bandwidth pressure. Many NumPy functions (e.g., `np.min`, `np.max`, `np.sum`) run fastest on C-contiguous data. Functions like `np.ravel` and `np.reshape` try to return a view of the original data; if the layout is not contiguous, they fall back to creating a copy.
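A quick sketch of this view-or-copy behavior with `np.ravel`:

```python
import numpy as np

x = np.array([[0, 1, 2, 3],
              [4, 5, 6, 7],
              [8, 9, 10, 11]])   # C-contiguous, owns its data

flat = np.ravel(x)               # layout already linear -> view
print(flat.base is x)            # True

flat_t = np.ravel(x.T)           # transposed layout forces a copy
print(flat_t.base is None)       # True: fresh, independent buffer
```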
4. Views and Copies
NumPy distinguishes between views and copies. A view shares the same data buffer as the original array and only modifies metadata; changes to a view affect the original array.
A copy allocates new memory and copies the data, so modifications are independent but cost more memory.
NumPy’s `base` attribute tells whether an array is a view (`base` points to the array that owns the data) or a copy (`base` is `None`). The `OWNDATA` flag likewise indicates ownership of the data buffer.
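A short sketch contrasting the two:

```python
import numpy as np

x = np.arange(6)

view = x[1:4]          # slicing -> view: shares x's buffer
view[0] = 99
print(x)               # [ 0 99  2  3  4  5] -> original changed

copy = x[1:4].copy()   # explicit copy: independent buffer
copy[0] = -1
print(x)               # still [ 0 99  2  3  4  5] -> unaffected

print(view.base is x)        # True
print(copy.base is None)     # True
print(copy.flags['OWNDATA']) # True
```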
5. Indexing and Slicing
NumPy’s powerful indexing and slicing mechanisms let you retrieve single elements, sub‑arrays, or arbitrarily complex selections.
5.1 Basic Indexing
Retrieving a single element is an O(1) operation: the address is computed as
base + index * itemsize.
5.2 Slicing
Slicing returns a sub‑array. For a 1‑D slice, the start index is offset by
start * itemsize, and the step determines how many bytes to skip between elements.
In higher dimensions, strides for each axis are used to compute addresses.
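For instance, slicing a 1-D `int64` array with a step of 2 only adjusts the metadata (data pointer, shape, strides) rather than touching the buffer:

```python
import numpy as np

x = np.arange(8, dtype=np.int64)

s = x[1:7:2]          # start=1, stop=7, step=2
print(s)              # [1 3 5]
print(s.strides)      # (16,) -> step * itemsize
print(s.base is x)    # True: a view, no data copied

# the slice's data pointer is offset by start * itemsize
off = s.__array_interface__['data'][0] - x.__array_interface__['data'][0]
print(off)            # 8
```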
5.3 Advanced Indexing
Advanced indexing uses integer or boolean arrays to select arbitrary elements, producing a copy rather than a view.
```python
import numpy as np

x = np.array([0, 1, 2, 3, 4, 5, 6, 7])

# standard slicing
y1 = x[::2]

# advanced indexing with an integer sequence
y2 = x[[0, 2, 4, 6]]
y2 = x[np.array([0, 2, 4, 6])]

# boolean indexing
y3 = x[[True, False, True, False, True, False, True, False]]
y3 = x[x % 2 == 0]

print(y1)  # [0 2 4 6]
print(y2)  # [0 2 4 6]
print(y3)  # [0 2 4 6]

print(y1.flags)
# C_CONTIGUOUS : False
# OWNDATA : False
print(y2.flags)
# C_CONTIGUOUS : True
# OWNDATA : True
print(y3.flags)
# C_CONTIGUOUS : True
# OWNDATA : True
```

Standard slicing returns a view; advanced indexing returns a C-contiguous copy.
6. Summary of Part One
In this first part we introduced NumPy arrays, explained their contiguous memory layout, and showed why they outperform Python lists. We examined the `ndarray` metadata (`shape`, `strides`, `itemsize`, `data`), the distinction between views and copies, C- vs. F-contiguous layouts, and the powerful indexing and slicing capabilities. The next part will dive deeper into NumPy’s internal mechanisms for avoiding copies and saving memory.
Code Mala Tang
Read source code together, write articles together, and enjoy spicy hot pot together.