
Understanding NumPy Array Memory Layout and Accelerating Image Resizing with Unsafe Python Techniques

This article explains how NumPy stride differences caused a 100× slowdown when resizing images from a pygame Surface, demonstrates how to reinterpret the underlying memory layout using ctypes to achieve a 100× speedup with OpenCV, and discusses the safety implications of such low‑level Python tricks.

Python Programming Learning Circle

Jul 15, 2024


Unsafe Python and Image Resizing

When using pygame to resize images, the author noticed two things. First, pygame.transform.smoothscale was much slower than cv2.resize. Second, passing the Surface's pixels to cv2.resize was slower still. That second slowdown was traced to differences in NumPy array strides, which come from how the pygame Surface lays out its pixel data in memory.

Benchmarking the Original Approaches

The following benchmark downscales a 1920×1080 surface to 960×540 and repeats each operation 10 times (repeat = 10):

from contextlib import contextmanager
import time
import pygame as pg
import numpy as np
import cv2

@contextmanager
def Timer(name):
    start = time.time()
    yield
    finish = time.time()
    print(f'{name} took {finish-start:.4f} sec')

IW = 1920
IH = 1080
OW = IW // 2
OH = IH // 2
repeat = 10

isurf = pg.Surface((IW,IH), pg.SRCALPHA)
with Timer('pg.Surface with smoothscale'):
    for i in range(repeat):
        pg.transform.smoothscale(isurf, (OW,OH))

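def cv2_resize(image):
    # cv2.resize takes dsize as (columns, rows). These arrays are indexed
    # [x, y] like pixels3d, with shape (width, height, 3), so the target is (OH, OW).
    return cv2.resize(image, (OH,OW), interpolation=cv2.INTER_AREA)

# Plain C-contiguous array with the same shape that pixels3d returns
i1 = np.zeros((IW,IH,3), np.uint8)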
with Timer('np.zeros with cv2'):
    for i in range(repeat):
        o1 = cv2_resize(i1)

Output:

pg.Surface with smoothscale took 0.2002 sec
np.zeros with cv2 took 0.0145 sec

The author then accessed the surface pixels through pygame.surfarray.pixels3d, which returns a zero‑copy view, and passed that view to cv2.resize. The resize became almost 100× slower than on the np.zeros array (1.3625 s vs 0.0145 s):

i2 = pg.surfarray.pixels3d(isurf)
with Timer('pixels3d with cv2'):
    for i in range(repeat):
        o2 = cv2_resize(i2)

pixels3d with cv2 took 1.3625 sec
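The sketch below isolates the layout cost from pygame and OpenCV. It times a plain NumPy copy of an array with pixels3d's strides into C order and compares it with copying an ordinary contiguous array of the same shape. This experiment is a swapped-in illustration, not the author's benchmark. The BGRA stand-in buffer is an assumption about how the Surface stores its pixels, and the timings will vary by machine.

```python
import timeit
import numpy as np

W, H = 1920, 1080
# Assumed stand-in for Surface memory: H rows of W four-byte BGRA pixels
bgra = np.zeros((H, W, 4), np.uint8)
# View indexed like pixels3d: [x, y, (R, G, B)], strides (4, 4*W, -1)
strided = bgra[:, :, 2::-1].transpose(1, 0, 2)
# Same shape and dtype, but an ordinary C-contiguous array
contiguous = np.zeros((W, H, 3), np.uint8)

# Best-of-3 time for 5 copies into C order
t_strided = min(timeit.repeat(lambda: np.ascontiguousarray(strided), number=5, repeat=3))
t_contig = min(timeit.repeat(lambda: contiguous.copy(), number=5, repeat=3))
print(f"strided copy: {t_strided:.4f} s, contiguous copy: {t_contig:.4f} s")
```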

Why the Stride Difference?

Both input arrays have the same .shape, (1920, 1080, 3), and the same .dtype, uint8, but their .strides differ:

print('input strides',i1.strides,i2.strides)
print('output strides',o1.strides,o2.strides)

input strides (3240, 3, 1) (4, 7680, -1)
output strides (1620, 3, 1) (1620, 3, 1)
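A stride is the number of bytes to move in memory for a step of one along an axis. The byte offset of arr[i, j, k] from the first element is therefore i*s0 + j*s1 + k*s2. The small helpers below (hypothetical names, plain Python) compute the strides NumPy assigns to a C-contiguous array and apply that formula. They reproduce the i1 and output strides printed above.

```python
import numpy as np

def c_strides(shape, itemsize=1):
    """Strides NumPy uses for a C-contiguous (row-major) array of this shape."""
    strides = []
    step = itemsize
    for dim in reversed(shape):  # the last axis is the innermost one
        strides.append(step)
        step *= dim
    return tuple(reversed(strides))

def byte_offset(strides, index):
    """Byte distance of arr[index] from the first element: sum of index * stride."""
    return sum(i * s for i, s in zip(index, strides))

print(c_strides((1920, 1080, 3)))             # (3240, 3, 1), like i1
print(c_strides((960, 540, 3)))               # (1620, 3, 1), like both outputs
print(byte_offset((4, 7680, -1), (1, 0, 0)))  # 4: the next pixel in x
# 7678: B byte of pixel (0, 1), measured from the R byte of pixel (0, 0)
print(byte_offset((4, 7680, -1), (0, 1, 2)))
```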

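The output strides are identical because cv2.resize allocates a fresh C-contiguous result. The input strides are not. i1 is C-contiguous, so one step along the first axis skips a whole 1080 × 3 = 3240-byte row. The pixels3d view instead reflects how the Surface actually stores its pixels: rows of 1920 four-byte BGRA pixels, including an alpha channel. This produces three differences. The x stride is 4 because the alpha byte is skipped. The y stride is 1920 × 4 = 7680 because the view's first two axes are transposed relative to memory order. The channel stride is -1 because the view's base pointer sits at the R byte and walks backwards through G and B. With a negative stride and this transposed order, the array is not a plain contiguous image, and this is the difference behind the slowdown.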
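This layout can be reproduced without pygame. The sketch below assumes, as the strides suggest, that the Surface stores H rows of W four-byte pixels in B, G, R, A order. It builds a view with pixels3d's indexing by reversing the first three channels and swapping the two spatial axes.

```python
import numpy as np

W, H = 1920, 1080
# Assumed Surface memory: H rows of W pixels, bytes ordered B, G, R, A
bgra = np.zeros((H, W, 4), np.uint8)
# Index as [x, y, c] with c = R, G, B: channels 2, 1, 0, then swap the axes
pixels = bgra[:, :, 2::-1].transpose(1, 0, 2)
print(pixels.shape, pixels.strides)  # (1920, 1080, 3) (4, 7680, -1)

# Write one pixel in memory order (B, G, R, A) at x=7, y=5 ...
bgra[5, 7] = (10, 20, 30, 255)
# ... and read it back through the view as (R, G, B)
print(pixels[7, 5])  # [30 20 10]
# The view's data pointer sits 2 bytes into the first pixel, at its R byte
print(pixels.ctypes.data - bgra.ctypes.data)  # 2
```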

Re‑interpreting the Memory Layout

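The fix starts from the raw C pointer of the pixel data. The code steps back to the first byte of the pixel and wraps the memory in a NumPy array with default C-order (row-major, height-first) strides. The Surface can then be treated as an ordinary (height, width, 4) image in its native byte order (BGRA here). This undoes the transposition and includes the alpha channel without copying anything: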

import ctypes

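def arr_params(surface):
    # Borrow a zero-copy pixels3d view only to read its pointer and strides
    pixels = pg.surfarray.pixels3d(surface)
    width, height, depth = pixels.shape
    assert depth == 3
    xstride, ystride, zstride = pixels.strides
    oft = 0
    bgr = 0
    if zstride == -1:  # BGR(A) layout: the view starts at R, 2 bytes into the pixel
        oft = -2       # step back to B, the first byte of the pixel
        zstride = 1
        bgr = 1
    assert xstride == 4          # 4 bytes per pixel, alpha included
    assert zstride == 1
    assert ystride == width*4    # rows tightly packed, no pitch padding
    base = pixels.ctypes.data_as(ctypes.c_void_p)
    ptr = ctypes.c_void_p(base.value + oft)
    return ptr, width, height, bgr

def rgba_buffer(p, w, h):
    # UNSAFE: a ctypes byte array over w*h 4-byte pixels at raw address p.
    # Nothing checks the bounds or keeps the Surface alive.
    buf_type = ctypes.c_uint8 * (w * h * 4)
    return ctypes.cast(p, ctypes.POINTER(buf_type)).contents

def cv2_resize_surface(src, dst):
    iptr, iw, ih, ibgr = arr_params(src)
    optr, ow, oh, obgr = arr_params(dst)
    assert ibgr == obgr  # both surfaces must use the same channel order
    # Wrap both pixel buffers as C-ordered (height, width, 4) arrays
    ibuf = rgba_buffer(iptr, iw, ih)
    iarr = np.ndarray((ih, iw, 4), np.uint8, buffer=ibuf)
    obuf = rgba_buffer(optr, ow, oh)
    oarr = np.ndarray((oh, ow, 4), np.uint8, buffer=obuf)
    # Resize straight into the destination Surface's memory
    cv2.resize(iarr, (ow, oh), oarr, interpolation=cv2.INTER_AREA)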
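The same reinterpretation can be tried without pygame or OpenCV. In this sketch a NumPy array stands in for the Surface's pixel memory, under the assumption of BGRA pixels with no row padding. The hypothetical helper bgra_view does the combined work of arr_params and rgba_buffer. It uses ctypes' from_address in place of the article's cast(...).contents, and both produce a ctypes byte array at a given address. The same caveat applies here: the returned array does not keep the memory's owner alive.

```python
import ctypes
import numpy as np

W, H = 64, 48
# Assumed stand-in for a Surface's pixel memory (BGRA, no row padding)
surface_mem = np.zeros((H, W, 4), np.uint8)
# A view laid out like pixels3d: shape (W, H, 3), strides (4, 4*W, -1)
pixels = surface_mem[:, :, 2::-1].transpose(1, 0, 2)

def bgra_view(pixels3d_like):
    """Reinterpret a pixels3d-style view as an (H, W, 4) C-ordered array.

    UNSAFE: the result aliases raw memory and does not keep its owner alive.
    """
    width, height, depth = pixels3d_like.shape
    xs, ys, zs = pixels3d_like.strides
    assert depth == 3 and xs == 4 and ys == width * 4 and zs == -1
    start = pixels3d_like.ctypes.data - 2  # step back from the R byte to B
    raw = (ctypes.c_uint8 * (width * height * 4)).from_address(start)
    return np.ndarray((height, width, 4), np.uint8, buffer=raw)

bgra = bgra_view(pixels)
bgra[3, 5] = (1, 2, 3, 4)  # write B, G, R, A at x=5, y=3
print(pixels[5, 3])        # [3 2 1]: the same bytes seen through the pixels3d view
print(bgra.strides)        # (256, 4, 1)
```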

With both surfaces wrapped this way, the resize runs roughly 140× faster than the original pixels3d path (0.0097 s vs 1.3625 s). It is only about 1.5× slower than resizing a freshly allocated np.zeros RGBA array:

attached RGBA with cv2 took 0.0097 sec
np.zeros RGBA with cv2 took 0.0066 sec
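Resizing in the Surface's native BGRA order needs no channel swap. An area filter such as cv2.INTER_AREA only combines each channel with the same channel of neighboring pixels. The sketch below uses a 2×2 block average as a stand-in for INTER_AREA at an exact 2× shrink. This is an assumption about its behavior for integer factors, not OpenCV's implementation. The sketch checks that swapping B and R before the resize gives the same result as swapping after it. That is why cv2_resize_surface only needs the source and destination to agree (assert ibgr == obgr).

```python
import numpy as np

def area_downscale_2x(img):
    """Average each 2x2 block of pixels, channel by channel (a simple box filter)."""
    h, w, c = img.shape
    return img.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))

rng = np.random.default_rng(0)
bgra = rng.integers(0, 256, size=(8, 6, 4), dtype=np.uint8)
rgba = bgra[:, :, [2, 1, 0, 3]]  # swap B and R

# Swapping channels then resizing equals resizing then swapping channels
same = np.array_equal(area_downscale_2x(rgba), area_downscale_2x(bgra)[:, :, [2, 1, 0, 3]])
print(same)  # True
```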

Safety Considerations

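The technique relies on raw C pointers obtained via ctypes. The surrounding Python code remains safe, but the helper functions (arr_params and rgba_buffer) are inherently unsafe. Nothing verifies that the pointer, size, and layout they assume are correct, or that the Surface outlives the arrays built on its memory. Misuse can therefore crash the interpreter or silently corrupt memory. This mirrors Rust's unsafe blocks, where a small, clearly marked region carries obligations the compiler cannot check.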

Python itself has no unsafe keyword, but the combination of ctypes and C libraries provides a similar escape hatch for performance‑critical sections.

Conclusion

Understanding NumPy’s stride mechanics and the memory layout of external libraries such as pygame can reveal hidden performance bottlenecks. By re‑interpreting the underlying buffer with correct strides, one can achieve dramatic speed improvements without rewriting the C code, albeit at the cost of introducing unsafe memory handling that must be carefully managed.


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us, and we will review it promptly.

Performance Python memory layout OpenCV unsafe NumPy

Written by

Python Programming Learning Circle

A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.

