Simple Techniques to Accelerate Python For‑Loops: From 1.3× to 970× Speed‑ups

This article presents a collection of practical Python tricks (list comprehensions, pre-computing lengths, sets, skipping irrelevant iterations, inlining functions, generators, map, memoization, vectorization, filterfalse, and join) for improving for-loop performance, with reported benchmark results ranging from a modest 1.3× gain up to 970×.

Python Programming Learning Circle

Apr 29, 2025

The article demonstrates how to use Python's timeit module to benchmark loop performance and then applies a series of optimizations that can increase execution speed anywhere from 1.3× to 970×.
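
A minimal sketch of how such a comparison might be set up with timeit follows. The helper name speedup, the input size, and the repeat counts are illustrative assumptions, not the article's actual harness; the two functions mirror the list-comprehension example in the first technique.

```python
import timeit

def power_loop(numbers):
    # Explicit loop with append (baseline)
    output = []
    for n in numbers:
        output.append(n ** 2.5)
    return output

def power_comprehension(numbers):
    # Same computation as a list comprehension
    return [n ** 2.5 for n in numbers]

def speedup(slow, fast, arg, repeat=5, number=20):
    """Ratio of best-of-`repeat` timings; values above 1 mean `fast` is faster."""
    t_slow = min(timeit.repeat(lambda: slow(arg), repeat=repeat, number=number))
    t_fast = min(timeit.repeat(lambda: fast(arg), repeat=repeat, number=number))
    return t_slow / t_fast

numbers = list(range(2_000))
# Check that both versions agree before comparing their speed
assert power_loop(numbers) == power_comprehension(numbers)
ratio = speedup(power_loop, power_comprehension, numbers)
print(f"speed-up: {ratio:.2f}x")
```

Taking the minimum over several repeats reduces noise from other processes. The ratio will vary with the machine, the Python version, and the input size.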

Simple Methods

1. List Comprehension

# Baseline version (Inefficient way)
# Calculating the power of numbers
# Without using List Comprehension
def test_01_v0(numbers):
    output = []
    for n in numbers:
        output.append(n**2.5)
    return output

# Improved version (Using List Comprehension)
def test_01_v1(numbers):
    output = [n**2.5 for n in numbers]
    return output

Result: 2× speed‑up.

2. Compute Length Outside the Loop

# Baseline version (Length calculation inside for loop)
def test_02_v0(numbers):
    output_list = []
    for i in range(len(numbers)):
        output_list.append(i*2)
    return output_list

# Improved version (Length calculation outside for loop)
def test_02_v1(numbers):
    my_list_length = len(numbers)
    output_list = []
    for i in range(my_list_length):
        output_list.append(i*2)
    return output_list

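Result: 1.6× speed-up reported. Note that range(len(numbers)) evaluates len() once, before the loop starts, in both versions, so hoisting the call matters mainly when the length is re-evaluated on every iteration, for example in a while i < len(numbers) loop.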
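
The effect of hoisting len() is easier to see in a loop that really does re-evaluate it on every pass. The sketch below uses a while loop and hypothetical function names; it is an illustration, not the article's benchmark.

```python
def doubled_indices_len_in_condition(numbers):
    output = []
    i = 0
    while i < len(numbers):  # len() is called on every iteration
        output.append(i * 2)
        i += 1
    return output

def doubled_indices_len_hoisted(numbers):
    output = []
    length = len(numbers)    # len() is called exactly once
    i = 0
    while i < length:
        output.append(i * 2)
        i += 1
    return output
```

Hoisting is only safe when the loop body does not change the length of numbers.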

3. Use Set for Membership Tests

# Baseline version (nested lookups using for loop)
def test_03_v0(list_1, list_2):
    common_items = []
    for item in list_1:
        if item in list_2:
            common_items.append(item)
    return common_items

# Improved version (sets to replace nested lookups)
def test_03_v1(list_1, list_2):
    s_1 = set(list_1)
    s_2 = set(list_2)
    common_items = s_1.intersection(s_2)
    return common_items

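Result: 498× speed-up over the nested lookup. Note that the improved version returns a set rather than a list, so the order of list_1 and any duplicate items are not preserved.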
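
If the caller needs the same list the baseline returns, including order and duplicates, a set can still be used for the lookup alone. This is a sketch with a hypothetical function name, and it assumes the items are hashable.

```python
def common_items_ordered(list_1, list_2):
    # Build the set once; each membership test is then O(1) on average
    lookup = set(list_2)
    # Preserve the order and duplicates of list_1, like the baseline loop
    return [item for item in list_1 if item in lookup]
```

For example, common_items_ordered([3, 1, 2, 3], [3, 2, 9]) returns [3, 2, 3], whereas the set intersection would return {2, 3}.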

4. Skip Irrelevant Iterations

# Inefficient version

def function_do_something(numbers):
    for n in numbers:
        square = n*n
        if square % 2 == 0:
            return square
    return None

# Improved version

def function_do_something_v1(numbers):
    even_numbers = [i for i in numbers if i % 2 == 0]
    for n in even_numbers:
        square = n*n
        return square
    return None

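Result: ~1.94× speed-up reported. The gain depends on the input: the improved version filters the whole list before returning, so it helps when the first even number appears late or not at all, while the baseline's early return is faster when an even number appears near the start.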
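
One way to skip odd values and still stop at the first match is a generator expression passed to next(). This variant is not from the article and the function name is hypothetical; it relies on the fact that n*n is even exactly when n is even.

```python
def first_even_square(numbers):
    # The generator is lazy: iteration stops at the first even number,
    # and the default value None is returned if there is none
    return next((n * n for n in numbers if n % 2 == 0), None)
```

This avoids squaring odd numbers and also avoids building a filtered list.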

5. Code Merging (Inlining Functions)

# Baseline version calling is_prime inside a loop

def is_prime(n):
    if n <= 1:
        return False
    for i in range(2, int(n**0.5)+1):
        if n % i == 0:
            return False
    return True

def test_05_v0(n):
    count = 0
    for i in range(2, n+1):
        if is_prime(i):
            count += 1
    return count

# Improved version with inlined logic

def test_05_v1(n):
    count = 0
    for i in range(2, n+1):
        if i <= 1:
            continue
        for j in range(2, int(i**0.5)+1):
            if i % j == 0:
                break
        else:
            count += 1
    return count

Result: ~1.35× speed‑up.

6. Generators

# Baseline (list‑based Fibonacci)
def test_08_v0(n):
    if n <= 1:
        return n
    f_list = [0, 1]
    for i in range(2, n+1):
        f_list.append(f_list[i-1] + f_list[i-2])
    return f_list[n]

# Improved (generator‑based)
def test_08_v1(n):
    a, b = 0, 1
    for _ in range(n):
        yield a
        a, b = b, a + b

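Result: ~22× speed-up reported. Note that calling a generator function only creates the generator object; no Fibonacci numbers are computed until it is iterated, and this version yields the first n numbers rather than returning the n-th one. The comparison is like-for-like only if the benchmark consumes the generator.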
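
To get the same answer as the list-based baseline from a generator, the generator has to be consumed. A minimal sketch follows, using itertools.islice and hypothetical function names:

```python
from itertools import islice

def fib_gen():
    # Infinite Fibonacci generator: 0, 1, 1, 2, 3, 5, ...
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

def nth_fib(n):
    # Skip the first n values and take the next one (index n, zero-based)
    return next(islice(fib_gen(), n, None))
```

Only two integers are kept in memory at any time, which is where a generator helps compared with storing the whole sequence in a list.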

7. map() Function

# Baseline using explicit loop

def some_function_X(x):
    return x**2

def test_09_v0(numbers):
    output = []
    for i in numbers:
        output.append(some_function_X(i))
    return output

# Improved using map

def test_09_v1(numbers):
    output = map(some_function_X, numbers)
    return output

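Result: ~970× speed-up reported. In Python 3, map returns a lazy iterator, so test_09_v1 does not call some_function_X at all until the result is consumed; most of this figure reflects deferred work rather than faster computation.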
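
The laziness of map can be checked with a call counter. The sketch below is an illustration with hypothetical names, not part of the article's benchmark:

```python
calls = 0

def square(x):
    global calls
    calls += 1            # count how often the function really runs
    return x ** 2

lazy = map(square, [1, 2, 3])
calls_before = calls      # nothing has been computed yet
result = list(lazy)       # consuming the iterator triggers the calls
calls_after = calls
```

A fair timing comparison against the explicit loop would measure list(map(some_function_X, numbers)), which produces the same list as the baseline.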

8. Memoization with lru_cache

import functools

@functools.lru_cache()
def fibonacci_v2(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    return fibonacci_v2(n-1) + fibonacci_v2(n-2)

def _test_10_v1(numbers):
    output = []
    for i in numbers:
        output.append(fibonacci_v2(i))
    return output

Result: ~58× speed‑up.
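
Only the cached version is shown above. For context, here is a sketch of an uncached counterpart next to a cached one, with hypothetical function names; maxsize=None is an assumption that makes the cache unbounded, whereas the article's lru_cache() call uses the default size.

```python
import functools

def fib_plain(n):
    # Recomputes the same subproblems exponentially many times
    if n < 2:
        return n
    return fib_plain(n - 1) + fib_plain(n - 2)

@functools.lru_cache(maxsize=None)
def fib_cached(n):
    # Each distinct n is computed once and then served from the cache
    if n < 2:
        return n
    return fib_cached(n - 1) + fib_cached(n - 2)

value = fib_cached(20)
info = fib_cached.cache_info()  # hit and miss counters for the cache
```

The cache persists between calls, so a benchmark that calls the cached function repeatedly mostly measures cache lookups after the first run.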

9. Vectorization with NumPy

import numpy as np

def test_11_v0(n):
    output = 0
    for i in range(n):
        output += i
    return output

def test_11_v1(n):
    output = np.sum(np.arange(n))
    return output

Result: ~28× speed‑up.
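
NumPy is a third-party dependency. Where it is not available, the same idea of moving the loop out of interpreted Python code can be approximated with the built-in sum(), and this particular sum also has a closed form. Neither variant appears in the article; both are sketches for comparison with hypothetical names.

```python
def sum_loop(n):
    # Baseline: explicit Python-level loop
    total = 0
    for i in range(n):
        total += i
    return total

def sum_builtin(n):
    # The iteration runs inside the built-in sum() rather than in a Python loop
    return sum(range(n))

def sum_closed_form(n):
    # 0 + 1 + ... + (n - 1) = n * (n - 1) / 2, for n >= 0
    return n * (n - 1) // 2
```

Both alternatives use Python integers, which do not overflow, whereas NumPy arrays use fixed-width integer types.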

10. Avoid Creating Intermediate Lists (filterfalse)

# Baseline using filter + list conversion

def test_12_v0(numbers):
    filtered_data = []
    for i in numbers:
        filtered_data.extend(list(filter(lambda x: x % 5 == 0, range(1, i**2))))
    return filtered_data

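# Improved using itertools.filterfalse
from itertools import filterfalse

def test_12_v1(numbers):
    filtered_data = []
    for i in numbers:
        # extend() consumes the iterator directly, so no intermediate list is built
        filtered_data.extend(filterfalse(lambda x: x % 5 != 0, range(1, i**2)))
    return filtered_data

Result: ~131× speed-up reported, with lower memory usage because the filtered values are never materialized in a temporary list.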
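
For this particular predicate, the filtering step can be removed altogether, because the multiples of 5 below a bound are available directly from range with a step. This alternative is not from the article and the function name is hypothetical.

```python
def multiples_of_five(numbers):
    filtered_data = []
    for i in numbers:
        # range(5, i**2, 5) yields exactly the multiples of 5 in range(1, i**2)
        filtered_data.extend(range(5, i**2, 5))
    return filtered_data
```

This visits one value in five instead of testing every value, and it applies only when the wanted items follow a regular stride.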

11. Efficient String Concatenation

# Baseline using +=
def test_13_v0(l_strings):
    output = ""
    for a_str in l_strings:
        output += a_str
    return output

# Improved using list + join
def test_13_v1(l_strings):
    output_list = []
    for a_str in l_strings:
        output_list.append(a_str)
    return "".join(output_list)

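Result: ~1.5× speed-up. join builds the result in a single O(n) pass, whereas repeated += can degrade to O(n²) in the worst case because each concatenation may copy the accumulated string. CPython can often extend the string in place, which is consistent with the modest gain measured here.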
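
When the pieces are already in an iterable, join can take that iterable directly, without the intermediate append loop. This is a minimal sketch with hypothetical function names:

```python
def concat(l_strings):
    # join accepts any iterable of strings
    return "".join(l_strings)

def concat_upper(l_strings):
    # A generator expression lets each piece be transformed on the way in
    return "".join(s.upper() for s in l_strings)
```

The append-then-join pattern from the article remains useful when the pieces are produced one at a time inside a larger loop.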

Conclusion

The article summarizes eleven practical techniques whose reported gains range from a modest 1.3× to 970×, and each involves a trade-off between readability and raw speed.
