Simple Techniques to Speed Up Python For Loops by 1.3× to 970×
This article presents a collection of straightforward Python techniques (list comprehensions, moving length calculation out of the loop, set operations, early-exit loops, function inlining, pre-computation, generators, map(), memoization, NumPy vectorization, itertools.filterfalse, and str.join()) that can accelerate for-loops anywhere from 1.3× to nearly 1000×, with explanations and benchmark results.
In this article I introduce several simple methods that can increase the speed of Python for-loops by a factor of 1.3× to 970×. The built-in timeit module is used throughout to measure baseline performance and the improvements.
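The individual benchmark harnesses are not reproduced here; as a sketch of how timeit can measure a function (the function square_all and the sizes below are illustrative, not from the original benchmarks):

```python
import timeit

def square_all(numbers):
    # Illustrative function under test: square each number with a plain loop.
    output = []
    for n in numbers:
        output.append(n ** 2)
    return output

numbers = list(range(1000))
# Run the call many times and report the average time per call.
elapsed = timeit.timeit(lambda: square_all(numbers), number=1000)
print(f"{elapsed / 1000 * 1e6:.1f} µs per call")
```

The same pattern, applied to the baseline and improved versions of each function, yields the per-loop timings quoted below.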
Simple Methods
1. List Comprehension
# Baseline version (inefficient way)
# Calculating the power of numbers
# without using a list comprehension
def test_01_v0(numbers):
    output = []
    for n in numbers:
        output.append(n ** 2.5)
    return output

# Improved version (using a list comprehension)
def test_01_v1(numbers):
    output = [n ** 2.5 for n in numbers]
    return output

Result:
# Summary Of Test Results
Baseline: 32.158 ns per loop
Improved: 16.040 ns per loop
% Improvement: 50.1 %
Speedup: 2.00x
Using a list comprehension yields a 2× speed-up.
2. Compute Length Outside the Loop
If you need to iterate based on the length of a list, calculate the length before the loop.
# Baseline version (inefficient way)
# (length calculation inside the for loop)
def test_02_v0(numbers):
    output_list = []
    for i in range(len(numbers)):
        output_list.append(i * 2)
    return output_list

# Improved version (length calculation outside the for loop)
def test_02_v1(numbers):
    my_list_length = len(numbers)
    output_list = []
    for i in range(my_list_length):
        output_list.append(i * 2)
    return output_list

Result:
# Summary Of Test Results
Baseline: 112.135 ns per loop
Improved: 68.304 ns per loop
% Improvement: 39.1 %
Speedup: 1.64x
Moving the length calculation out of the loop speeds up execution by about 1.6×.
3. Use Set for Membership Tests
When performing comparisons inside nested loops, replace them with set operations.
# Baseline version (inefficient way)
# (nested lookups using a for loop)
def test_03_v0(list_1, list_2):
    common_items = []
    for item in list_1:
        if item in list_2:
            common_items.append(item)
    return common_items

# Improved version (sets replace the nested lookups)
def test_03_v1(list_1, list_2):
    s_1 = set(list_1)
    s_2 = set(list_2)
    common_items = s_1.intersection(s_2)
    return common_items

Result:
# Summary Of Test Results
Baseline: 9047.078 ns per loop
Improved: 18.161 ns per loop
% Improvement: 99.8 %
Speedup: 498.17x
Using sets for nested lookups yields a 498× speed-up.
4. Skip Irrelevant Iterations
Avoid redundant calculations by exiting early when possible.
# Example of inefficient code used to find the first even square in a list of numbers
def function_do_something(numbers):
    for n in numbers:
        square = n * n
        if square % 2 == 0:
            return square
    return None  # No even square found

# Example of improved code that finds the result without redundant computations
def function_do_something_v1(numbers):
    even_numbers = [i for i in numbers if i % 2 == 0]
    for n in even_numbers:
        square = n * n
        return square
    return None  # No even square found

Result:
# Summary Of Test Results
Baseline: 16.912 ns per loop
Improved: 8.697 ns per loop
% Improvement: 48.6 %
Speedup: 1.94x
This method can roughly double speed, depending on the specific loop logic.
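A sketch of an alternative that exits as soon as the first even number is seen, using next() with a generator expression (not part of the original benchmark; the helper name first_even_square is illustrative):

```python
def first_even_square(numbers):
    # next() stops scanning at the first matching element instead of
    # building a filtered list first; the second argument is the
    # default returned when no even number exists.
    return next((n * n for n in numbers if n % 2 == 0), None)
```

Because a square is even exactly when its base is even, this returns the same result as the functions above while touching only as many elements as needed.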
5. Inline Simple Functions
Inlining the body of a small function directly into the loop removes function‑call overhead.
# Example of inefficient code
# Loop that calls the is_prime function n times.
def is_prime(n):
    if n <= 1:
        return False
    for i in range(2, int(n**0.5) + 1):
        if n % i == 0:
            return False
    return True

# Baseline version (inefficient way)
# (calls the is_prime function n times)
def test_05_v0(n):
    count = 0
    for i in range(2, n + 1):
        if is_prime(i):
            count += 1
    return count

# Improved version (inlines the logic of the is_prime function)
def test_05_v1(n):
    count = 0
    for i in range(2, n + 1):
        if i <= 1:
            continue
        for j in range(2, int(i**0.5) + 1):
            if i % j == 0:
                break
        else:
            count += 1
    return count

Result:
# Summary Of Test Results
Baseline: 1271.188 ns per loop
Improved: 939.603 ns per loop
% Improvement: 26.1 %
Speedup: 1.35x
Function calls involve stack push/pop, name lookup, and argument passing; inlining eliminates this overhead.
6. Avoid Repeated Calculations
Pre‑compute values that are used repeatedly inside nested loops.
# Baseline version (inefficient way)
# (repetitive calculation within a nested loop)
def test_07_v0(n):
    result = 0
    for i in range(n):
        for j in range(n):
            result += i * j
    return result

# Improved version (uses precomputed values to speed things up)
def test_07_v1(n):
    pv = [[i * j for j in range(n)] for i in range(n)]
    result = 0
    for i in range(n):
        result += sum(pv[i])
    return result

Result:
# Summary Of Test Results
Baseline: 139.146 ns per loop
Improved: 92.325 ns per loop
% Improvement: 33.6 %
Speedup: 1.51x
7. Use Generators
Generators provide lazy evaluation, reducing memory usage and often improving speed for large data sets.
# Baseline version (inefficient way)
# (calculates the nth Fibonacci number by building a full list)
def test_08_v0(n):
    if n <= 1:
        return n
    f_list = [0, 1]
    for i in range(2, n + 1):
        f_list.append(f_list[i - 1] + f_list[i - 2])
    return f_list[n]

# Improved version (generates the Fibonacci sequence lazily with a generator)
def test_08_v1(n):
    a, b = 0, 1
    for _ in range(n):
        yield a
        a, b = b, a + b
# Note: calling test_08_v1(n) only creates the generator object; the values
# are computed as it is iterated, e.g. via list(test_08_v1(n)) or next().

Result:
# Summary Of Test Results
Baseline: 0.083 ns per loop
Improved: 0.004 ns per loop
% Improvement: 95.5 %
Speedup: 22.06x
8. Use the map() Function
The built‑in map() function is implemented in C and is highly optimized, allowing you to replace explicit for‑loops.
def some_function_X(x):
    # This would normally be a function containing application logic
    # (for the purpose of this test, just calculate and return the square)
    return x**2

# Baseline version (inefficient way)
def test_09_v0(numbers):
    output = []
    for i in numbers:
        output.append(some_function_X(i))
    return output

# Improved version (using Python's built-in map() function)
def test_09_v1(numbers):
    output = map(some_function_X, numbers)
    return output

Result:
# Summary Of Test Results
Baseline: 4.402 ns per loop
Improved: 0.005 ns per loop
% Improvement: 99.9 %
Speedup: 970.69x
map() is written in C and its internal loop is far more efficient than a pure Python for-loop.
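One caveat: map() returns a lazy iterator, so a timing like the one above largely measures creating the iterator; the per-element work happens only when the result is consumed. A minimal sketch (the function square is illustrative):

```python
def square(x):
    return x ** 2

numbers = [1, 2, 3, 4]
lazy = map(square, numbers)  # no squares computed yet
results = list(lazy)         # iteration (and the real work) happens here
```

When comparing against a loop that builds a full list, wrap the map() call in list() so both versions do the same amount of work.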
9. Use Memoization (functools.lru_cache)
Memoization caches expensive function results; the built‑in functools.lru_cache decorator provides this capability.
# Example of inefficient code
def fibonacci(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    return fibonacci(n - 1) + fibonacci(n - 2)

def test_10_v0(numbers):
    output = []
    for i in numbers:
        output.append(fibonacci(i))
    return output

# Example of efficient code (using functools.lru_cache)
import functools

@functools.lru_cache()
def fibonacci_v2(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    return fibonacci_v2(n - 1) + fibonacci_v2(n - 2)

def test_10_v1(numbers):
    output = []
    for i in numbers:
        output.append(fibonacci_v2(i))
    return output

Result:
# Summary Of Test Results
Baseline: 63.664 ns per loop
Improved: 1.104 ns per loop
% Improvement: 98.3 %
Speedup: 57.69x
The lru_cache decorator stores recent call results; with maxsize=None the cache can grow without bound, providing a classic space-for-time trade-off.
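The cache behaviour can be inspected with cache_info(); a minimal sketch, assuming a fresh memoized Fibonacci (the name fibonacci_cached is illustrative):

```python
import functools

@functools.lru_cache(maxsize=None)
def fibonacci_cached(n):
    if n < 2:
        return n
    return fibonacci_cached(n - 1) + fibonacci_cached(n - 2)

fibonacci_cached(20)
info = fibonacci_cached.cache_info()  # hits/misses counters, e.g. for tuning maxsize
```

After one top-level call, every repeated subproblem shows up as a cache hit, which is exactly where the speed-up comes from.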
10. Vectorization with NumPy
import numpy as np

# Baseline version (inefficient way of summing numbers in a range)
def test_11_v0(n):
    output = 0
    for i in range(0, n):
        output = output + i
    return output

# Improved version (efficient way of summing numbers in a range)
def test_11_v1(n):
    output = np.sum(np.arange(n))
    return output

Vectorized NumPy operations are commonly used in machine-learning data processing and give roughly a 28× speed-up here.
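A sketch of how such a comparison could be timed (sizes and repeat counts are illustrative; actual speedups depend on n and hardware):

```python
import timeit
import numpy as np

def loop_sum(n):
    # Pure-Python accumulation, as in test_11_v0 above.
    output = 0
    for i in range(n):
        output += i
    return output

n = 100_000
t_loop = timeit.timeit(lambda: loop_sum(n), number=10)
t_numpy = timeit.timeit(lambda: int(np.sum(np.arange(n))), number=10)
print(f"loop: {t_loop:.4f}s  numpy: {t_numpy:.4f}s")
```

Note that np.arange(n) allocates the whole array; for very large n, the memory cost of the vectorized version is part of the trade-off.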
11. Avoid Creating Intermediate Lists (filterfalse)
Using itertools.filterfalse avoids building temporary lists, reducing memory usage.
# Baseline version (inefficient way)
def test_12_v0(numbers):
    filtered_data = []
    for i in numbers:
        filtered_data.extend(list(
            filter(lambda x: x % 5 == 0,
                   range(1, i**2))))
    return filtered_data

# Improved version (using filterfalse, without an intermediate list)
from itertools import filterfalse

def test_12_v1(numbers):
    filtered_data = []
    for i in numbers:
        filtered_data.extend(
            filterfalse(lambda x: x % 5 != 0,
                        range(1, i**2)))
    return filtered_data

Result:
# Summary Of Test Results
Baseline: 333167.790 ns per loop
Improved: 2541.850 ns per loop
% Improvement: 99.2 %
Speedup: 131.07x
12. Efficient String Concatenation
Using the + operator for string concatenation has O(n²) complexity; "".join() reduces it to O(n).
# Baseline version (inefficient way)
# (concatenation using the += operator)
def test_13_v0(l_strings):
    output = ""
    for a_str in l_strings:
        output += a_str
    return output

# Improved version (using join)
def test_13_v1(l_strings):
    output_list = []
    for a_str in l_strings:
        output_list.append(a_str)
    return "".join(output_list)

# Helper function to generate a large list of fake names
from faker import Faker

def generate_fake_names(count: int = 10000):
    fake = Faker()
    output_list = []
    for _ in range(count):
        output_list.append(fake.name())
    return output_list

l_strings = generate_fake_names(count=50000)

Result:
# Summary Of Test Results
Baseline: 32.423 ns per loop
Improved: 21.051 ns per loop
% Improvement: 35.1 %
Speedup: 1.54x
Summary
Using Python's built‑in map() function instead of an explicit for‑loop speeds up execution by 970×.
Replacing nested for‑loops with set operations speeds up execution by 498×.
Using itertools.filterfalse avoids intermediate lists and speeds up execution by 131×.
Applying functools.lru_cache for memoization speeds up execution by 57×.