Simple Techniques to Speed Up Python For Loops by 1.3× to 970×
This article presents a collection of straightforward Python techniques (list comprehensions, moving length calculation out of the loop, set operations, early-exit loops, function inlining, pre-computation, generators, map(), memoization, NumPy vectorization, itertools.filterfalse, and str.join()) that can accelerate for-loops anywhere from 1.3× to nearly 1000×, with explanations and benchmark results.
In this article I introduce several simple methods that can increase the speed of Python for-loops by a factor of 1.3× to 970×. The built-in timeit module is used throughout to measure baseline performance and the improvements.
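The individual benchmark harnesses are not reproduced here; as a sketch of how timeit can measure a function (the function square_all and the sizes below are illustrative, not from the original benchmarks):

```python
import timeit

def square_all(numbers):
    # Illustrative function under test: square each number with a plain loop.
    output = []
    for n in numbers:
        output.append(n ** 2)
    return output

numbers = list(range(1000))
# Run the call many times and report the average time per call.
elapsed = timeit.timeit(lambda: square_all(numbers), number=1000)
print(f"{elapsed / 1000 * 1e6:.1f} µs per call")
```

The same pattern, applied to the baseline and improved versions of each function, yields the per-loop timings quoted below.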
Simple Methods
1. List Comprehension
# Baseline version (inefficient way)
# Calculating the power of numbers
# without using a list comprehension
def test_01_v0(numbers):
    output = []
    for n in numbers:
        output.append(n ** 2.5)
    return output

# Improved version (using a list comprehension)
def test_01_v1(numbers):
    output = [n ** 2.5 for n in numbers]
    return output

Result:
# Summary Of Test Results
Baseline: 32.158 ns per loop
Improved: 16.040 ns per loop
% Improvement: 50.1 %
Speedup: 2.00x
Using a list comprehension yields a 2× speed-up.
2. Compute Length Outside the Loop
If you need to iterate based on the length of a list, calculate the length before the loop.
# Baseline version (inefficient way)
# (length calculation inside the for loop)
def test_02_v0(numbers):
    output_list = []
    for i in range(len(numbers)):
        output_list.append(i * 2)
    return output_list

# Improved version (length calculation outside the for loop)
def test_02_v1(numbers):
    my_list_length = len(numbers)
    output_list = []
    for i in range(my_list_length):
        output_list.append(i * 2)
    return output_list

Result:
# Summary Of Test Results
Baseline: 112.135 ns per loop
Improved: 68.304 ns per loop
% Improvement: 39.1 %
Speedup: 1.64x
Moving the length calculation out of the loop speeds up execution by about 1.6×.
3. Use Set for Membership Tests
When performing comparisons inside nested loops, replace them with set operations.
# Baseline version (inefficient way)
# (nested lookups using a for loop)
def test_03_v0(list_1, list_2):
    common_items = []
    for item in list_1:
        if item in list_2:
            common_items.append(item)
    return common_items

# Improved version (sets replace the nested lookups)
def test_03_v1(list_1, list_2):
    s_1 = set(list_1)
    s_2 = set(list_2)
    common_items = s_1.intersection(s_2)
    return common_items

Result:
# Summary Of Test Results
Baseline: 9047.078 ns per loop
Improved: 18.161 ns per loop
% Improvement: 99.8 %
Speedup: 498.17x
Using sets for nested lookups yields a 498× speed-up.
4. Skip Irrelevant Iterations
Avoid redundant calculations by exiting early when possible.
# Example of inefficient code used to find the first even square in a list of numbers
def function_do_something(numbers):
    for n in numbers:
        square = n * n
        if square % 2 == 0:
            return square
    return None  # No even square found

# Example of improved code that finds the result without redundant computations
def function_do_something_v1(numbers):
    even_numbers = [i for i in numbers if i % 2 == 0]
    for n in even_numbers:
        square = n * n
        return square
    return None  # No even square found

Result:
# Summary Of Test Results
Baseline: 16.912 ns per loop
Improved: 8.697 ns per loop
% Improvement: 48.6 %
Speedup: 1.94x
This method can roughly double speed, depending on the specific loop logic.
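A sketch of an alternative that exits as soon as the first even number is seen, using next() with a generator expression (not part of the original benchmark; the helper name first_even_square is illustrative):

```python
def first_even_square(numbers):
    # next() stops scanning at the first matching element instead of
    # building a filtered list first; the second argument is the
    # default returned when no even number exists.
    return next((n * n for n in numbers if n % 2 == 0), None)
```

Because a square is even exactly when its base is even, this returns the same result as the functions above while touching only as many elements as needed.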
5. Inline Simple Functions
Inlining the body of a small function directly into the loop removes function‑call overhead.
# Example of inefficient code
# Loop that calls the is_prime function n times.
def is_prime(n):
    if n <= 1:
        return False
    for i in range(2, int(n**0.5) + 1):
        if n % i == 0:
            return False
    return True

# Baseline version (inefficient way)
# (calls the is_prime function n times)
def test_05_v0(n):
    count = 0
    for i in range(2, n + 1):
        if is_prime(i):
            count += 1
    return count

# Improved version (inlines the logic of the is_prime function)
def test_05_v1(n):
    count = 0
    for i in range(2, n + 1):
        if i <= 1:
            continue
        for j in range(2, int(i**0.5) + 1):
            if i % j == 0:
                break
        else:
            count += 1
    return count

Result:
# Summary Of Test Results
Baseline: 1271.188 ns per loop
Improved: 939.603 ns per loop
% Improvement: 26.1 %
Speedup: 1.35x
Function calls involve stack push/pop, name lookup, and argument passing; inlining eliminates this overhead.
6. Avoid Repeated Calculations
Pre‑compute values that are used repeatedly inside nested loops.
# Baseline version (inefficient way)
# (repetitive calculation within a nested loop)
def test_07_v0(n):
    result = 0
    for i in range(n):
        for j in range(n):
            result += i * j
    return result

# Improved version (uses precomputed values to speed things up)
def test_07_v1(n):
    pv = [[i * j for j in range(n)] for i in range(n)]
    result = 0
    for i in range(n):
        result += sum(pv[i])
    return result

Result:
# Summary Of Test Results
Baseline: 139.146 ns per loop
Improved: 92.325 ns per loop
% Improvement: 33.6 %
Speedup: 1.51x
7. Use Generators
Generators provide lazy evaluation, reducing memory usage and often improving speed for large data sets.
# Baseline version (inefficient way)
# (calculates the nth Fibonacci number by building a full list)
def test_08_v0(n):
    if n <= 1:
        return n
    f_list = [0, 1]
    for i in range(2, n + 1):
        f_list.append(f_list[i - 1] + f_list[i - 2])
    return f_list[n]

# Improved version (generates the Fibonacci sequence lazily with a generator)
def test_08_v1(n):
    a, b = 0, 1
    for _ in range(n):
        yield a
        a, b = b, a + b
# Note: calling test_08_v1(n) only creates the generator object; the values
# are computed as it is iterated, e.g. via list(test_08_v1(n)) or next().

Result:
# Summary Of Test Results
Baseline: 0.083 ns per loop
Improved: 0.004 ns per loop
% Improvement: 95.5 %
Speedup: 22.06x
8. Use the map() Function
The built‑in map() function is implemented in C and is highly optimized, allowing you to replace explicit for‑loops.
def some_function_X(x):
    # This would normally be a function containing application logic
    # (for the purpose of this test, just calculate and return the square)
    return x**2

# Baseline version (inefficient way)
def test_09_v0(numbers):
    output = []
    for i in numbers:
        output.append(some_function_X(i))
    return output

# Improved version (using Python's built-in map() function)
def test_09_v1(numbers):
    output = map(some_function_X, numbers)
    return output

Result:
# Summary Of Test Results
Baseline: 4.402 ns per loop
Improved: 0.005 ns per loop
% Improvement: 99.9 %
Speedup: 970.69x
map() is written in C and its internal loop is far more efficient than a pure Python for-loop.
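One caveat: map() returns a lazy iterator, so a timing like the one above largely measures creating the iterator; the per-element work happens only when the result is consumed. A minimal sketch (the function square is illustrative):

```python
def square(x):
    return x ** 2

numbers = [1, 2, 3, 4]
lazy = map(square, numbers)  # no squares computed yet
results = list(lazy)         # iteration (and the real work) happens here
```

When comparing against a loop that builds a full list, wrap the map() call in list() so both versions do the same amount of work.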
9. Use Memoization (functools.lru_cache)
Memoization caches expensive function results; the built‑in functools.lru_cache decorator provides this capability.
# Example of inefficient code
def fibonacci(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    return fibonacci(n - 1) + fibonacci(n - 2)

def test_10_v0(numbers):
    output = []
    for i in numbers:
        output.append(fibonacci(i))
    return output

# Example of efficient code (using functools.lru_cache)
import functools

@functools.lru_cache()
def fibonacci_v2(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    return fibonacci_v2(n - 1) + fibonacci_v2(n - 2)

def test_10_v1(numbers):
    output = []
    for i in numbers:
        output.append(fibonacci_v2(i))
    return output

Result:
# Summary Of Test Results
Baseline: 63.664 ns per loop
Improved: 1.104 ns per loop
% Improvement: 98.3 %
Speedup: 57.69x
The lru_cache decorator stores recent call results; with maxsize=None the cache can grow without bound, providing a classic space-for-time trade-off.
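The cache behaviour can be inspected with cache_info(); a minimal sketch, assuming a fresh memoized Fibonacci (the name fibonacci_cached is illustrative):

```python
import functools

@functools.lru_cache(maxsize=None)
def fibonacci_cached(n):
    if n < 2:
        return n
    return fibonacci_cached(n - 1) + fibonacci_cached(n - 2)

fibonacci_cached(20)
info = fibonacci_cached.cache_info()  # hits/misses counters, e.g. for tuning maxsize
```

After one top-level call, every repeated subproblem shows up as a cache hit, which is exactly where the speed-up comes from.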
10. Vectorization with NumPy
import numpy as np

# Baseline version (inefficient way of summing numbers in a range)
def test_11_v0(n):
    output = 0
    for i in range(0, n):
        output = output + i
    return output

# Improved version (efficient way of summing numbers in a range)
def test_11_v1(n):
    output = np.sum(np.arange(n))
    return output

Vectorized NumPy operations are commonly used in machine-learning data processing and give roughly a 28× speed-up here.
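A sketch of how such a comparison could be timed (sizes and repeat counts are illustrative; actual speedups depend on n and hardware):

```python
import timeit
import numpy as np

def loop_sum(n):
    # Pure-Python accumulation, as in test_11_v0 above.
    output = 0
    for i in range(n):
        output += i
    return output

n = 100_000
t_loop = timeit.timeit(lambda: loop_sum(n), number=10)
t_numpy = timeit.timeit(lambda: int(np.sum(np.arange(n))), number=10)
print(f"loop: {t_loop:.4f}s  numpy: {t_numpy:.4f}s")
```

Note that np.arange(n) allocates the whole array; for very large n, the memory cost of the vectorized version is part of the trade-off.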
11. Avoid Creating Intermediate Lists (filterfalse)
Using itertools.filterfalse avoids building temporary lists, reducing memory usage.
# Baseline version (inefficient way)
def test_12_v0(numbers):
    filtered_data = []
    for i in numbers:
        filtered_data.extend(list(
            filter(lambda x: x % 5 == 0,
                   range(1, i**2))))
    return filtered_data

# Improved version (using filterfalse, without an intermediate list)
from itertools import filterfalse

def test_12_v1(numbers):
    filtered_data = []
    for i in numbers:
        filtered_data.extend(
            filterfalse(lambda x: x % 5 != 0,
                        range(1, i**2)))
    return filtered_data

Result:
# Summary Of Test Results
Baseline: 333167.790 ns per loop
Improved: 2541.850 ns per loop
% Improvement: 99.2 %
Speedup: 131.07x
12. Efficient String Concatenation
Using the + operator for string concatenation has O(n²) complexity; "".join() reduces it to O(n).
# Baseline version (inefficient way)
# (concatenation using the += operator)
def test_13_v0(l_strings):
    output = ""
    for a_str in l_strings:
        output += a_str
    return output

# Improved version (using join)
def test_13_v1(l_strings):
    output_list = []
    for a_str in l_strings:
        output_list.append(a_str)
    return "".join(output_list)

# Helper function to generate a large list of fake names
from faker import Faker

def generate_fake_names(count: int = 10000):
    fake = Faker()
    output_list = []
    for _ in range(count):
        output_list.append(fake.name())
    return output_list

l_strings = generate_fake_names(count=50000)

Result:
# Summary Of Test Results
Baseline: 32.423 ns per loop
Improved: 21.051 ns per loop
% Improvement: 35.1 %
Speedup: 1.54x
Summary
Using Python's built‑in map() function instead of an explicit for‑loop speeds up execution by 970×.
Replacing nested for‑loops with set operations speeds up execution by 498×.
Using itertools.filterfalse avoids intermediate lists and speeds up execution by 131×.
Applying functools.lru_cache for memoization speeds up execution by 57×.