
10 Game-Changing Python Performance Hacks You Should Know

This article presents ten practical Python performance optimization techniques—including list comprehensions, built‑in functions, generator expressions, set lookups, multiprocessing, caching, itertools, Cython/Numba, and profiling—to help developers write faster, more efficient code for large data and CPU‑intensive tasks.


Python is praised for its simplicity and readability, but improper use can lead to slow execution. Over the years I have discovered ten game‑changing performance optimization tricks that make Python code faster and more efficient.

1. Use list comprehensions instead of loops

Loops are useful but can be slow on large datasets. List comprehensions are faster and more Pythonic.

<code>numbers = []
for i in range(1000000):
    numbers.append(i * 2)
</code>

Recommended version using a list comprehension:

<code>numbers = [i * 2 for i in range(1000000)]
</code>

Even better, when you only need to iterate over the result once, a generator expression avoids building the list in memory at all:

<code>numbers = (i * 2 for i in range(1000000))
</code>
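A quick, machine-dependent way to see the difference is the standard library's timeit module; exact numbers will vary, but the comprehension typically comes out ahead because it avoids repeated method lookups and calls to append:

```python
import timeit

# Time the explicit loop against the comprehension on the same workload.
loop_stmt = """
nums = []
for i in range(10_000):
    nums.append(i * 2)
"""
comp_stmt = "nums = [i * 2 for i in range(10_000)]"

loop_time = timeit.timeit(loop_stmt, number=200)
comp_time = timeit.timeit(comp_stmt, number=200)

print(f"loop: {loop_time:.4f}s  comprehension: {comp_time:.4f}s")
```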

2. Use built‑in functions and libraries

Python’s built‑in functions are implemented in C and therefore much faster than hand‑written equivalents.

<code>def custom_sum(numbers):
    total = 0
    for num in numbers:
        total += num
    return total

custom_sum([1, 2, 3, 4, 5])
</code>

Recommended version using sum():

<code>sum([1, 2, 3, 4, 5])
</code>

Using NumPy for numeric operations:

<code>import numpy as np
arr = np.array([1, 2, 3, 4, 5])
arr.sum()
</code>

Built‑in functions like sum(), max(), and min() are C‑optimized and far faster than manual implementations.
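To check the claim on your own machine, you can time the hand-written loop against the builtin; the absolute numbers are machine-dependent, but the builtin consistently wins because the loop runs in C rather than in Python bytecode:

```python
import timeit

def custom_sum(numbers):
    total = 0
    for num in numbers:
        total += num
    return total

data = list(range(10_000))

# Time both implementations on identical input.
manual_time = timeit.timeit(lambda: custom_sum(data), number=200)
builtin_time = timeit.timeit(lambda: sum(data), number=200)

print(f"custom_sum: {manual_time:.4f}s  sum(): {builtin_time:.4f}s")
```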

3. Use join() instead of string concatenation

Concatenating strings with + inside a loop creates a new string each time and is very slow.

<code>words = ["Python", "is", "fast"]
sentence = ""
for word in words:
    sentence += word + " "
</code>

Recommended version using join():

<code>sentence = " ".join(words)
</code>

join() is implemented in C with O(n) complexity, while repeated + concatenation has O(n²) complexity.
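One subtlety worth noting: the two versions are not byte-for-byte identical. The loop appends a separator after every word, including the last, while join() only puts separators between words:

```python
words = ["Python", "is", "fast"]

# Loop version: separator after every word, leaving a trailing space.
sentence_loop = ""
for word in words:
    sentence_loop += word + " "

# join() version: separators only between words.
sentence_join = " ".join(words)

print(repr(sentence_loop))  # 'Python is fast '
print(repr(sentence_join))  # 'Python is fast'
```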

4. Use generators for large data

Generators produce values lazily, saving memory compared to storing large lists.

<code>def get_numbers():
    return [i for i in range(1000000)]
</code>

Recommended version using a generator:

<code>def get_numbers():
    for i in range(1000000):
        yield i
</code>

Generators do not store data in memory, making them ideal for processing massive datasets.
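The memory difference is easy to observe with sys.getsizeof (exact byte counts vary by Python version and platform): the list allocates storage for a million references, while the generator object stays a small fixed size no matter how many values it will eventually yield:

```python
import sys

as_list = [i for i in range(1_000_000)]
as_gen = (i for i in range(1_000_000))

# The list grows with its contents; the generator does not.
print(f"list:      {sys.getsizeof(as_list):>10,} bytes")
print(f"generator: {sys.getsizeof(as_gen):>10,} bytes")

# The generator still produces the same values, one at a time:
print(sum(as_gen))  # 499999500000
```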

5. Use set() for fast membership tests

Checking membership in a list takes O(n) time, while a set provides average O(1) lookup.

<code>items = [1, 2, 3, 4, 5]
if 3 in items:
    print("Found")
</code>

Recommended version using a set:

<code>items = {1, 2, 3, 4, 5}
if 3 in items:
    print("Found")
</code>

Sets are implemented with hash tables, giving much faster lookups than lists.
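The gap becomes obvious at scale. A small, machine-dependent micro-benchmark, probing the worst case for the list (the last element):

```python
import timeit

n = 100_000
as_list = list(range(n))
as_set = set(as_list)
target = n - 1  # worst case for the list: a full scan per lookup

list_time = timeit.timeit(lambda: target in as_list, number=1_000)
set_time = timeit.timeit(lambda: target in as_set, number=1_000)

print(f"list: {list_time:.4f}s  set: {set_time:.6f}s")
```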

6. Use multiprocessing for CPU‑bound tasks

The Global Interpreter Lock (GIL) prevents multiple threads from executing Python bytecode simultaneously. For CPU‑intensive work, multiprocessing is preferred over multithreading.

<code>from threading import Thread

def compute():
    print(sum(i * i for i in range(10**7)))

threads = [Thread(target=compute) for _ in range(4)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
</code>

Recommended version using multiprocessing:

<code>from multiprocessing import Pool

def compute(_):
    return sum(i * i for i in range(10**7))

with Pool(4) as p:
    p.map(compute, range(4))
</code>

Each process has its own Python interpreter, bypassing the GIL.
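An equivalent approach from the standard library is concurrent.futures.ProcessPoolExecutor, which gives the same GIL-bypassing behavior behind a slightly higher-level API. A minimal sketch (the __main__ guard is required on platforms that spawn rather than fork worker processes):

```python
from concurrent.futures import ProcessPoolExecutor

def compute(_):
    # CPU-bound work; each worker process runs it in its own interpreter.
    return sum(i * i for i in range(10**5))

if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=2) as executor:
        results = list(executor.map(compute, range(2)))
    print(results)
```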

7. Use lru_cache() to cache expensive function calls

Calling an expensive function repeatedly with the same arguments recomputes the result every time:

<code>def slow_function(n):
    print(f"Computing {n}...")
    return n * n

print(slow_function(10))
print(slow_function(10))
</code>

Recommended version with caching:

<code>from functools import lru_cache

@lru_cache(maxsize=1000)
def slow_function(n):
    print(f"Computing {n}...")
    return n * n

print(slow_function(10))
print(slow_function(10))
</code>

lru_cache() stores results, avoiding repeated calculations.
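Caching pays off most on recursive functions where the same subproblems recur; the classic illustration is Fibonacci, and cache_info() lets you see the hits accumulate:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):
    # Without the cache, this naive recursion does exponential work.
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

print(fib(30))           # 832040
print(fib.cache_info())  # shows hits, misses, and current cache size
```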

8. Use itertools for efficient iteration

The itertools module provides high‑performance iterator functions.

<code>from itertools import combinations
items = ["a", "b", "c"]
for combo in combinations(items, 2):
    print(combo)
</code>

Operations in itertools are implemented in optimized C code, making them faster than manual loops.
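Beyond combinations, functions like chain() and islice() let you concatenate and slice iterators lazily, without ever building an intermediate list. A small sketch:

```python
from itertools import chain, islice

evens = (i for i in range(0, 100, 2))
odds = (i for i in range(1, 100, 2))

# Lazily concatenate both streams, then take only the first five values;
# the remaining values are never generated.
first_five = list(islice(chain(evens, odds), 5))
print(first_five)  # [0, 2, 4, 6, 8]
```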

9. Accelerate code with Cython or Numba

For performance‑critical tasks, Cython can compile Python code to C, while Numba JIT‑compiles numeric functions to machine code via LLVM.

<code>from numba import jit

@jit(nopython=True)
def fast_function(n):
    # An explicit loop compiles cleanly in nopython mode
    total = 0
    for i in range(n):
        total += i * i
    return total

print(fast_function(10**7))
</code>

Numba compiles Python to machine code, often yielding speedups of 10‑100× on numeric code.

10. Profile code with cProfile

Use cProfile to locate bottlenecks and guide optimizations.

<code>python -m cProfile -s time my_script.py
</code>

Profiling identifies slow functions, enabling targeted improvements.
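cProfile can also be driven from inside a script, which is handy when you want to profile just one section of code; a sketch using pstats to sort and trim the report:

```python
import cProfile
import io
import pstats

def slow_part():
    # Stand-in for the code you suspect is a bottleneck.
    return sum(i * i for i in range(100_000))

profiler = cProfile.Profile()
profiler.enable()
slow_part()
profiler.disable()

# Render the five most expensive entries, sorted by cumulative time.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
print(stream.getvalue())
```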

Final Thoughts

These ten optimization techniques can dramatically improve the speed and efficiency of your Python code, whether you are handling large datasets or CPU‑intensive workloads.

Tags: performance optimization, Python, Programming Tips, Code Efficiency, Python Best Practices
Written by

Code Mala Tang

Read source code together, write articles together, and enjoy spicy hot pot together.
