Practical Python Performance Optimization Techniques
This article presents five practical Python performance-optimization techniques: __slots__ for memory reduction, list comprehensions for faster loops, the lru_cache decorator for result caching, generators for low-memory data processing, and local-variable access in hot code paths.
Python is often criticized for being slower than compiled languages, but features already in the standard library can significantly improve both execution speed and memory usage. The sections below detail each technique with a small benchmark.
1. __slots__ mechanism: memory optimization
Python stores instance attributes in a dynamic dictionary, which adds overhead. Declaring __slots__ restricts attributes to a static structure, reducing memory consumption and speeding up attribute access.
<code>from pympler import asizeof

class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age

unoptimized_instance = Person("Harry", 20)
print(f"Unoptimized memory instance: {asizeof.asizeof(unoptimized_instance)} bytes")
</code>The unoptimized instance occupies 520 bytes. Using __slots__ yields a 75 % memory reduction.
<code>from pympler import asizeof

class SlottedPerson:
    __slots__ = ['name', 'age']

    def __init__(self, name, age):
        self.name = name
        self.age = age

optimized_instance = SlottedPerson("Harry", 20)
print(f"Optimized memory instance: {asizeof.asizeof(optimized_instance)} bytes")
</code>A benchmark comparing creation time and memory shows the slotted class is both smaller and faster.
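The creation-time claim can be checked with a small timing sketch using the standard-library timeit module (class names and iteration counts here are illustrative, not from the article):

```python
import timeit

class NoSlots:
    def __init__(self, name, age):
        self.name = name
        self.age = age

class WithSlots:
    __slots__ = ('name', 'age')

    def __init__(self, name, age):
        self.name = name
        self.age = age

# Time the creation of 100,000 instances of each class
no_slots_time = timeit.timeit(lambda: NoSlots("Harry", 20), number=100_000)
with_slots_time = timeit.timeit(lambda: WithSlots("Harry", 20), number=100_000)

print(f"Without __slots__: {no_slots_time:.4f} s")
print(f"With __slots__:    {with_slots_time:.4f} s")

# Slotted instances have no per-instance __dict__, which is where the saving comes from
print(hasattr(WithSlots("Harry", 20), "__dict__"))  # False
```

The exact speedup varies by Python version and hardware, but the absence of the per-instance __dict__ is what drives both the memory and the attribute-access gains.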
2. List comprehension: loop optimization
A list comprehension runs its loop as a single specialized bytecode sequence and avoids the repeated list.append method lookup and call of an explicit for loop, making it typically 30-50 % faster.
<code>import time

# Traditional for loop
start = time.perf_counter()
squares_loop = []
for i in range(1, 10_000_001):
    squares_loop.append(i ** 2)
end = time.perf_counter()
print(f"For loop: {end - start:.6f} seconds")

# List comprehension
start = time.perf_counter()
squares_comprehension = [i ** 2 for i in range(1, 10_000_001)]
end = time.perf_counter()
print(f"List comprehension: {end - start:.6f} seconds")
</code>The test confirms the comprehension runs noticeably faster.
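One-off wall-clock timings like the one above are noisy; the standard-library timeit module repeats the measurement and gives steadier numbers. A sketch of the same comparison (function names and counts are illustrative):

```python
import timeit

def squares_loop(n):
    result = []
    for i in range(n):
        result.append(i ** 2)  # method lookup + call on every iteration
    return result

def squares_comp(n):
    return [i ** 2 for i in range(n)]  # single specialized bytecode loop

n = 100_000
loop_time = timeit.timeit(lambda: squares_loop(n), number=20)
comp_time = timeit.timeit(lambda: squares_comp(n), number=20)

print(f"Loop:          {loop_time:.4f} s")
print(f"Comprehension: {comp_time:.4f} s")

# Both produce identical results; only the speed differs
assert squares_loop(10) == squares_comp(10)
```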
3. @lru_cache decorator: result caching
For functions with repeated calculations, functools.lru_cache stores results in memory, dramatically reducing execution time, especially for recursive algorithms.
<code>import time

def fibonacci(n):
    if n <= 1:
        return n
    return fibonacci(n-1) + fibonacci(n-2)

start = time.perf_counter()
print(f"Result: {fibonacci(35)}")
print(f"Time without cache: {time.perf_counter() - start:.6f} seconds")
</code> <code>from functools import lru_cache
import time

@lru_cache(maxsize=128)
def fibonacci_cached(n):
    if n <= 1:
        return n
    return fibonacci_cached(n-1) + fibonacci_cached(n-2)

start = time.perf_counter()
print(f"Result: {fibonacci_cached(35)}")
print(f"Time with cache: {time.perf_counter() - start:.6f} seconds")
</code>Cache-enabled execution is thousands of times faster.
4. Generators: memory‑efficient data handling
Generators produce items on‑demand, avoiding the need to store large collections in memory.
<code>import sys
big_data_list = [i for i in range(10_000_000)]
print(f"Memory usage for list: {sys.getsizeof(big_data_list)} bytes")
result = sum(big_data_list)
print(f"Sum of list: {result}")
big_data_generator = (i for i in range(10_000_000))
print(f"Memory usage for generator: {sys.getsizeof(big_data_generator)} bytes")
result = sum(big_data_generator)
print(f"Sum of generator: {result}")
</code>The generator occupies only a couple of hundred bytes, because it never materializes its items, versus tens of megabytes for the list: a memory saving of over 99.99 %. (Note that sys.getsizeof reports only the list object itself, not the integer objects it references, so the list's true footprint is even larger.)
In real projects, generators are ideal for processing large log files line‑by‑line:
<code>def log_file_reader(file_path):
    with open(file_path, 'r') as file:
        for line in file:   # file objects are themselves lazy iterators
            yield line

error_count = sum(1 for line in log_file_reader("large_log_file.txt") if "ERROR" in line)
print(f"Total errors: {error_count}")
</code>
5. Local variable optimization
Accessing local variables is faster than accessing globals: in CPython, locals live in a fixed-size array and are read by index (the LOAD_FAST opcode), while globals require a dictionary lookup (LOAD_GLOBAL).
<code>import time

global_var = 10

def access_global():
    return global_var      # resolved by dictionary lookup

def access_local():
    local_var = 10
    return local_var       # resolved by indexed array access

start = time.perf_counter()
for _ in range(1_000_000):
    access_global()
global_time = time.perf_counter() - start

start = time.perf_counter()
for _ in range(1_000_000):
    access_local()
local_time = time.perf_counter() - start

print(f"Global access time: {global_time:.6f} seconds")
print(f"Local access time: {local_time:.6f} seconds")
</code>The benchmark shows local access is roughly twice as fast.
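A practical corollary is to bind a frequently used global or builtin, such as math.sqrt, to a local name before a hot loop, replacing the per-iteration global-and-attribute lookup with a fast local one. A sketch under illustrative names and counts (on recent CPython versions the gap has narrowed, but the pattern remains common):

```python
import math
import time

values = list(range(1_000_000))

def sqrt_sum_global(data):
    total = 0.0
    for v in data:
        total += math.sqrt(v)   # global + attribute lookup every iteration
    return total

def sqrt_sum_local(data):
    sqrt = math.sqrt            # hoist the lookup out of the loop, once
    total = 0.0
    for v in data:
        total += sqrt(v)        # fast local lookup
    return total

start = time.perf_counter()
sqrt_sum_global(values)
t_global = time.perf_counter() - start

start = time.perf_counter()
sqrt_sum_local(values)
t_local = time.perf_counter() - start

print(f"Global lookup: {t_global:.6f} s")
print(f"Local lookup:  {t_local:.6f} s")
```

Both functions compute the same result; only the name-resolution cost inside the loop changes.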
Performance Optimization Summary
Effective Python optimization combines memory‑saving techniques ( __slots__ , generators, local variables) with compute‑speed improvements (list comprehensions, lru_cache ). Developers should apply these strategies judiciously, balancing speed gains against code readability and maintainability.