Python Performance Optimization Techniques: Built‑in Functions, List Comprehensions, Generators, Caching, NumPy, Multiprocessing and More
This article introduces a range of Python performance‑optimization methods—including built‑in functions, list comprehensions, generator expressions, avoiding globals, functools.lru_cache, NumPy, pandas, multiprocessing, Cython, PyPy, and line_profiler—illustrated with clear code examples and a practical image‑processing case study.
Performance optimization is the process of improving a program's execution efficiency, which matters most when processing large datasets, serving real‑time applications, or running in resource‑constrained environments.
1. Use built‑in functions – Functions such as sum(), max(), and min() are implemented in C and run faster than equivalent hand‑written Python loops.
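One way to verify this claim on your own machine is a quick timeit comparison; this is a rough sketch, and absolute timings vary by hardware and Python version:

```python
import timeit

numbers = list(range(1_000))

def manual_sum(values):
    # A pure-Python loop: every iteration runs through the interpreter
    total = 0
    for v in values:
        total += v
    return total

# Both compute the same result, but sum() runs in C
assert manual_sum(numbers) == sum(numbers)

loop_time = timeit.timeit(lambda: manual_sum(numbers), number=1_000)
builtin_time = timeit.timeit(lambda: sum(numbers), number=1_000)
print(f"loop: {loop_time:.4f}s  sum(): {builtin_time:.4f}s")
```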
<code># Using built‑in sum() to calculate the total of a list
numbers = [1, 2, 3, 4, 5]
total = sum(numbers)
print(total) # Output: 15
</code>2. List comprehensions provide a concise and often faster way to build lists compared with traditional for‑loops.
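The "often faster" claim can also be checked with timeit; the sketch below compares the two forms (absolute numbers depend on your machine), with the comprehension avoiding the repeated `append` attribute lookup on every iteration:

```python
import timeit

def with_loop():
    squares = []
    for i in range(1_000):
        squares.append(i ** 2)
    return squares

def with_comprehension():
    # Same result; no per-iteration append lookup
    return [i ** 2 for i in range(1_000)]

assert with_loop() == with_comprehension()

print("loop:         ", timeit.timeit(with_loop, number=1_000))
print("comprehension:", timeit.timeit(with_comprehension, number=1_000))
```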
<code># Traditional loop
squares = []
for i in range(10):
    squares.append(i ** 2)
print(squares)  # Output: [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

# List comprehension
squares = [i ** 2 for i in range(10)]
print(squares)  # Output: [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
</code>3. Generator expressions are lazy‑evaluated versions of list comprehensions, generating items on demand and saving memory for large datasets.
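A rough way to see the memory difference is sys.getsizeof; the exact sizes vary by Python version and platform, but the gap is striking:

```python
import sys

n = 100_000
squares_list = [i ** 2 for i in range(n)]  # all n results stored at once
squares_gen = (i ** 2 for i in range(n))   # a small object; nothing computed yet

print(sys.getsizeof(squares_list))  # hundreds of kilobytes
print(sys.getsizeof(squares_gen))   # a small, constant size

total = sum(squares_gen)  # consumes the generator one item at a time
print(total)
```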
<code># Generator expression: items are produced one at a time, only when requested
squares_gen = (i ** 2 for i in range(10))
print(list(squares_gen))  # list() consumes the generator. Output: [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
</code>4. Avoid global variables – Accessing globals is slower than accessing locals; keep variables inside functions whenever possible.
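The cost difference is visible in the bytecode: globals are fetched by dictionary lookup (LOAD_GLOBAL) while locals use an indexed array slot (LOAD_FAST). A small sketch with the standard dis module, noting that exact opcode names can vary slightly across CPython versions:

```python
import dis

x = 10

def use_global():
    return x + 1  # x is fetched with LOAD_GLOBAL (a dictionary lookup)

def use_local():
    y = 10
    return y + 1  # y is fetched with LOAD_FAST (an indexed array slot)

print([ins.opname for ins in dis.get_instructions(use_global)])
print([ins.opname for ins in dis.get_instructions(use_local)])
```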
<code># Global variable example
x = 10

def global_var():
    return x + 1

print(global_var())  # Output: 11

# Local variable example
def local_var():
    y = 10
    return y + 1

print(local_var())  # Output: 11
</code>5. Use functools.lru_cache to memoize function results, which is especially useful for recursive functions.
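Beyond speeding up recursion, the cache can be inspected with cache_info() to confirm it is actually being hit; a minimal sketch, where the hypothetical slow_square stands in for any expensive function:

```python
import functools

@functools.lru_cache(maxsize=128)
def slow_square(n):
    # Stand-in for an expensive computation
    return n * n

slow_square(4)  # miss: computed and stored
slow_square(4)  # hit: returned straight from the cache
print(slow_square.cache_info())  # Output: CacheInfo(hits=1, misses=1, maxsize=128, currsize=1)
```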
<code>import functools

@functools.lru_cache(maxsize=128)
def fibonacci(n):
    if n < 2:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)

print(fibonacci(30))  # Output: 832040
</code>6. Leverage NumPy and pandas for efficient numerical computation and data manipulation.
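The NumPy example below covers array arithmetic; for tabular data, pandas offers the same vectorized style on named columns. A minimal sketch, with illustrative column names:

```python
import pandas as pd

df = pd.DataFrame({"price": [10.0, 20.0, 30.0], "qty": [2, 3, 4]})
# Vectorized column arithmetic: no Python-level loop over rows
df["total"] = df["price"] * df["qty"]
print(df["total"].tolist())  # Output: [20.0, 60.0, 120.0]
```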
<code>import numpy as np
# Compute squares of an array using NumPy
array = np.array([1, 2, 3, 4, 5])
squared = array ** 2
print(squared) # Output: [ 1 4 9 16 25]
</code>7. Use the multiprocessing module to run code in parallel across multiple CPU cores.
<code>import multiprocessing

def worker(num):
    return num * num

if __name__ == "__main__":
    with multiprocessing.Pool(processes=4) as pool:
        results = pool.map(worker, [1, 2, 3, 4, 5])
    print(results)  # Output: [1, 4, 9, 16, 25]
</code>8. Compile performance‑critical sections with Cython to get C‑level speed while writing Python‑like code.
<code># example.pyx
cpdef int square(int x):  # cpdef (not cdef) makes the function callable from Python
    return x * x

# setup.py
from setuptools import setup
from Cython.Build import cythonize

setup(ext_modules=cythonize("example.pyx"))

# Build the extension in place before importing:
#   python setup.py build_ext --inplace
import example
print(example.square(5))  # Output: 25
</code>9. Run code on PyPy, a JIT‑enabled Python interpreter that can dramatically speed up many long‑running workloads.
<code># Install PyPy (Debian/Ubuntu package shown; see pypy.org for other platforms)
sudo apt-get install pypy3
# Execute a script with PyPy
pypy3 my_script.py
</code>10. Profile with line_profiler to locate bottlenecks and focus optimization efforts.
<code># Install line_profiler
pip install line_profiler

# The @profile decorator is injected by kernprof at run time.
# Run this script with:  kernprof -l -v my_script.py
@profile
def my_function():
    a = [1] * 1000000
    b = [2] * 1000000
    del a
    return b

my_function()
</code>Practical case: Optimizing image processing – Using the Pillow library together with multiprocessing to convert images to grayscale, resize them, and save the results in parallel.
<code>from PIL import Image
import os
import multiprocessing

def process_image(image_path):
    # Open image
    image = Image.open(image_path)
    # Convert to grayscale
    gray_image = image.convert('L')
    # Resize image
    resized_image = gray_image.resize((100, 100))
    # Save processed image
    output_path = f"processed_{os.path.basename(image_path)}"
    resized_image.save(output_path)
    print(f"Processed {image_path} and saved to {output_path}")

def main():
    image_paths = ["image1.jpg", "image2.jpg", "image3.jpg"]
    # Use multiprocessing to handle images concurrently
    with multiprocessing.Pool(processes=4) as pool:
        pool.map(process_image, image_paths)

if __name__ == "__main__":
    main()
</code>Conclusion – By using built‑in functions, list comprehensions, and generator expressions, avoiding globals, caching with functools.lru_cache, leaning on NumPy/pandas, parallelizing with multiprocessing, compiling hot spots with Cython, running on PyPy, and locating bottlenecks with profiling tools like line_profiler, developers can significantly improve the speed and efficiency of Python programs, as demonstrated in the image‑processing example.
Python Programming Learning Circle
A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.