Comprehensive Introduction to NumPy: ndarray, Creation, Indexing, Operations, Linear Algebra, I/O, and Real‑World Data Analysis
This article provides a thorough overview of NumPy, covering its core ndarray structure, various array creation methods, indexing and slicing techniques, vectorized operations with broadcasting, statistical and linear‑algebra functions, file input/output, and a practical data‑analysis example, all illustrated with executable Python code.
NumPy (Numerical Python) is the foundational package for scientific computing in Python, offering high‑performance multi‑dimensional array objects and a rich set of mathematical functions that make it essential for data analysis and visualization.
1. NumPy Core: ndarray Multi‑dimensional Array
The central object in NumPy is ndarray, an efficient container for storing elements of the same type in multiple dimensions.
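To make the "same type" point concrete, here is a minimal sketch showing that an array holds one dtype for all elements and upcasts mixed input to a common type. The variable names (`mixed`, `small`) are illustrative only.

```python
import numpy as np

# Mixing integers and a float: every element is stored as one common dtype.
mixed = np.array([1, 2, 3.5])
print(mixed.dtype)      # float64

# Requesting a dtype explicitly controls the per-element storage size.
small = np.array([1, 2, 3], dtype=np.int8)
print(small.itemsize)   # 1 (byte per element)
print(small.nbytes)     # 3 (total bytes for the data)
print(small.ndim)       # 1 (number of dimensions)
```

Because all elements share one fixed-size type, NumPy can store them in a contiguous buffer, which is a large part of why array operations are fast.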
import numpy as np
# Create a 1‑D array
arr1d = np.array([1, 2, 3, 4, 5])
print(f"1‑D array:\n{arr1d}")
# Create a 2‑D array
arr2d = np.array([[1, 2, 3], [4, 5, 6]])
print(f"\n2‑D array:\n{arr2d}")
# Array attributes
print(f"\nArray shape: {arr2d.shape}")
print(f"Array dtype: {arr2d.dtype}")
print(f"Array size: {arr2d.size}")
2. Array Creation and Initialization
NumPy provides several convenient functions for creating arrays with specific values.
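Two details of these helpers are easy to miss: `np.arange` excludes its stop value while `np.linspace` includes it by default. The sketch below checks both, and also shows `np.random.default_rng`, which NumPy's documentation recommends for new code over the legacy `np.random.rand` style functions. The seed value and variable names are arbitrary choices for illustration.

```python
import numpy as np

evens = np.arange(0, 10, 2)     # stop (10) is excluded
grid = np.linspace(0, 1, 5)     # stop (1) is included by default
filled = np.full((2, 3), 7)     # constant-filled array

print(evens)   # [0 2 4 6 8]
print(grid)    # [0.   0.25 0.5  0.75 1.  ]

# Seeded generator: the same seed reproduces the same random values.
rng = np.random.default_rng(seed=42)
uniform = rng.random((3, 3))             # floats in [0, 1)
ints = rng.integers(0, 10, size=(3, 3))  # integers in [0, 10)
```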
# Common array creation methods
zeros_arr = np.zeros((3, 4)) # all‑zero array
ones_arr = np.ones((2, 2)) # all‑one array
empty_arr = np.empty((2, 3)) # uninitialized array
eye_arr = np.eye(3) # identity matrix
range_arr = np.arange(0, 10, 2) # similar to Python's range
linspace_arr = np.linspace(0, 1, 5) # evenly spaced values
# Random arrays
random_arr = np.random.rand(3, 3) # uniform [0,1)
normal_arr = np.random.randn(100) # standard normal distribution
randint_arr = np.random.randint(0, 10, (3, 3)) # random integers
3. Array Indexing and Slicing
NumPy supports flexible indexing, slicing, boolean masking, and fancy indexing.
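One practical difference between these indexing styles is worth a short sketch: basic indexing and slicing return views that share memory with the original array, while boolean and fancy indexing return copies. The variable names below are illustrative.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Boolean indexing returns a copy: editing it leaves arr untouched.
big = arr[arr > 5]
big[0] = 0
print(arr[1, 2])    # 6

# Fancy indexing also returns a copy.
picked = arr[[0, 2]]
picked[0, 0] = -1
print(arr[0, 0])    # 1

# Basic indexing returns a view: editing it changes arr as well.
row_view = arr[0]
row_view[0] = 100
print(arr[0, 0])    # 100

print(np.shares_memory(arr, row_view))  # True
print(np.shares_memory(arr, picked))    # False
```

When an independent array is needed from a slice, calling `.copy()` on it avoids accidental writes to the original.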
arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
# Basic indexing
print(arr[0]) # first row -> [1 2 3]
print(arr[1, 2]) # row index 1, column index 2 (second row, third column) -> 6
# Slicing
print(arr[:2]) # first two rows
print(arr[:, 1:]) # columns from the second onward
# Boolean indexing
mask = arr > 5
print(arr[mask]) # elements greater than 5 -> [6 7 8 9]
# Fancy indexing
print(arr[[0, 2]]) # first and third rows
4. Array Operations and Broadcasting
NumPy enables vectorized arithmetic and broadcasting to apply operations across arrays of different shapes efficiently.
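Broadcasting compares shapes from the trailing dimension backwards; two dimensions are compatible when they are equal or one of them is 1. The following sketch, using made-up arrays, shows two compatible cases and one incompatible case.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])   # shape (2, 3)
col_scale = np.array([0.5, 1.0, 1.5])    # shape (3,)
row_offset = np.array([[10], [20]])      # shape (2, 1)

scaled = arr * col_scale     # (2, 3) with (3,)   -> one factor per column
shifted = arr + row_offset   # (2, 3) with (2, 1) -> one offset per row

print(scaled)    # [[0.5 2.  4.5]
                 #  [2.  5.  9. ]]
print(shifted)   # [[11 12 13]
                 #  [24 25 26]]

# Shapes (2, 3) and (2,) do not line up on the trailing dimension.
error_message = None
try:
    arr + np.array([1, 2])
except ValueError as exc:
    error_message = str(exc)
print(error_message is not None)  # True
```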
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
# Basic arithmetic
print(a + b) # [5 7 9]
print(a * b) # [4 10 18]
print(np.dot(a, b)) # dot product -> 32
# Broadcasting example
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr + 10) # add 10 to every element
print(arr * [0.5, 1, 1.5]) # column‑wise scaling
# Universal functions (ufuncs)
print(np.sqrt(arr))
print(np.exp(arr))
print(np.sin(arr))
5. Statistics and Aggregation
NumPy offers a suite of statistical functions for summarizing data along specified axes.
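The `axis` argument names the dimension that is collapsed: `axis=0` aggregates down the rows and yields one value per column, while `axis=1` yields one value per row. The small fixed array below (chosen so the results are easy to check by hand) also shows `keepdims` and the `ddof` parameter of `np.std`.

```python
import numpy as np

data = np.array([[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0]])

col_means = data.mean(axis=0)   # one mean per column
row_sums = data.sum(axis=1)     # one sum per row
print(col_means)   # [2.5 3.5 4.5]
print(row_sums)    # [ 6. 15.]

# keepdims=True keeps the reduced axis as length 1, so the result
# broadcasts back against the original array.
centered = data - data.mean(axis=1, keepdims=True)
print(centered)    # [[-1.  0.  1.]
                   #  [-1.  0.  1.]]

# np.std defaults to the population formula (ddof=0);
# ddof=1 gives the sample standard deviation.
pop_std = np.std(data[:, 0])
sample_std = np.std(data[:, 0], ddof=1)
print(pop_std)     # 1.5
```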
data = np.random.randn(100, 5) # 100 rows, 5 columns
print(f"Mean: {np.mean(data)}")
print(f"Std dev: {np.std(data)}")
print(f"Min: {np.min(data)}")
print(f"Max: {np.max(data)}")
print(f"Median: {np.median(data)}")
# Axis‑wise calculations
print(f"Column means: {np.mean(data, axis=0)}")
print(f"Row sums: {np.sum(data, axis=1)}")
# Cumulative sum and product of the first column
print(f"Cumulative sum: {np.cumsum(data[:,0])}")
print(f"Product: {np.prod(data[:,0])}")
6. Linear Algebra Operations
The numpy.linalg module provides powerful linear‑algebra capabilities.
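Results from these routines are floating-point, so a quick check with `np.allclose` is a reasonable habit. The sketch below solves a small system, verifies the solution by substitution, and unpacks the two values returned by `np.linalg.eig`. The matrix and right-hand side are sample values.

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
b = np.array([5.0, 11.0])

# Solve A x = b directly; this is generally preferred over computing
# the inverse and multiplying.
x = np.linalg.solve(A, b)
print(np.allclose(x, [1.0, 2.0]))   # True
print(np.allclose(A @ x, b))        # True: substitution check

# eig returns eigenvalues and a matrix whose columns are eigenvectors.
eigenvalues, eigenvectors = np.linalg.eig(A)

# Each column v with eigenvalue w should satisfy A v = w v.
print(np.allclose(A @ eigenvectors, eigenvectors * eigenvalues))  # True
```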
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
# Matrix multiplication
print(np.dot(A, B))
print(A @ B) # Python 3.5+ operator
# Linear‑algebra functions
print(f"Determinant: {np.linalg.det(A)}")
print(f"Inverse: {np.linalg.inv(A)}")
print(f"Eigenvalues & vectors: {np.linalg.eig(A)}")
# Solve linear system Ax = b
b = np.array([5, 11])
x = np.linalg.solve(A, b)
print(f"Solution: {x}")
7. File Input/Output
NumPy makes it easy to save and load arrays in binary or text formats.
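A round-trip check is a simple way to confirm that what was saved is what comes back. The sketch below writes to a temporary directory (an assumption made here so the example leaves no files behind) and compares the loaded arrays with the original.

```python
import os
import tempfile

import numpy as np

arr = np.arange(6, dtype=np.float64).reshape(2, 3)

with tempfile.TemporaryDirectory() as tmp_dir:
    # Binary .npy format: preserves dtype and shape exactly.
    npy_path = os.path.join(tmp_dir, "array_data.npy")
    np.save(npy_path, arr)
    loaded = np.load(npy_path)

    # Text format: human-readable, values are parsed back as floats.
    txt_path = os.path.join(tmp_dir, "array.txt")
    np.savetxt(txt_path, arr, delimiter=",")
    txt_arr = np.loadtxt(txt_path, delimiter=",")

    # .npz archive: several named arrays in one file.
    npz_path = os.path.join(tmp_dir, "arrays.npz")
    np.savez(npz_path, a=arr, b=arr * 2)
    with np.load(npz_path) as archive:
        names = sorted(archive.files)
        doubled = archive["b"]

print(np.array_equal(loaded, arr))   # True
print(np.allclose(txt_arr, arr))     # True
print(names)                         # ['a', 'b']
```

Opening the `.npz` archive in a `with` block closes the underlying file once the arrays have been read.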
# Save and load a single array
arr = np.random.rand(5, 5)
np.save('array_data.npy', arr)
loaded_arr = np.load('array_data.npy')
# Text file
np.savetxt('array.txt', arr, delimiter=',')
txt_arr = np.loadtxt('array.txt', delimiter=',')
# Multiple arrays in a .npz archive
np.savez('arrays.npz', a=arr, b=arr*2)
arrays = np.load('arrays.npz')
print(arrays['a'])
8. Real‑World Application Example: Sales Data Analysis
This example demonstrates how NumPy can be used to simulate sales data, compute total revenue, identify the best‑selling product, and analyze monthly growth, with optional visualization using Matplotlib.
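Because the simulated data is random, the numbers change on every run. To see the calculations on values that can be checked by hand, here is a sketch with a tiny made-up table (3 months, 2 products). The figures are hypothetical, and growth is computed as (current month - previous month) / previous month * 100.

```python
import numpy as np

# Hypothetical sales: rows are months, columns are products.
sales = np.array([[60, 40],
                  [70, 40],
                  [50, 49]])
products = ["A", "B"]

monthly_totals = sales.sum(axis=1)    # one total per month
product_totals = sales.sum(axis=0)    # one total per product
print(monthly_totals)   # [100 110  99]
print(product_totals)   # [180 129]

best_product = products[np.argmax(product_totals)]
print(best_product)     # A

# Month-over-month growth in percent: about +10% and then -10%.
growth = np.diff(monthly_totals) / monthly_totals[:-1] * 100
print(np.round(growth, 2))
```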
# Simulate sales data (12 months, 5 products)
sales = np.random.randint(100, 1000, (12, 5))
products = ['A', 'B', 'C', 'D', 'E']
months = np.arange(1, 13)
# Total sales
total_sales = np.sum(sales)
print(f"Annual total sales: {total_sales}")
# Best‑selling product
product_sales = np.sum(sales, axis=0)
best_product = products[np.argmax(product_sales)]
print(f"Top product: {best_product}")
# Monthly growth rate
monthly_growth = np.diff(np.sum(sales, axis=1)) / np.sum(sales[:-1], axis=1) * 100
print(f"Average monthly growth: {np.mean(monthly_growth):.2f}%")
# Optional heat‑map (requires matplotlib)
import matplotlib.pyplot as plt
plt.imshow(sales, cmap='hot', aspect='auto')
plt.colorbar(label='Sales')
plt.xticks(np.arange(5), products)
plt.yticks(np.arange(12), months)
plt.title('Product Sales Heatmap')
plt.show()
NumPy serves as the cornerstone of Python numerical computing; mastering its array operations and mathematical functions not only accelerates data‑processing tasks but also provides a solid foundation for advanced tools such as Pandas and machine‑learning frameworks like Scikit‑learn.