
Fundamentals of Derivatives and Partial Derivatives for Neural Networks

This article introduces the mathematical foundations of derivatives and partial derivatives and explains their role in optimizing neural-network parameters. It covers basic derivative formulas, the linearity of differentiation, the derivative of the sigmoid function, conditions for a minimum, and constrained optimization with Lagrange multipliers, serving as a practical guide for machine-learning practitioners.

Python Programming Learning Circle

Neural networks learn by optimizing their weights and biases so that the output matches the training data, which is done by repeatedly taking derivatives to adjust the parameters. Optimization and differentiation are therefore the two key concepts of this article.

1 Derivative Basics

Derivatives represent the slope of a function at a point: for a function f(x), the derivative f'(x) gives the slope of the tangent line there. A function must be differentiable at a point for its derivative to exist.

1.1 Common Derivative Formulas

Several frequently used derivative formulas are listed (image below).
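The referenced image is not reproduced here, but the standard formulas it typically contains, such as (xⁿ)' = n·xⁿ⁻¹, (eˣ)' = eˣ, and (ln x)' = 1/x, can be checked numerically with a central-difference approximation (a minimal sketch, not from the article itself):

```python
import math

def numeric_derivative(f, x, h=1e-6):
    """Central-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

# (x^3)' = 3x^2  -> at x = 2.0 the slope should be 12
assert abs(numeric_derivative(lambda x: x**3, 2.0) - 12.0) < 1e-4

# (e^x)' = e^x   -> at x = 1.0 the slope should be e
assert abs(numeric_derivative(math.exp, 1.0) - math.e) < 1e-4

# (ln x)' = 1/x  -> at x = 2.0 the slope should be 0.5
assert abs(numeric_derivative(math.log, 2.0) - 0.5) < 1e-4
```

The same helper is a handy sanity check for any hand-derived formula.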

1.2 Derivative Notation

The derivative of f(x) is denoted f'(x), but it can also be written in Leibniz's fractional form df/dx, as shown in the following image.

This fractional form is handy because, via the chain rule, composite functions can be differentiated as if the fractions canceled: dy/dx = (dy/du)·(du/dx).
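As a sketch (this particular function is an assumption, not the article's example), take y = (x² + 1)³ with the substitution u = x² + 1, so dy/dx = (dy/du)·(du/dx) = 3u² · 2x; a numerical check confirms the "fraction-like" cancellation:

```python
def numeric_derivative(f, x, h=1e-6):
    """Central-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

# y = (x^2 + 1)^3, with u = x^2 + 1:
# dy/du = 3u^2 and du/dx = 2x, so dy/dx = 3(x^2 + 1)^2 * 2x
x = 1.5
u = x**2 + 1
chain_rule = 3 * u**2 * 2 * x
numeric = numeric_derivative(lambda t: (t**2 + 1)**3, x)
assert abs(chain_rule - numeric) < 1e-3
```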

1.3 Linear Properties of Derivatives

Derivatives obey linearity: the derivative of a sum equals the sum of the derivatives, the derivative of a difference equals the difference of the derivatives, and constant multiples factor out, so (c·f)' = c·f'.
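Both linearity properties can be verified numerically (a minimal sketch with arbitrarily chosen functions, not the article's own examples):

```python
def numeric_derivative(f, x, h=1e-6):
    """Central-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

f = lambda x: x**3
g = lambda x: 5 * x
x = 2.0

# (f + g)' = f' + g'
lhs = numeric_derivative(lambda t: f(t) + g(t), x)
rhs = numeric_derivative(f, x) + numeric_derivative(g, x)
assert abs(lhs - rhs) < 1e-4

# (c * f)' = c * f'
assert abs(numeric_derivative(lambda t: 7 * f(t), x)
           - 7 * numeric_derivative(f, x)) < 1e-3
```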

1.4 Derivative of Fractional Functions

When a function is expressed as a fraction, its derivative follows the rule \left\{\frac{1}{f(x)}\right\}' = -\frac{f'(x)}{f(x)^2}, illustrated in the image below.
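A quick numerical check of this reciprocal rule (the specific f is an assumption chosen so that 1/f is defined everywhere):

```python
def numeric_derivative(f, x, h=1e-6):
    """Central-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

# f(x) = x^2 + 1 is never zero, so 1/f is defined on all of R
f = lambda x: x**2 + 1
f_prime = lambda x: 2 * x

x = 1.0
# Rule: {1/f(x)}' = -f'(x) / f(x)^2
rule_value = -f_prime(x) / f(x)**2       # -2 / 4 = -0.5
numeric_value = numeric_derivative(lambda t: 1 / f(t), x)
assert abs(rule_value - numeric_value) < 1e-5
```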

1.4.1 Derivative of the Sigmoid Function

The sigmoid σ(x) = 1/(1 + e⁻ˣ) is a widely used activation function in neural networks. Its derivative is needed for gradient descent and, using the reciprocal rule above, takes the compact form σ'(x) = σ(x)(1 − σ(x)); the derivation is shown in the following image.
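The compact form σ'(x) = σ(x)(1 − σ(x)) can be confirmed against a numerical derivative (a minimal sketch):

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def numeric_derivative(f, x, h=1e-6):
    """Central-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

for x in (-2.0, 0.0, 3.0):
    closed_form = sigmoid(x) * (1 - sigmoid(x))  # from the reciprocal rule
    assert abs(closed_form - numeric_derivative(sigmoid, x)) < 1e-6

# At x = 0: sigmoid(0) = 0.5, so the derivative is 0.25, its maximum
assert abs(sigmoid(0.0) * (1 - sigmoid(0.0)) - 0.25) < 1e-12
```

The fact that σ' can be written in terms of σ itself makes backpropagation cheap: the forward-pass activations are reused directly.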

1.5 Minimum Value Condition

For a function y = f(x) , a necessary condition for a minimum is that its derivative equals zero at that point. This principle extends to multivariate functions, where all partial derivatives must be zero. However, the condition is not sufficient, as illustrated by the examples and images below.

In the example, the derivative is zero at x = -1, 0, 2 , but only x = 2 corresponds to a true minimum, demonstrating that the zero‑derivative condition is necessary but not sufficient.
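The article's figure is not reproduced here. One candidate function matching this pattern (an assumption, not necessarily the article's own example) is f(x) = 3x⁴ − 4x³ − 12x², whose derivative 12x(x + 1)(x − 2) vanishes at exactly x = −1, 0, 2:

```python
def f(x):
    # Hypothetical example: f'(x) = 12x^3 - 12x^2 - 24x = 12x(x + 1)(x - 2)
    return 3 * x**4 - 4 * x**3 - 12 * x**2

def f_prime(x):
    return 12 * x**3 - 12 * x**2 - 24 * x

# The derivative vanishes at all three critical points...
for c in (-1.0, 0.0, 2.0):
    assert abs(f_prime(c)) < 1e-9

# ...but x = 0 is a local maximum, and only x = 2 gives the smallest value:
# f(-1) = -5, f(0) = 0, f(2) = -32
values = {c: f(c) for c in (-1.0, 0.0, 2.0)}
assert min(values, key=values.get) == 2.0
```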

2 Partial Derivative Basics

For multivariate functions, the derivative with respect to a single variable while treating the others as constants is called a partial derivative. For a function z = f(x, y), the partial derivative with respect to x is denoted ∂z/∂x, and similarly for y.
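"Treating the other variable as a constant" translates directly into code: perturb one argument while holding the other fixed. A minimal sketch with an arbitrarily chosen z (not the article's example):

```python
def partial_x(f, x, y, h=1e-6):
    """∂f/∂x: vary x while y is held fixed (treated as a constant)."""
    return (f(x + h, y) - f(x - h, y)) / (2 * h)

def partial_y(f, x, y, h=1e-6):
    """∂f/∂y: vary y while x is held fixed."""
    return (f(x, y + h) - f(x, y - h)) / (2 * h)

# z = x^2 * y + y^3  ->  ∂z/∂x = 2xy,  ∂z/∂y = x^2 + 3y^2
z = lambda x, y: x**2 * y + y**3
x, y = 2.0, 3.0
assert abs(partial_x(z, x, y) - 2 * x * y) < 1e-4          # 12
assert abs(partial_y(z, x, y) - (x**2 + 3 * y**2)) < 1e-4  # 31
```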

2.1 Minimum Condition for Multivariable Functions

Just as in the single‑variable case, a necessary condition for a minimum of a smooth multivariable function is that all its partial derivatives vanish at that point. This condition extends to n variables, but, like before, it is not sufficient for guaranteeing a minimum.
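In neural-network training this condition is exploited by gradient descent, which moves parameters opposite the partial derivatives until they (approximately) vanish. A minimal sketch on a simple bowl-shaped function (the function, learning rate, and step count are illustrative assumptions):

```python
# Gradient descent on f(x, y) = (x - 1)^2 + (y + 2)^2;
# the minimum sits where both partial derivatives are zero: (1, -2).
def grad(x, y):
    return 2 * (x - 1), 2 * (y + 2)   # (∂f/∂x, ∂f/∂y)

x, y = 0.0, 0.0
lr = 0.1                              # learning rate
for _ in range(200):
    gx, gy = grad(x, y)
    x -= lr * gx
    y -= lr * gy

assert abs(x - 1.0) < 1e-6 and abs(y + 2.0) < 1e-6
# At the minimum, both partial derivatives (approximately) vanish
gx, gy = grad(x, y)
assert abs(gx) < 1e-5 and abs(gy) < 1e-5
```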

2.2 Lagrange Multipliers

When optimization problems include constraints, the method of Lagrange multipliers can be applied. This technique is frequently used in regularization methods for neural networks to achieve better performance.
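As a sketch of the method on a textbook problem (this specific problem is an assumption, not taken from the article): minimize f(x, y) = x² + y² subject to g(x, y) = x + y − 1 = 0. Setting the partial derivatives of the Lagrangian f − λg to zero gives 2x = λ and 2y = λ, hence x = y, and the constraint then forces x = y = 1/2:

```python
# Lagrange conditions for: minimize x^2 + y^2 subject to x + y = 1
#   ∂/∂x: 2x = λ,   ∂/∂y: 2y = λ,   constraint: x + y = 1
# From 2x = 2y = λ we get x = y, so x = y = 0.5 and λ = 1.
x_star, y_star = 0.5, 0.5
lam = 2 * x_star

assert abs(2 * x_star - lam) < 1e-12   # stationarity in x
assert abs(2 * y_star - lam) < 1e-12   # stationarity in y
assert abs(x_star + y_star - 1.0) < 1e-12  # constraint satisfied

# Sanity check: no other feasible point (x, 1 - x) does better
best = x_star**2 + y_star**2           # 0.5
for i in range(-100, 201):
    x = i / 100.0
    assert x**2 + (1 - x)**2 >= best - 1e-12
```

The same structure underlies L2 regularization, where the constraint bounds the size of the weight vector.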

- END -

Optimization · machine learning · neural networks · calculus · derivatives · partial derivatives
Written by

Python Programming Learning Circle

A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.
