
Understanding Transpose Convolution (Deconvolution) in Convolutional Neural Networks

The article explains how transpose (de)convolution works as the spatial inverse of standard convolution, detailing its relationship to fully‑connected layers, padding, stride, output size formulas, odd‑case handling, and practical implementation in frameworks like PyTorch and TensorFlow.

Tencent Cloud Developer

In recent convolutional neural network (CNN) designs, the transpose convolution layer—also known as deconvolution or fractionally‑strided convolution—appears frequently, especially in the generator of Generative Adversarial Networks (GANs) for up‑sampling. This article explains the relationship and differences between transpose convolution and standard convolution, and details the implementation process.

1. Convolutional Layer vs. Fully‑Connected Layer

Traditional feed‑forward neural networks use fully‑connected layers, where every neuron in one layer connects to every neuron in the next layer via a dense weight matrix. Convolutional layers, by contrast, use a sparse weight matrix (the kernel) that connects only a local region (e.g., a 3×3 patch) of the input to each output neuron. A convolutional layer can be viewed as a special case of a fully‑connected layer with many zero weights, dramatically reducing the number of parameters and enabling the network to learn local, translation‑invariant features.
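This sparse-matrix view can be made concrete in plain Python. The sketch below (illustrative, not code from the article; the 4×4 input and the Sobel-like kernel are arbitrary choices) builds the 4×16 dense weight matrix that a 3×3 convolution over a 4×4 input corresponds to, and checks that the matrix-vector product matches the sliding-window result:

```python
K = [[1, 0, -1],
     [2, 0, -2],
     [1, 0, -1]]          # 3x3 kernel (Sobel-like, arbitrary choice)
X = [[float(4 * i + j) for j in range(4)] for i in range(4)]  # 4x4 input

# Direct sliding-window convolution (no padding, stride 1) -> 2x2 output.
direct = [[sum(K[a][b] * X[i + a][j + b] for a in range(3) for b in range(3))
           for j in range(2)] for i in range(2)]

# The same operation as a dense matrix-vector product: each output pixel is
# one row of a 4x16 matrix C, with the kernel weights placed at the flat
# indices of its 3x3 receptive field and zeros everywhere else.
C = [[0.0] * 16 for _ in range(4)]
for i in range(2):
    for j in range(2):
        row = 2 * i + j
        for a in range(3):
            for b in range(3):
                C[row][4 * (i + a) + (j + b)] = K[a][b]

x_flat = [v for r in X for v in r]     # flatten the input to a 16-vector
as_matmul = [sum(C[r][c] * x_flat[c] for c in range(16)) for r in range(4)]

assert as_matmul == [v for r in direct for v in r]   # identical results
```

Most entries of C are zero, which is exactly the parameter saving the paragraph above describes.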

2. Convolution Operation

2.1 Basic Convolution (no padding, stride = 1)

A single 3×3 kernel slides over a 4×4 input, performing element‑wise multiplication and summation at each position to produce a 2×2 output feature map.
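The sliding-window computation can be sketched in a few lines of plain Python (a minimal illustration, not the article's code; `conv2d` is a hypothetical helper, and the formula in the comment is developed in Section 2.3):

```python
def conv2d(X, K, padding=0, stride=1):
    """Naive 2D convolution on plain lists, with zero padding and stride."""
    f = len(K)
    h, w = len(X) + 2 * padding, len(X[0]) + 2 * padding
    # zero-pad the input on all four sides
    P = [[0.0] * w for _ in range(h)]
    for i, row in enumerate(X):
        for j, v in enumerate(row):
            P[i + padding][j + padding] = v
    # slide the kernel; output size follows W2 = floor((W1 - F + 2P)/S) + 1
    out_h = (h - f) // stride + 1
    out_w = (w - f) // stride + 1
    return [[sum(K[a][b] * P[i * stride + a][j * stride + b]
                 for a in range(f) for b in range(f))
             for j in range(out_w)] for i in range(out_h)]

X = [[1.0] * 4 for _ in range(4)]        # 4x4 input of ones
K = [[1.0] * 3 for _ in range(3)]        # 3x3 kernel of ones
print(conv2d(X, K))                      # [[9.0, 9.0], [9.0, 9.0]]
```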

2.2 Convolution with Padding

Padding adds zeros around the input so that the output size can be kept equal to the input size ("same" padding). Other padding modes include "full" (padding = kernel‑size − 1) and "valid" (no padding).

same padding: output size = input size (e.g., 3×3 kernel → padding = 1)

full padding: padding = kernel‑size − 1

valid padding: padding = 0

2.3 Convolution with Stride > 1

Stride defines the step size of the kernel. A stride of 2 reduces the spatial resolution (down‑sampling). The output size is computed as:

W₂ = ⌊(W₁ − F + 2P) / S⌋ + 1
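A quick numeric check of this formula across the three padding modes and a stride-2 case (a sketch; `out_size` is a hypothetical helper, and the widths are chosen for illustration):

```python
def out_size(w1, f, p=0, s=1):
    # W2 = floor((W1 - F + 2P) / S) + 1
    return (w1 - f + 2 * p) // s + 1

print(out_size(5, 3, p=0))       # valid padding:        3
print(out_size(5, 3, p=1))       # same padding:         5
print(out_size(5, 3, p=2))       # full padding:         7
print(out_size(7, 3, p=1, s=2))  # stride-2 down-sample: 4
```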

2.4 Relationship Between Input/Output Size, Kernel, Padding, and Stride

The general formula for the output width (or height) of a standard convolution is:

W₂ = ((W₁ − F + 2P) / S) + 1

If the division is not exact, the result is floored, leading to the so‑called “odd” convolution case.
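The flooring means different input widths can collapse to the same output width, so the output size alone does not determine the input size. A small check (values chosen for illustration):

```python
def out_size(w1, f=3, p=0, s=2):
    return (w1 - f + 2 * p) // s + 1

# With F=3, P=0, S=2, widths 5 and 6 both floor to the same output width.
print(out_size(5), out_size(6))   # 2 2
```

This ambiguity is what the transpose operation's output_padding (Section 3.4.3) later has to resolve.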

3. Transpose Convolution (Deconvolution)

Transpose convolution reverses the spatial transformation of a standard convolution: it restores the feature map's spatial size (not the original values), making it a learnable up‑sampling operation. It is often used in GAN generators and other decoder architectures.

3.1 No‑Padding, No‑Stride Case

In the transpose of a simple convolution (no padding, stride = 1), each input element is multiplied by the full kernel and scattered into the output, with overlapping contributions summed. For example, a 2×2 input and a 3×3 kernel yield a 4×4 output.
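This scatter-and-sum behavior can be sketched in plain Python (a minimal illustration; `conv_transpose2d` is a hypothetical helper, not a framework API, and the output size in the comment uses the formula from Section 3.4.2):

```python
def conv_transpose2d(Y, K):
    """Transpose convolution, no padding, stride 1: each input element
    scatters a scaled copy of the kernel into the output; overlaps sum."""
    f, n = len(K), len(Y)
    out = (n - 1) + f                  # W1 = (W2 - 1)*S - 2P + F with S=1, P=0
    X = [[0.0] * out for _ in range(out)]
    for i in range(n):
        for j in range(n):
            for a in range(f):
                for b in range(f):
                    X[i + a][j + b] += Y[i][j] * K[a][b]
    return X

Y = [[1.0, 2.0],
     [3.0, 4.0]]
K = [[1.0] * 3 for _ in range(3)]
out = conv_transpose2d(Y, K)
print(len(out), len(out[0]))           # 4 4
```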

3.2 Transpose of Padding Convolution

When the forward convolution uses padding, the corresponding transpose convolution uses a complementary padding configuration (derived in Section 3.4.1). For the "same, padding = 1" case with a 3×3 kernel, the transpose also uses padding 1.

3.3 Transpose of Stride > 1 Convolution

To undo a forward stride of S, the transpose operation inserts S − 1 zeros between neighboring input elements and then convolves with stride 1, giving an effective fractional stride of 1/S. This is why transpose convolution is also called fractionally‑strided convolution.
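The zero-insertion step, shown in 1D for clarity (a sketch; `insert_zeros` is a hypothetical helper):

```python
def insert_zeros(row, stride):
    """Insert stride - 1 zeros between neighboring elements of a 1D row."""
    out = []
    for i, v in enumerate(row):
        out.append(v)
        if i < len(row) - 1:           # no zeros after the last element
            out.extend([0.0] * (stride - 1))
    return out

print(insert_zeros([1.0, 2.0, 3.0], stride=2))  # [1.0, 0.0, 2.0, 0.0, 3.0]
```

A stride-1 convolution over this expanded input then touches each original element on every other step, which is the "stride 1/2" behavior described above.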

3.4 Relationship Between Standard and Transpose Convolution Parameters

3.4.1 Transpose Padding

The padding used in the transpose operation (P_T) can be derived from the forward convolution parameters:

P_T = F − P − 1

where F is the kernel size and P is the forward padding.
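Two quick checks of this relation (a sketch; `transpose_padding` is a hypothetical helper):

```python
def transpose_padding(f, p):
    # P_T = F - P - 1
    return f - p - 1

print(transpose_padding(3, 1))  # 1: the transpose of "same" is also "same"
print(transpose_padding(3, 0))  # 2: the transpose of "valid" is "full"
```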

3.4.2 Output Size of Transpose Convolution

The output width of a transpose convolution is:

W₁ = (W₂ − 1) × S − 2P + F

with S being the forward stride, P the forward padding, and F the kernel size.
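A round-trip check of this formula (a sketch; the widths are chosen for illustration): a 7-wide input under a 3×3 kernel with P = 1 and S = 2 shrinks to 4, and the transpose formula recovers 7.

```python
def transpose_out_size(w2, f, p, s):
    # W1 = (W2 - 1)*S - 2P + F
    return (w2 - 1) * s - 2 * p + f

# Forward: (7 - 3 + 2)//2 + 1 = 4; transpose maps 4 back to 7.
print(transpose_out_size(4, f=3, p=1, s=2))  # 7
```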

3.4.3 Odd Convolution in the Transpose Setting

When the forward convolution involves an odd division, the transpose operation may need an additional output_padding to recover the missing pixels. In PyTorch this is exposed as the output_padding argument.
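The size arithmetic below shows why the extra argument is needed (a sketch; `forward` and `transpose` are hypothetical helpers, but the transpose formula with the added term matches PyTorch's ConvTranspose2d output-size rule for dilation = 1):

```python
def forward(w1, f=3, p=1, s=2):
    return (w1 - f + 2 * p) // s + 1

def transpose(w2, f=3, p=1, s=2, output_padding=0):
    return (w2 - 1) * s - 2 * p + f + output_padding

# Both 7- and 8-wide inputs floor to width 4, so the plain transpose
# formula can only recover 7; output_padding=1 restores the 8-wide case.
assert forward(7) == forward(8) == 4
print(transpose(4))                    # 7
print(transpose(4, output_padding=1))  # 8
```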

4. Summary

The article first reviews the connection between feed‑forward fully‑connected networks and CNNs, then details the mathematics of convolution (kernel size, padding, stride, and output size). Finally, it demystifies transpose convolution, providing concrete examples for each parameter setting and explaining how frameworks such as PyTorch and TensorFlow implement the operation.

5. References

Intuitive explanations of CNNs on Zhihu (translation‑invariance discussion).

"A Guide to Convolution Arithmetic for Deep Learning" – source of the animated illustrations.

Various online articles comparing convolution and transpose convolution.

Tags: CNN, TensorFlow, PyTorch, deep learning, padding, stride, transpose convolution