Understanding Video Frame Color Spaces: RGB, YUV, and FFmpeg Pixel Formats
This article explains the fundamentals of video frame color spaces: how video frames are composed, the differences between RGB and YUV, FFmpeg's pixel format definitions, chroma subsampling schemes, storage layouts, and conversion formulas, with code examples.
When doing frontend audio/video work, such as frame extraction with FFmpeg compiled to WASM, a grasp of basic concepts like video frames and color encoding is essential. This article introduces the color spaces used in video frames.
1. Video Frames
Video consists of a series of frames displayed at short intervals, such as 1/24 or 1/30 of a second, which the eye perceives as continuous motion. Each frame is a matrix of pixels represented in either the RGB or the YUV color space. In FFmpeg, the header libavutil/pixfmt.h defines many pixel formats, most of which belong to the RGB or YUV families.
<code>enum AVPixelFormat {
// ... (omitted less important types)
///< planar YUV 4:2:0, 12bpp, (1 Cr & Cb sample per 2x2 Y samples)
AV_PIX_FMT_YUV420P,
///< packed YUV 4:2:2, 16bpp, Y0 Cb Y1 Cr
AV_PIX_FMT_YUYV422,
///< planar YUV 4:2:2, 16bpp, (1 Cr & Cb sample per 2x1 Y samples)
AV_PIX_FMT_YUV422P,
///< packed YUV 4:2:2, 16bpp, Cb Y0 Cr Y1
AV_PIX_FMT_UYVY422,
///< planar YUV 4:4:4, 24bpp, (1 Cr & Cb sample per 1x1 Y samples)
AV_PIX_FMT_YUV444P,
///< planar YUV 4:4:0 (1 Cr & Cb sample per 1x2 Y samples)
AV_PIX_FMT_YUV440P,
///< packed RGB 8:8:8, 24bpp, RGBRGB...
AV_PIX_FMT_RGB24,
///< packed RGB 8:8:8, 24bpp, BGRBGR...
AV_PIX_FMT_BGR24,
///< packed ARGB 8:8:8:8, 32bpp, ARGBARGB...
AV_PIX_FMT_ARGB,
///< packed RGBA 8:8:8:8, 32bpp, RGBARGBA...
AV_PIX_FMT_RGBA,
///< packed ABGR 8:8:8:8, 32bpp, ABGRABGR...
AV_PIX_FMT_ABGR,
///< packed BGRA 8:8:8:8, 32bpp, BGRABGRA...
AV_PIX_FMT_BGRA,
///< packed RGB 5:6:5, 16bpp, (msb) 5R 6G 5B(lsb), big-endian
AV_PIX_FMT_RGB565BE,
///< packed RGB 5:6:5, 16bpp, (msb) 5R 6G 5B(lsb), little-endian
AV_PIX_FMT_RGB565LE,
///< packed RGB 5:5:5, 16bpp, (msb)1X 5R 5G 5B(lsb), big-endian
AV_PIX_FMT_RGB555BE,
///< packed RGB 5:5:5, 16bpp, (msb)1X 5R 5G 5B(lsb), little-endian
AV_PIX_FMT_RGB555LE,
///< packed BGR 5:6:5, 16bpp, (msb) 5B 6G 5R(lsb), big-endian
AV_PIX_FMT_BGR565BE,
///< packed BGR 5:6:5, 16bpp, (msb) 5B 6G 5R(lsb), little-endian
AV_PIX_FMT_BGR565LE,
///< packed BGR 5:5:5, 16bpp, (msb)1X 5B 5G 5R(lsb), big-endian
AV_PIX_FMT_BGR555BE,
///< packed BGR 5:5:5, 16bpp, (msb)1X 5B 5G 5R(lsb), little-endian
AV_PIX_FMT_BGR555LE,
};
</code>
Each format comment begins with either packed or planar. YUV formats also carry numbers such as 4:2:0 or 4:2:2, indicating the chroma subsampling scheme.
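FFmpeg records these properties per format in its AVPixFmtDescriptor table (libavutil/pixdesc.h). Below is a toy sketch of the idea, not the real struct; the field and helper names are simplified stand-ins modeled on FFmpeg's log2_chroma_w/log2_chroma_h descriptors:

```c
#include <stddef.h>
#include <string.h>

// toy descriptor, loosely modeled on FFmpeg's AVPixFmtDescriptor
typedef struct {
    const char *name;
    int planar;        // 1 = planar storage, 0 = packed
    int log2_chroma_w; // horizontal chroma subsampling (log2)
    int log2_chroma_h; // vertical chroma subsampling (log2)
    int bpp;           // bits per pixel
} PixFmtDesc;

static const PixFmtDesc descs[] = {
    { "yuv420p", 1, 1, 1, 12 }, // 2x2 Y samples share one U/V pair
    { "yuyv422", 0, 1, 0, 16 }, // packed, 2x1 Y samples per U/V pair
    { "yuv444p", 1, 0, 0, 24 }, // full chroma sampling
    { "rgb24",   0, 0, 0, 24 }, // packed RGB, no subsampling
};

// look a format up by name; returns NULL when unknown
const PixFmtDesc *find_desc(const char *name) {
    for (size_t i = 0; i < sizeof(descs) / sizeof(descs[0]); i++)
        if (strcmp(descs[i].name, name) == 0)
            return &descs[i];
    return 0;
}
```

The real table additionally describes per-component bit depths and plane indices; this sketch only keeps the packed/planar flag and the subsampling factors discussed above.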
2. RGB and YUV
RGB and YUV are both color spaces. RGB uses three channels—red, green, and blue—to represent colors. YUV separates luminance (Y) from chrominance (U and V), allowing a black‑and‑white image to be displayed without color information.
1. RGB
In CSS, developers frequently use RGB or RGBA values. FFmpeg defines 16-bit formats (RGB555, RGB565), 24-bit (RGB24), and 32-bit (RGBA/ARGB) variants. For the 16-bit formats, the BE/LE suffixes indicate byte order in memory; channel order also varies, which is why BGR counterparts exist alongside each RGB format.
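The 16-bit layouts diagrammed just below reduce to shifts and masks; here is a minimal C sketch for RGB565 (the helper names are mine):

```c
#include <stdint.h>

// pack 8-bit R, G, B into RGB565: (msb) 5R 6G 5B (lsb)
uint16_t rgb565_pack(uint8_t r, uint8_t g, uint8_t b) {
    return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

// expand RGB565 back to 8 bits per channel
void rgb565_unpack(uint16_t p, uint8_t *r, uint8_t *g, uint8_t *b) {
    *r = (uint8_t)(((p >> 11) & 0x1F) << 3); // top 5 bits back to 8
    *g = (uint8_t)(((p >> 5)  & 0x3F) << 2); // 6 green bits back to 8
    *b = (uint8_t)((p & 0x1F) << 3);
}
```

The low bits are discarded on packing, so unpacked values come back as multiples of 8 for red and blue and multiples of 4 for green; that precision loss is the price of the 16-bit format.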
<code># RGB555
XRRR RRGG GGGB BBBB
# RGB565
RRRR RGGG GGGB BBBB
</code>
RGB24 allocates 8 bits per channel (24 bits total). RGB32 adds an 8-bit alpha channel, often called RGBA or ARGB depending on where the alpha byte sits.
<code># RGB24
RRRRRRRR GGGGGGGG BBBBBBBB
# RGB32
RRRRRRRR GGGGGGGG BBBBBBBB AAAAAAAA
</code>
2. YUV
YUV and its relatives (Y'UV, YCbCr, YPbPr, which are related but not identical encodings) are used in video pipelines because separating luminance from chrominance allows the chroma to be subsampled, reducing bandwidth. Common subsampling formats include YUV420, YUV422, and YUV444, which differ in how chroma information is sampled.
YUV ↔ RGB conversion
Conversion formulas (U and V here are zero-centered). YUV to RGB:
<code>R = Y + 1.13983 * V
G = Y - 0.39465 * U - 0.58060 * V
B = Y + 2.03211 * U
</code>
RGB to YUV:
<code>Y = 0.299 * R + 0.587 * G + 0.114 * B
U = -0.14713 * R - 0.28886 * G + 0.436 * B
V = 0.615 * R - 0.51499 * G - 0.10001 * B
</code>
Sampling
Chroma subsampling reduces color data by exploiting the fact that human vision is less sensitive to color than to brightness. The J:A:B notation (e.g., 4:2:2) describes sampling over a block J pixels wide and two rows high:
J – horizontal sampling width, usually 4.
A – number of chroma samples in the first row of J pixels.
B – number of chroma samples in the second row of J pixels.
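Given the subsampling factors, the per-frame byte count of an 8-bit planar format follows directly. A small C sketch (the function name is mine; hsub/vsub are the horizontal and vertical subsampling factors, e.g., 2 and 2 for 4:2:0, 2 and 1 for 4:2:2, 1 and 1 for 4:4:4):

```c
#include <stddef.h>

// bytes per frame of an 8-bit planar YUV format, given the
// chroma subsampling factors hsub (horizontal) and vsub (vertical)
size_t frame_size(int width, int height, int hsub, int vsub) {
    size_t luma   = (size_t)width * height;               // one Y per pixel
    size_t chroma = (size_t)((width + hsub - 1) / hsub)   // round up so odd
                  * ((height + vsub - 1) / vsub);         // sizes still fit
    return luma + 2 * chroma;                             // Y + U + V planes
}
```

For 1920x1080 this gives 3,110,400 bytes at 4:2:0 versus 6,220,800 at 4:4:4, which is where the compression ratios quoted below come from.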
YUV 4:4:4
Full sampling: every Y sample has its own U and V pair, so the data size matches the RGB original.
YUV 4:2:2
Two Y samples share one UV pair, reducing data size to two‑thirds of the original.
YUV 4:2:0
Common for video frames: chroma is sampled at a 2:1 horizontal ratio on the first row and skipped entirely on the second, so four Y samples share one U and one V, halving the data relative to full sampling.
Storage formats
Pixel data can be stored packed or planar. Packed formats interleave the channel values of each pixel (e.g., RGBRGB...). Planar formats store each channel in its own plane: all Y values first, then all U, then all V (e.g., I420).
Packed
Typical for RGB. In YUV, packed formats such as YUYV and UYVY are based on 4:2:2 sampling.
YUYV example order: Y0 U0 Y1 V0, Y2 U2 Y3 V2.
UYVY example order: U0 Y0 V0 Y1, U2 Y2 V2 Y3.
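De-interleaving one packed YUYV row into separate channel arrays makes that byte order concrete; a minimal C sketch (the function name is mine):

```c
// split one row of packed YUYV (4:2:2) into planar Y, U, V arrays;
// width is assumed even, since two Y samples share each U/V pair
void yuyv_split_row(const unsigned char *src, int width,
                    unsigned char *y, unsigned char *u, unsigned char *v) {
    for (int i = 0; i < width / 2; i++) {
        y[2 * i]     = src[4 * i];     // Y0
        u[i]         = src[4 * i + 1]; // U shared by Y0 and Y1
        y[2 * i + 1] = src[4 * i + 2]; // Y1
        v[i]         = src[4 * i + 3]; // V shared by Y0 and Y1
    }
}
```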
Planar
All Y samples are stored first, followed by U and V. Example: I420 (YUV 4:2:0).
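Locating the three planes inside a contiguous I420 buffer is simple pointer arithmetic; a sketch assuming even width and height (the function name is mine):

```c
#include <stddef.h>

// compute the Y, U, and V plane pointers inside one contiguous
// I420 (YUV 4:2:0 planar) buffer; width and height assumed even
void i420_planes(unsigned char *buf, int width, int height,
                 unsigned char **y, unsigned char **u, unsigned char **v) {
    *y = buf;                                      // full-resolution luma
    *u = buf + (size_t)width * height;             // quarter-size U plane
    *v = *u + (size_t)(width / 2) * (height / 2);  // quarter-size V plane
}
```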
3. Conclusion
Many YUV format variants exist, but they all rest on the same principles; the historical lack of a single unified standard is what produced this proliferation of formats.
FFmpeg performs YUV-to-RGB conversion on the CPU (via libswscale); since the formulas amount to a matrix multiplication, the conversion can also be offloaded to the GPU for better performance.
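That matrix form can be sketched in C as follows. The coefficients are taken from the conversion formulas earlier in the article, applied to 8-bit samples with zero-centered U and V; the clamping helper is my own addition:

```c
#include <stdint.h>

// clamp a double to the 0..255 byte range, rounding to nearest
static uint8_t clamp_u8(double x) {
    if (x < 0.0)   return 0;
    if (x > 255.0) return 255;
    return (uint8_t)(x + 0.5);
}

// YUV -> RGB as a single 3x3 matrix multiply over (Y, U-128, V-128)
void yuv_to_rgb(uint8_t y, uint8_t u, uint8_t v,
                uint8_t *r, uint8_t *g, uint8_t *b) {
    static const double m[3][3] = {
        { 1.0,  0.0,      1.13983 },
        { 1.0, -0.39465, -0.58060 },
        { 1.0,  2.03211,  0.0     },
    };
    double in[3]  = { (double)y, u - 128.0, v - 128.0 };
    double out[3] = { 0.0, 0.0, 0.0 };
    for (int i = 0; i < 3; i++)
        for (int j = 0; j < 3; j++)
            out[i] += m[i][j] * in[j];
    *r = clamp_u8(out[0]);
    *g = clamp_u8(out[1]);
    *b = clamp_u8(out[2]);
}
```

Because every pixel applies the same matrix, this maps naturally onto a GPU shader, where the per-pixel multiply runs in parallel.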
References
libavutil/pixfmt.h source code
Wikipedia – Film frame
Wikipedia – pixel format
Wikipedia – chroma subsampling
Wikipedia – YUV
Zhihu – Video and frame basics
Audio‑Video Development – Understanding YUV sampling and formats
Tencent IMWeb Frontend Team