
Why Deferred Rendering Beats Forward Rendering on Mobile GPUs – A Deep Dive

This article compares forward and deferred rendering techniques, analyzes their performance trade‑offs on mobile GPUs, explores tile‑based and hardware‑TBDR approaches, and presents a Metal‑based single‑pass deferred shading solution for modern mobile graphics pipelines.

Kuaishou Large Model

0x00 Programmable Rendering Pipeline

In the fixed-function pipeline of OpenGL ES 1.x, rendering behavior could only be configured through a fixed set of state switches. The programmable pipeline, introduced in OpenGL ES 2.0, replaces these switches with shader stages (vertex and fragment, with geometry shaders added later in OpenGL ES 3.2) that let the application program how the GPU draws objects.

0x01 Forward Rendering

Forward rendering submits each mesh through the shader stages and writes the result directly to the color buffer. Lighting is computed per object, per light.

In the basic pipeline, shading runs before the depth test, so fragments that are later hidden by closer geometry are still shaded (overdraw).

Complexity is O(n·m) for n objects and m lights, so the cost grows with the product of the two, not their sum.

This makes forward rendering a poor fit for scenes with many lights.
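The cost described above can be illustrated with a small simulation. This is only a sketch: the `Fragment` type, the `forward_shade` and `quad` helpers, and the assumption that there is no early depth test are all made up for illustration and do not model any particular GPU.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    x: int
    y: int
    depth: float  # smaller values are closer to the camera


def forward_shade(draws, num_lights):
    """Shade fragments in submission order.

    Returns (visible_pixels, light_evaluations).
    """
    depth_buffer = {}
    light_evaluations = 0
    for fragments in draws:  # one entry per object (draw call)
        for frag in fragments:
            # Assumption: no early-Z, so every fragment is lit against
            # every light before the depth test decides its fate.
            light_evaluations += num_lights
            key = (frag.x, frag.y)
            if frag.depth < depth_buffer.get(key, float("inf")):
                depth_buffer[key] = frag.depth
    return len(depth_buffer), light_evaluations


def quad(size, depth):
    """A size x size block of fragments at a constant depth."""
    return [Fragment(x, y, depth) for y in range(size) for x in range(size)]


# Three fully overlapping 4x4 quads drawn back to front, lit by 8 lights.
draws = [quad(4, 0.9), quad(4, 0.5), quad(4, 0.1)]
visible, evaluations = forward_shade(draws, num_lights=8)
print(visible, evaluations)  # 16 384
```

Only 16 pixels end up visible, yet 384 light evaluations are performed; two thirds of the work is spent on fragments that are overwritten.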

0x02 Deferred Rendering

Deferred rendering (also called deferred shading) first renders geometry into a G-Buffer (depth, normal, albedo, etc.) and then applies lighting in screen space, reducing complexity to O(n+m). Because lighting runs only on the fragments that survive depth testing, no shading work is wasted on hidden surfaces, but the approach introduces high memory and bandwidth costs:

Higher memory usage due to multiple render targets.

G‑Buffer read/write bandwidth becomes a performance bottleneck.

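Transparent objects require additional handling, because the G-Buffer stores only one surface per pixel; they are typically drawn in a separate forward pass.

Multi-sample anti-aliasing (MSAA) is expensive to support, because every render target in the G-Buffer must be stored at the multisampled resolution.

Screen-space shading only sees the material parameters that were written to the G-Buffer, so scenes that mix very different shading models or visual styles may need extra steps.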
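To put rough numbers on the memory cost, the following sketch computes the G-Buffer footprint for a hypothetical layout. The four-target layout and the 1920x1080 resolution are assumptions chosen for illustration, not figures from any specific engine.

```python
def gbuffer_bytes(width, height, bytes_per_target, samples=1):
    """Total G-buffer size in bytes for a given render-target layout."""
    bytes_per_pixel = sum(bytes_per_target) * samples
    return width * height * bytes_per_pixel


# Hypothetical layout: albedo RGBA8, normal RGBA8, material RGBA8,
# and a 32-bit depth target, i.e. 4 bytes each.
layout = [4, 4, 4, 4]

single = gbuffer_bytes(1920, 1080, layout)
msaa4 = gbuffer_bytes(1920, 1080, layout, samples=4)

print(single / 2**20)  # 31.640625 (MiB)
print(msaa4 / 2**20)   # 126.5625 (MiB)
```

Under these assumptions, enabling 4x MSAA quadruples the footprint, which is why MSAA and a fat G-Buffer rarely go together.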

0x03 Tile‑Based Deferred Rendering

Tile‑based deferred rendering divides the screen into tiles, computes which lights affect each tile, and performs lighting for all lights covering a tile in a single pass, reducing redundant G‑Buffer reads.

Generate the G‑Buffer.

Partition the G‑Buffer into tiles and compute depth bounds (often via a compute shader).

Perform light culling to produce a light‑index list per tile.

Execute a color pass using the G‑Buffer and the per‑tile light list.
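The light-culling step can be sketched on the CPU as follows. This is a simplified illustration, not a production implementation: lights are approximated as screen-space circles with a depth range, the per-tile depth bounds are passed in as precomputed input, and all names (`Light`, `cull_lights`, `tile_depth_bounds`) are hypothetical. A real renderer would typically run this in a compute shader.

```python
from dataclasses import dataclass


@dataclass
class Light:
    x: float       # screen-space center, in pixels
    y: float
    radius: float  # screen-space radius of influence, in pixels
    z_min: float   # nearest view-space depth the light reaches
    z_max: float   # farthest view-space depth the light reaches


def cull_lights(width, height, tile, lights, tile_depth_bounds):
    """Return {(tx, ty): [light indices]} for every tile on screen."""
    tiles_x = (width + tile - 1) // tile
    tiles_y = (height + tile - 1) // tile
    result = {}
    for ty in range(tiles_y):
        for tx in range(tiles_x):
            x0, y0 = tx * tile, ty * tile
            x1, y1 = min(x0 + tile, width), min(y0 + tile, height)
            z_near, z_far = tile_depth_bounds[(tx, ty)]
            hits = []
            for index, light in enumerate(lights):
                # Closest point of the tile rectangle to the light center.
                cx = min(max(light.x, x0), x1)
                cy = min(max(light.y, y0), y1)
                dist_sq = (cx - light.x) ** 2 + (cy - light.y) ** 2
                overlaps_2d = dist_sq <= light.radius ** 2
                # Reject lights whose depth range misses the tile's geometry.
                overlaps_depth = light.z_min <= z_far and light.z_max >= z_near
                if overlaps_2d and overlaps_depth:
                    hits.append(index)
            result[(tx, ty)] = hits
    return result


# A 64x32 screen split into two 32x32 tiles, both holding depths 1.0..5.0.
lights = [
    Light(10, 10, 5, 1.0, 3.0),     # left tile only
    Light(40, 16, 12, 1.0, 3.0),    # straddles both tiles
    Light(50, 16, 4, 10.0, 12.0),   # right tile in 2D, but too far in depth
]
bounds = {(0, 0): (1.0, 5.0), (1, 0): (1.0, 5.0)}
print(cull_lights(64, 32, 32, lights, bounds))
# {(0, 0): [0, 1], (1, 0): [1]}
```

The color pass then iterates only over the indices listed for its tile, so a pixel never evaluates a light that cannot reach it.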

Because each tile reads its G-Buffer data once regardless of how many lights cover it, this approach can substantially outperform classic deferred lighting when many lights are present.

0x04 Summary of Rendering Choices

Forward rendering is efficient for simple scenes with few lights and low memory overhead, while deferred rendering excels in complex, light‑rich scenes but incurs higher memory and bandwidth costs. The optimal choice depends on scene requirements and team expertise.
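The difference in scaling can be made concrete with a deliberately crude cost model. The functions below are illustrative assumptions: they count abstract "passes" and ignore that a G-Buffer pass and a lighting pass have very different real costs.

```python
def forward_cost(num_objects, num_lights):
    """Each object is shaded against each light: n * m."""
    return num_objects * num_lights


def deferred_cost(num_objects, num_lights):
    """Each object is drawn once, then each light is applied once: n + m."""
    return num_objects + num_lights


for n, m in [(100, 1), (100, 10), (100, 100)]:
    print(n, m, forward_cost(n, m), deferred_cost(n, m))
# 100 1 100 101
# 100 10 1000 110
# 100 100 10000 200
```

With a single light the two approaches are on par in this model, and forward rendering avoids the G-Buffer entirely; the gap only opens up as the light count grows.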

0x05 Mobile GPU Considerations

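Mobile GPUs prioritize power efficiency, and memory bandwidth is a dominant factor in power draw. Major mobile GPUs (Qualcomm Adreno, ARM Mali, Imagination PowerVR) use tile-based architectures rather than desktop-style immediate-mode rendering (IMR). PowerVR's hardware tile-based deferred rendering (TBDR) processes each tile in on-chip memory and defers fragment shading until visibility has been resolved, which reduces traffic to system memory. Note that this hardware TBDR is a GPU architecture and is distinct from the software tile-based deferred technique described in section 0x03, despite the similar name.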

0x06 Metal‑Based TBDR Design

Apple's Metal sample code demonstrates a single-pass deferred rendering technique that keeps G-Buffer tiles resident in on-chip tile memory, so the G-Buffer never has to be written out to system memory and read back between the geometry and lighting stages. Metal 2 features such as Raster Order Groups and Image Blocks build on the same on-chip memory to optimize the pipeline further.
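The bandwidth saving can be estimated with a back-of-the-envelope model. The byte counts below (16 bytes of G-Buffer data and 4 bytes of final color per pixel) are assumptions for illustration, and the model ignores texture fetches, vertex traffic, and compression.

```python
def two_pass_traffic(width, height, gbuffer_bpp, color_bpp):
    """G-buffer is stored to system memory, loaded back for lighting,
    and the final color is stored. Returns bytes per frame."""
    pixels = width * height
    return pixels * (gbuffer_bpp + gbuffer_bpp + color_bpp)


def single_pass_traffic(width, height, color_bpp):
    """G-buffer stays in on-chip tile memory; only the final color
    leaves the chip. Returns bytes per frame."""
    return width * height * color_bpp


w, h = 1920, 1080
g_bpp, c_bpp = 16, 4  # hypothetical bytes per pixel

ratio = two_pass_traffic(w, h, g_bpp, c_bpp) / single_pass_traffic(w, h, c_bpp)
print(ratio)  # 9.0
```

In Metal terms, this roughly corresponds to giving the G-Buffer attachments memoryless storage and a `dontCare` store action so that they exist only in tile memory; the exact setup depends on the device and is not shown here.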

0x07 Conclusion

The article presented forward and deferred rendering, analyzed mobile GPU constraints, and showed how hardware‑TBDR and software‑tile‑based techniques can be combined using Metal to achieve efficient real‑time rendering on mobile devices.

