GPU Command and Syncpoint Analysis on the SM8650 Platform
On the SM8650 platform, GLES issues sync and draw commands that the kernel-mode driver (KMD) translates into kgsl_drawobj structures and queues per context. The KMD processes fence, timestamp, and timeline syncpoints on dedicated kernel threads and finally submits draw objects to the GPU firmware. A single eglSwapBuffers call triggers a fence syncpoint, a draw command, and the creation of a GPU fence.
On the SM8650 platform, GLES sends two types of GPU commands to the kernel mode driver (KMD): synchronization (sync) commands and draw commands. Draw commands carry the actual drawcalls, the GPU instructions that the GPU hardware executes. Sync commands are handled entirely within the KMD: each one inserts a synchronization point into the KMD task queue, and subsequent draw commands are blocked until that sync point is signaled.
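The blocking rule can be sketched as a walk over a single queue. This is a minimal C model under stated assumptions: the entry layout, the ENTRY_DRAW/ENTRY_SYNC tags, and dispatchable_prefix() are hypothetical and do not correspond to KGSL code. They only show how one unsignaled sync entry stalls every draw queued behind it, while signaled sync entries are consumed without reaching the GPU.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical entry tags; not KGSL definitions. */
enum entry_type { ENTRY_DRAW, ENTRY_SYNC };

struct queue_entry {
	enum entry_type type;
	bool signaled;      /* meaningful only for ENTRY_SYNC */
};

/*
 * Walk the queue from the head and return how many entries can be
 * handled right now. Signaled sync entries are retired by the KMD alone;
 * draw entries would be submitted to the GPU; the first unsignaled sync
 * entry stops the walk. *draws_submitted counts the draw entries passed.
 */
size_t dispatchable_prefix(const struct queue_entry *q, size_t len,
			   size_t *draws_submitted)
{
	size_t i;

	*draws_submitted = 0;
	for (i = 0; i < len; i++) {
		if (q[i].type == ENTRY_SYNC) {
			if (!q[i].signaled)
				break;          /* blocks every later draw */
			continue;               /* retired, never sent to the GPU */
		}
		(*draws_submitted)++;           /* would be sent to the GPU */
	}
	return i;
}
```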
To send a command, GLES creates a struct kgsl_gpu_command instance, packs the command data into it, and passes it to the KMD via IOCTL_KGSL_GPU_COMMAND. The KMD abstracts commands as subclasses of struct kgsl_drawobj: draw commands become struct kgsl_drawobj_cmd and sync commands become struct kgsl_drawobj_sync. The ioctl implementation converts the kgsl_gpu_command into the appropriate kgsl_drawobj and stores it in the context’s drawqueue.
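"Subclassing" in kernel C is commonly done by embedding a base struct as the first member and recovering the outer object with container_of(); this sketch assumes that idiom. All names here (drawobj, drawobj_cmd, drawobj_sync, to_cmd(), make_drawobj(), and the is_sync request flag) are hypothetical simplifications, not the real kgsl_drawobj definitions or the actual ioctl conversion logic.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified stand-ins for the KGSL types. */
enum drawobj_type { DRAWOBJ_CMD = 1, DRAWOBJ_SYNC = 2 };

struct drawobj {                  /* plays the role of struct kgsl_drawobj */
	enum drawobj_type type;
	uint32_t timestamp;
};

struct drawobj_cmd {              /* role of struct kgsl_drawobj_cmd */
	struct drawobj base;      /* embedded "superclass" */
	unsigned int numibs;      /* hypothetical: number of command buffers */
};

struct drawobj_sync {             /* role of struct kgsl_drawobj_sync */
	struct drawobj base;
	unsigned int numsyncs;
};

/* Same idea as the kernel's container_of(): recover the outer struct. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Checked downcasts: return NULL when the tag does not match. */
struct drawobj_cmd *to_cmd(struct drawobj *obj)
{
	return obj->type == DRAWOBJ_CMD ?
		container_of(obj, struct drawobj_cmd, base) : NULL;
}

struct drawobj_sync *to_sync(struct drawobj *obj)
{
	return obj->type == DRAWOBJ_SYNC ?
		container_of(obj, struct drawobj_sync, base) : NULL;
}

/* Hypothetical request: is this a sync submission, and how many items? */
struct gpu_command_req {
	int is_sync;
	unsigned int count;       /* numsyncs or numibs */
};

/* Fill the matching subclass in caller storage and return its base. */
struct drawobj *make_drawobj(const struct gpu_command_req *req,
			     struct drawobj_cmd *cmd_store,
			     struct drawobj_sync *sync_store)
{
	if (req->is_sync) {
		sync_store->base.type = DRAWOBJ_SYNC;
		sync_store->base.timestamp = 0;
		sync_store->numsyncs = req->count;
		return &sync_store->base;
	}
	cmd_store->base.type = DRAWOBJ_CMD;
	cmd_store->base.timestamp = 0;
	cmd_store->numibs = req->count;
	return &cmd_store->base;
}
```

Passing around struct drawobj * lets a single drawqueue hold both kinds of objects, while the tag check keeps a downcast from silently reinterpreting the wrong type.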
During KMD initialization, a struct adreno_device is created for each GPU device. Each device contains a struct adreno_hwsched holding sixteen linked lists of struct adreno_dispatch_job, each job referencing a context. Two kernel threads are also created: kgsl_hwsched, which processes drawobjs from each context’s drawqueue, and kgsl-events, which handles GPU sync-point signals.
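A single-threaded model of the dispatch-job lists follows. It assumes that a per-context value (here a hypothetical priority field) selects one of the sixteen lists, and that lower indices are drained first; the kgsl_hwsched thread is replaced by an explicit drain_jobs() call and a wakeup counter. All names are illustrative.

```c
#include <stddef.h>

#define NUM_JOB_LISTS 16          /* sixteen dispatch-job lists per device */

struct context {                  /* hypothetical context stand-in */
	int id;
	unsigned int priority;    /* assumed to select the job list */
};

struct dispatch_job {             /* role of struct adreno_dispatch_job */
	struct dispatch_job *next;
	struct context *ctx;
};

struct hwsched {                  /* role of struct adreno_hwsched */
	struct dispatch_job *jobs[NUM_JOB_LISTS];
	unsigned int wakeups;     /* stands in for waking the scheduler thread */
};

/* Producer side: push (LIFO) a job for ctx onto its list and "wake". */
int queue_context(struct hwsched *hs, struct dispatch_job *job,
		  struct context *ctx)
{
	if (ctx->priority >= NUM_JOB_LISTS)
		return -1;
	job->ctx = ctx;
	job->next = hs->jobs[ctx->priority];
	hs->jobs[ctx->priority] = job;
	hs->wakeups++;
	return 0;
}

/*
 * Consumer side (what the scheduler thread would do once woken): drain
 * lists from index 0 upward and record context ids in visit order.
 * Within one list, the most recently queued job comes out first.
 */
size_t drain_jobs(struct hwsched *hs, int *out_ids, size_t max)
{
	size_t n = 0;

	for (unsigned int p = 0; p < NUM_JOB_LISTS; p++) {
		while (hs->jobs[p] && n < max) {
			struct dispatch_job *job = hs->jobs[p];

			hs->jobs[p] = job->next;
			out_ids[n++] = job->ctx->id;
		}
	}
	return n;
}
```

The job only names a context; the drawobjs themselves stay in that context's drawqueue, so the scheduler can look at a context's queue once per job it pops.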
Sync commands are classified by the KMD into three types: (1) fence syncpoint commands (struct kgsl_cmd_syncpoint_fence), (2) timestamp syncpoint commands (struct kgsl_cmd_syncpoint_timestamp), and (3) timeline syncpoint commands (struct kgsl_cmd_syncpoint_timeline). Regardless of type, each is ultimately converted into a struct kgsl_drawobj_sync_event object stored inside a struct kgsl_drawobj_sync instance.
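The normalization step can be pictured as converting a tagged union of command payloads into uniform event records tracked by a pending bitmap. The payload fields, MAX_SYNCS, the bitmap, and build_sync_obj() are assumptions of this sketch, not the real UAPI structures or the kgsl_drawobj_sync_event layout.

```c
#include <stdint.h>

/* Hypothetical tags for the three syncpoint command types. */
enum sync_type { SYNC_FENCE, SYNC_TIMESTAMP, SYNC_TIMELINE };

/* Hypothetical, simplified payloads (not the real UAPI layouts). */
struct syncpoint_cmd {
	enum sync_type type;
	union {
		struct { int fd; } fence;
		struct { unsigned int context_id; uint32_t timestamp; } ts;
		struct { int timeline_fd; uint64_t seqno; } timeline;
	} u;
};

/* Uniform record, in the spirit of struct kgsl_drawobj_sync_event. */
struct sync_event {
	enum sync_type type;
	unsigned int id;          /* index inside the owning sync object */
	uint64_t value;           /* fd, timestamp, or seqno, by type */
};

#define MAX_SYNCS 32

struct sync_obj {                 /* role of struct kgsl_drawobj_sync */
	struct sync_event synclist[MAX_SYNCS];
	unsigned int numsyncs;
	uint32_t pending;         /* bit i set: synclist[i] not yet signaled */
};

/* Convert a batch of syncpoint commands; return -1 if it does not fit. */
int build_sync_obj(struct sync_obj *so, const struct syncpoint_cmd *cmds,
		   unsigned int n)
{
	if (n > MAX_SYNCS)
		return -1;
	so->numsyncs = n;
	so->pending = 0;
	for (unsigned int i = 0; i < n; i++) {
		struct sync_event *ev = &so->synclist[i];

		ev->type = cmds[i].type;
		ev->id = i;
		switch (cmds[i].type) {
		case SYNC_FENCE:
			ev->value = (uint64_t)cmds[i].u.fence.fd;
			break;
		case SYNC_TIMESTAMP:
			ev->value = cmds[i].u.ts.timestamp;
			break;
		case SYNC_TIMELINE:
			ev->value = cmds[i].u.timeline.seqno;
			break;
		}
		so->pending |= 1u << i;   /* every event starts out pending */
	}
	return 0;
}
```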
Fence syncpoint processing has two steps. First, the command is parsed and transformed into a kgsl_drawobj_sync, and a dispatch job is added to the appropriate adreno_hwsched list. Second, when the fence is signaled, drawobj_sync_expire creates another dispatch job (often redundant) and wakes the kgsl_hwsched thread, which simply removes the sync drawobj from the drawqueue without sending anything to the GPU hardware.
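Reduced to counters, the two-step fence flow might look like the following. queue_sync(), sync_expire(), and sched_run() are hypothetical; sync_expire() is only loosely modeled on the behavior described for drawobj_sync_expire, and the pending bitmap is an assumption of the sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins; none of these are the real KGSL definitions. */
struct sync_drawobj {
	uint32_t pending;          /* one bit per outstanding syncpoint */
	bool in_queue;             /* still sitting in the drawqueue */
};

struct sched_state {
	unsigned int jobs_queued;  /* dispatch jobs added for the context */
	unsigned int wakeups;      /* times the scheduler thread was woken */
	unsigned int gpu_submits;  /* commands actually sent to hardware */
};

/* Step 1: queue the sync object and add a dispatch job for its context. */
void queue_sync(struct sched_state *s, struct sync_drawobj *obj, uint32_t mask)
{
	obj->pending = mask;
	obj->in_queue = true;
	s->jobs_queued++;
}

/*
 * Step 2a: fence-signal path. Clear the signaled bit; once nothing is
 * pending, add another dispatch job and wake the scheduler.
 * Returns true when the sync object became ready.
 */
bool sync_expire(struct sched_state *s, struct sync_drawobj *obj,
		 unsigned int bit)
{
	obj->pending &= ~(1u << bit);
	if (obj->pending != 0)
		return false;
	s->jobs_queued++;          /* often redundant with step 1's job */
	s->wakeups++;
	return true;
}

/*
 * Step 2b: scheduler thread. A ready sync object is only removed from
 * the drawqueue; gpu_submits is never touched for sync objects.
 */
void sched_run(struct sync_drawobj *obj)
{
	if (obj->in_queue && obj->pending == 0)
		obj->in_queue = false;
}
```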
Timestamp syncpoint processing is similar but goes through the KGSL events framework. After parsing, a kgsl_drawobj_sync and a corresponding kgsl_event are created. Once the GPU's retired timestamp reaches the event's timestamp, the kgsl-events thread runs _kgsl_event_worker, which calls drawobj_sync_func → drawobj_sync_expire and ultimately wakes the kgsl_hwsched thread to remove the sync drawobj from the queue, as in the fence case.
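The timestamp check and callback dispatch can be sketched as below. The signed-difference comparison is a common wraparound-safe idiom assumed here (the driver's own comparison helper may differ), and ts_event and process_events() are hypothetical simplifications of kgsl_event and the events worker.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Wraparound-safe "retired has reached event_ts" for 32-bit timestamps,
 * using the signed-difference trick (assumption, not the KGSL helper).
 */
static bool ts_reached(uint32_t event_ts, uint32_t retired)
{
	return (int32_t)(retired - event_ts) >= 0;
}

typedef void (*event_cb)(void *priv);

struct ts_event {                 /* role of struct kgsl_event (simplified) */
	uint32_t timestamp;
	event_cb func;            /* e.g. the sync object's expire hook */
	void *priv;
	bool fired;
};

/*
 * What the events worker would do after reading the retired timestamp:
 * fire every not-yet-fired event whose timestamp has been reached.
 * Returns the number of callbacks invoked.
 */
size_t process_events(struct ts_event *evs, size_t n, uint32_t retired)
{
	size_t fired = 0;

	for (size_t i = 0; i < n; i++) {
		if (evs[i].fired || !ts_reached(evs[i].timestamp, retired))
			continue;
		evs[i].fired = true;
		evs[i].func(evs[i].priv);
		fired++;
	}
	return fired;
}
```

With this comparison, an event queued at 0xFFFFFFF0 fires when the retired timestamp is 0xFFFFFFF8, and an event at 10 fires after the counter wraps to 15, but not before.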
Draw command processing also consists of two steps. First, the user-space draw command is parsed into a struct kgsl_drawobj_cmd and placed in the drawqueue, and a dispatch job is added to the adreno_hwsched list. The kgsl_drawobj_cmd holds command data in cmdlist (derived from kgsl_gpu_command.cmdlist) and object data in memlist and profiling_buf_entry. Second, gen7_hwsched_submit_drawobj performs the submission: it converts the drawobj into struct hfi_submit_cmd structures and writes them to GPU memory via gen7_gmu_context_queue_write, after which the GPU firmware processes them.
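The submission step can be modeled as packing the command list into a fixed-size packet and copying it into a ring of 32-bit words that a consumer (the firmware, in the real system) reads. The packet layout (submit_pkt, ib_desc), the ring protocol that keeps one slot empty, and queue_write()/submit_drawobj() are all hypothetical; the real hfi_submit_cmd format and gen7_gmu_context_queue_write behavior are not reproduced here.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical layouts; not the real hfi_submit_cmd format. */
struct ib_desc {                  /* one cmdlist entry: a command buffer */
	uint64_t gpuaddr;
	uint32_t size_dwords;
};

#define MAX_IBS 8

struct submit_pkt {               /* role of struct hfi_submit_cmd */
	uint32_t ctxt_id;
	uint32_t timestamp;
	uint32_t numibs;
	struct ib_desc ibs[MAX_IBS];
};

/* Single-producer ring of 32-bit words, standing in for a context queue. */
struct ctx_queue {
	uint32_t *words;
	uint32_t size;            /* capacity in words, must be > 0 */
	uint32_t read;            /* advanced by the consumer */
	uint32_t write;           /* advanced by the driver */
};

/* Copy len words into the ring; return -1 if there is not enough room. */
int queue_write(struct ctx_queue *q, const uint32_t *msg, uint32_t len)
{
	uint32_t used = (q->write + q->size - q->read) % q->size;
	uint32_t space = q->size - 1 - used;    /* keep one slot empty */

	if (len > space)
		return -1;
	for (uint32_t i = 0; i < len; i++)
		q->words[(q->write + i) % q->size] = msg[i];
	q->write = (q->write + len) % q->size;
	return 0;
}

/* Build a packet from the command list and push the whole packet. */
int submit_drawobj(struct ctx_queue *q, uint32_t ctxt_id, uint32_t ts,
		   const struct ib_desc *cmdlist, uint32_t numibs)
{
	struct submit_pkt pkt;
	uint32_t words[sizeof(pkt) / sizeof(uint32_t)];

	if (numibs > MAX_IBS)
		return -1;
	memset(&pkt, 0, sizeof(pkt));           /* zero unused slots/padding */
	pkt.ctxt_id = ctxt_id;
	pkt.timestamp = ts;
	pkt.numibs = numibs;
	if (numibs)
		memcpy(pkt.ibs, cmdlist, numibs * sizeof(*cmdlist));
	memcpy(words, &pkt, sizeof(pkt));
	return queue_write(q, words, sizeof(pkt) / sizeof(uint32_t));
}
```

Checking free space before copying means a full queue is reported to the caller instead of overwriting packets the consumer has not read yet.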
GPU-created fences are represented by struct kgsl_sync_fence, each linked to a struct kgsl_event. Every context owns a struct kgsl_sync_timeline that maintains a list of these fences. Fences are created through IOCTL_KGSL_TIMESTAMP_EVENT, which returns a fence file descriptor to user space. Signaling proceeds through the kgsl-events thread: _kgsl_event_worker → kgsl_sync_fence_event_cb → kgsl_sync_timeline_signal, which ultimately invokes the DMA-fence callback.
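A per-context timeline holding a fence list, plus a signal routine that unlinks and fires every fence whose timestamp has been reached, can be sketched as follows. sync_fence, sync_timeline, timeline_add_fence(), timeline_signal(), and the dma_cb pointer are hypothetical stand-ins for kgsl_sync_fence, kgsl_sync_timeline, and the DMA-fence callback machinery; the wraparound-safe comparison is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simplified model of a per-context timeline and its fences. */
struct sync_fence {               /* role of struct kgsl_sync_fence */
	struct sync_fence *next;
	uint32_t timestamp;       /* GPU timestamp the fence waits for */
	bool signaled;
	void (*dma_cb)(struct sync_fence *f);  /* stands in for DMA-fence callbacks */
};

struct sync_timeline {            /* role of struct kgsl_sync_timeline */
	struct sync_fence *fences;     /* singly linked list of live fences */
	uint32_t last_signaled;
};

/* Creation path (cf. IOCTL_KGSL_TIMESTAMP_EVENT): attach a new fence. */
void timeline_add_fence(struct sync_timeline *tl, struct sync_fence *f,
			uint32_t ts, void (*cb)(struct sync_fence *))
{
	f->timestamp = ts;
	f->signaled = false;
	f->dma_cb = cb;
	f->next = tl->fences;
	tl->fences = f;
}

/*
 * Signal path (cf. kgsl_sync_timeline_signal): unlink every fence whose
 * timestamp has been reached and run its callback. Returns how many fired.
 */
unsigned int timeline_signal(struct sync_timeline *tl, uint32_t ts)
{
	struct sync_fence **link = &tl->fences;
	unsigned int n = 0;

	tl->last_signaled = ts;
	while (*link) {
		struct sync_fence *f = *link;

		if ((int32_t)(ts - f->timestamp) >= 0) {
			*link = f->next;          /* remove from the list */
			f->signaled = true;
			if (f->dma_cb)
				f->dma_cb(f);
			n++;
		} else {
			link = &f->next;          /* keep it, move on */
		}
	}
	return n;
}
```

Walking the list through a pointer-to-link lets a fence be removed in place without special-casing the list head.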
In the complete rendering flow, a single eglSwapBuffers call triggers three ioctls: a fence syncpoint command, a draw command, and a GPU fence creation. With ftrace, the sequence can be observed as: syncpoint_fence, two adreno_cmdbatch_queued entries (the first for the fence, the second for the draw), kgsl_register_event (GPU fence creation), syncpoint_fence_expire (release fence signaled), adreno_cmdbatch_submitted (command submitted to the GPU), and finally kgsl_fire_event (GPU fence signaled).
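To check a captured trace for this ordering, a small helper can search for the event names as an in-order subsequence. It assumes each ftrace text line contains the event name preceded by a space and followed by a colon, as in the default output format; trace_has_swap_sequence() is hypothetical, and interleaved events from other frames or contexts could still satisfy the check.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Event names in the order described for one eglSwapBuffers call. */
static const char *const expected_seq[] = {
	"syncpoint_fence",
	"adreno_cmdbatch_queued",     /* fence (sync) drawobj */
	"adreno_cmdbatch_queued",     /* draw drawobj */
	"kgsl_register_event",        /* GPU fence creation */
	"syncpoint_fence_expire",     /* release fence signaled */
	"adreno_cmdbatch_submitted",  /* draw submitted to the GPU */
	"kgsl_fire_event",            /* GPU fence signaled */
};

/*
 * Return 1 if the expected events appear in order (gaps allowed) in the
 * captured trace lines, else 0. Matching " name:" keeps syncpoint_fence
 * from matching syncpoint_fence_expire.
 */
int trace_has_swap_sequence(const char *const *lines, size_t nlines)
{
	const size_t nexp = sizeof(expected_seq) / sizeof(expected_seq[0]);
	size_t want = 0;
	char key[64];

	for (size_t i = 0; i < nlines && want < nexp; i++) {
		snprintf(key, sizeof(key), " %s:", expected_seq[want]);
		if (strstr(lines[i], key))
			want++;
	}
	return want == nexp;
}
```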
The analysis is based on the SM8650 chipset, Android OS, and the GLES API, and may not be directly applicable to other Qualcomm platforms or graphics APIs.
OPPO Kernel Craftsman
Sharing Linux kernel-related cutting-edge technology, technical articles, technical news, and curated tutorials