
GPU Overview, Usage Methods, and Virtualization Technologies

This article explains the definition and history of GPUs, why dedicated graphics processors are needed, how they are accessed through graphics libraries and vendor APIs such as OpenGL, DirectX, CUDA and OpenCL, and describes various GPU virtualization techniques including virtual graphics cards, passthrough, and vCUDA with their client‑server‑manager architecture.


1. GPU Overview

GPU stands for Graphics Processing Unit (in Chinese sources, "computer graphics processor"); the term was introduced by NVIDIA in 1999.

The GPU was created to meet the growing demand for graphics processing, especially in home systems and gaming — workloads that a general-purpose CPU cannot handle efficiently on its own.

The GPU is the "heart" of a graphics card, analogous to the CPU in a computer system. It is also what distinguishes 2D from 3D graphics cards: 2D cards rely on the CPU for rendering (software acceleration), while 3D cards perform rendering in hardware (hardware acceleration). Most discrete graphics cards today are produced by NVIDIA and AMD (formerly ATI).

1.1 Why a dedicated GPU is needed and why the CPU cannot replace it

The GPU uses a parallel programming model that differs fundamentally from the CPU's serial model, so many CPU-optimized algorithms cannot be mapped to it directly. Architecturally it resembles a shared-memory multiprocessor, and programs written for GPUs therefore differ greatly from CPU programs.

Graphics rendering tasks are highly parallel, so adding more parallel processing units and memory controllers in a GPU dramatically improves performance and bandwidth.

Unlike CPUs, which are designed for general‑purpose, control‑heavy tasks, GPUs focus on compute‑intensive, logic‑light workloads, excelling at massive data‑parallel operations and frequent memory accesses.
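The contrast can be sketched in code. The snippet below (illustrative only; plain Python, not a real GPU API) expresses the same SAXPY computation two ways: as a serial CPU-style loop, and as a GPU-style kernel that computes exactly one output element per invocation — on real hardware those invocations run concurrently.

```python
# Contrast a serial CPU-style loop with a GPU-style data-parallel kernel.
# The "kernel" runs once per element, independently of all other elements.

def saxpy_serial(a, x, y):
    # CPU style: one loop, executed element by element.
    out = []
    for i in range(len(x)):
        out.append(a * x[i] + y[i])
    return out

def saxpy_kernel(i, a, x, y, out):
    # GPU style: each "thread" computes exactly one output element.
    out[i] = a * x[i] + y[i]

def launch(kernel, n, *args):
    # Stand-in for a kernel launch: on a GPU, all n instances run in parallel.
    for i in range(n):
        kernel(i, *args)

x = [1.0, 2.0, 3.0]
y = [10.0, 20.0, 30.0]
out = [0.0] * 3
launch(saxpy_kernel, 3, 2.0, x, y, out)
```

Both formulations produce the same result; what differs is the execution model — the kernel form exposes the per-element independence that the GPU's many processing units exploit.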

1.2 How to use a GPU

There are two ways to use a GPU: (1) an application calls a generic graphics library that internally uses the GPU, or (2) the application directly uses the GPU’s own API.

1.2.1 Generic graphics libraries

Common generic graphics libraries are OpenGL and DirectX/Direct3D. OpenGL is a cross-platform standard for interactive 2D/3D graphics, originally developed by SGI. DirectX is Microsoft's suite of multimedia APIs, with Direct3D as its 3D graphics component, updated regularly to expose new GPU features.

1.2.2 GPU‑specific programming interfaces

NVIDIA provides the CUDA framework, while AMD (formerly ATI) introduced CTM (Close To Metal) in 2006, later replaced by the ATI Stream SDK and finally by the open OpenCL standard. CUDA allows developers to write code in a C‑like language that runs on the GPU without using graphics APIs.

In the CUDA model, a host CPU coordinates with one or more device GPUs (or co‑processors). The CPU handles control‑heavy, serial tasks, while the GPU executes highly parallel kernels. This model is used in fields such as oil exploration, fluid dynamics, molecular dynamics, bio‑computing, audio/video codecs, and astronomy.
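This host/device division of labor can be mimicked in a short sketch. The snippet below is a hedged emulation of the CUDA workflow in plain Python — `DeviceMemory`, `memcpy_host_to_device`, and the explicit launch loop are stand-ins for the real CUDA API, which this is not — showing the host handling serial control (allocate, copy, launch, copy back) while the "device" runs the parallel kernel.

```python
# Emulated CUDA-style workflow: the host CPU does serial control work,
# the "device" runs one kernel instance per data element.

class DeviceMemory:
    """Stand-in for a GPU-side buffer."""
    def __init__(self, n):
        self.data = [0.0] * n

def memcpy_host_to_device(dst, src):
    dst.data = list(src)       # host -> device transfer

def memcpy_device_to_host(src):
    return list(src.data)      # device -> host transfer

def square_kernel(i, buf):
    buf.data[i] = buf.data[i] ** 2   # one "thread" squares one element

# Host-side control flow: allocate, copy in, launch, copy out.
host_in = [1.0, 2.0, 3.0, 4.0]
dev = DeviceMemory(len(host_in))
memcpy_host_to_device(dev, host_in)
for i in range(len(host_in)):        # kernel launch: all i in parallel on a GPU
    square_kernel(i, dev)
host_out = memcpy_device_to_host(dev)
```

The explicit copies matter: in the CUDA model, host and device have separate memories, so moving data across the boundary is part of the program, not an implementation detail.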

Enterprise applications often prefer generic graphics libraries for cost and compatibility reasons.

1.3 How a GPU works

A GPU consists mainly of a Vertex Processor and a Fragment (Pixel) Processor. These units operate as stream processors, using on‑chip registers rather than large caches, to process data streams efficiently.

When used for graphics, the vertex, pixel, and geometry pipelines are implemented by these stream processors, which behave like a multi‑core processor with flexible data movement and dynamic task assignment.
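As a toy model of that pipeline, the sketch below pushes a stream of elements through a vertex stage and then a fragment stage, each applied uniformly to every element. The stage names mirror the text; the transforms themselves are invented for illustration.

```python
# Toy stream pipeline: every element flows through the same fixed stages,
# which is what makes the work easy to spread across many stream processors.

def vertex_stage(v):
    x, y = v
    return (x + 1.0, y + 1.0)   # e.g. a per-vertex translation

def fragment_stage(p):
    x, y = p
    return int(x + y)           # e.g. a per-fragment value

def run_pipeline(stream):
    transformed = [vertex_stage(v) for v in stream]   # vertex processors
    return [fragment_stage(p) for p in transformed]   # fragment processors

values = run_pipeline([(0.0, 0.0), (1.0, 2.0)])
```

Because each stage applies the same operation to every element with no cross-element dependencies, a GPU can assign stream elements to its processing units dynamically, as the text describes.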

2. GPU Virtualization

In virtual machine environments, graphics can be provided by three approaches: virtual graphics cards, direct physical GPU passthrough, or GPU virtualization.

2.1 Virtual graphics cards

Virtual graphics cards are the most common solution in current virtualization platforms. Examples include VNC (Virtual Network Computing), Xen virtual frame buffer, VMware virtual GPU, and VMGL (VMM‑Independent Graphics Acceleration).

VNC transmits the entire desktop over the network. Xen’s virtual frame buffer uses a VNC‑like server to send screen updates. VMGL implements a front‑end virtualization mechanism that forwards OpenGL calls to a remote host with a real GPU.

2.2 GPU Passthrough

Passthrough assigns a physical GPU exclusively to a single VM, preserving near-native performance and allowing general-purpose computing. It requires hardware support for direct device assignment, such as Intel VT-d, and is offered by platforms such as Xen 4.0 VGA passthrough and VMware VMDirectPath I/O; device compatibility is limited.

Because the VM uses the native driver, features such as live migration, snapshot, or suspend/resume are not supported.

2.3 GPU Virtualization (vGPU)

GPU virtualization slices a physical GPU into time‑shares that can be allocated to multiple VMs. It works by intercepting and redirecting GPU‑related APIs (API remoting) and presenting a virtual GPU (vGPU) to each VM.

A typical implementation (e.g., vCUDA) consists of three components: a client driver inside the guest VM, a server component in a privileged VM (Domain-0), and a management service that schedules and balances GPU resources.

2.3.1 Client side

The client driver intercepts CUDA API calls, packages and encodes parameters, sends them to the server, decodes responses, and maintains a vGPU data structure that tracks address space, memory objects, and execution order.

2.3.2 Server side

Running in the privileged VM, the server receives requests, validates them, executes them on the physical GPU via CUDA, encodes results, and returns them to the client. It also registers GPU devices with the manager and creates dedicated service threads for each application.
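The client/server round trip described in 2.3.1 and 2.3.2 can be sketched as follows. All names here are hypothetical (this is not the vCUDA wire format): the client stub intercepts an API call, encodes it, and forwards it; the server validates the request, executes it on the (here simulated) physical GPU, and returns the encoded result.

```python
# Minimal API-remoting sketch: client stub encodes and forwards a call,
# server decodes, validates, executes, and replies. In a real system,
# send() crosses the VM boundary; here it is a direct function call.
import json

def client_call(send, api_name, *args):
    # Client driver: intercept the call, package parameters, decode reply.
    request = json.dumps({"api": api_name, "args": list(args)})
    reply = json.loads(send(request))
    if "error" in reply:
        raise RuntimeError(reply["error"])
    return reply["result"]

# Server side: the table of calls it will execute on the physical GPU
# (simulated here by plain Python functions).
HANDLERS = {
    "vectorAdd": lambda xs, ys: [x + y for x, y in zip(xs, ys)],
}

def server_dispatch(raw_request):
    # Validate the request, execute it, encode the result for the client.
    req = json.loads(raw_request)
    if req["api"] not in HANDLERS:
        return json.dumps({"error": "unknown API"})
    return json.dumps({"result": HANDLERS[req["api"]](*req["args"])})

result = client_call(server_dispatch, "vectorAdd", [1, 2], [3, 4])
```

The guest application only ever sees the client stub, which is what sustains the illusion of a local vGPU while the real device lives in the privileged VM.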

2.3.3 Management side

The manager resides in the privileged domain and performs global GPU resource scheduling, load balancing, and fault recovery, dynamically allocating GPU slices to VMs based on demand.
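A toy placement policy in the spirit of the manager component might look like this — a least-loaded allocator that assigns each vGPU request to the physical GPU with the smallest current load. Real schedulers also rebalance and recover from faults; this sketch shows only initial placement.

```python
# Toy vGPU placement: assign each request to the least-loaded physical GPU.

def place(requests, gpus):
    """requests: list of (vm_name, demand); gpus: list of GPU names."""
    load = {g: 0 for g in gpus}
    placement = {}
    for vm, demand in requests:
        target = min(load, key=load.get)  # least-loaded physical GPU
        load[target] += demand
        placement[vm] = target
    return placement, load

placement, load = place([("vm1", 2), ("vm2", 1), ("vm3", 1)],
                        ["gpu0", "gpu1"])
```

Greedy least-loaded placement is the simplest form of the load balancing the text describes; demand-aware or migration-capable schedulers refine the same idea.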


Tags: Graphics, CUDA, GPU, Virtualization, Compute
Written by Architects' Tech Alliance

Sharing project experiences, insights into cutting-edge architectures, focusing on cloud computing, microservices, big data, hyper-convergence, storage, data protection, artificial intelligence, industry practices and solutions.
