Overview of NVIDIA Merlin for Recommendation Systems
This article introduces NVIDIA's Merlin suite, covering product overview, Merlin Models & Systems, the TensorFlow Distributed Embedding (TFDE) plugin, the Hierarchical‑KV library, and the Hierarchical Parameter Server (HPS), while highlighting their architecture, performance benefits, and ease of integration for large‑scale recommendation workloads.
The presentation introduces NVIDIA Merlin, a comprehensive toolkit for building and deploying recommendation system models. It begins with a product overview, outlining the end‑to‑end workflow: feature engineering (NVTabular), model training (HugeCTR, the Merlin Data Loader), and deployment tools such as the Hierarchical Parameter Server (HPS).
Merlin Models & Systems provide high‑level Python APIs that wrap classic recommendation models (e.g., DLRM, DCN, YouTube DNN) and simplify feature engineering, training, and inference. Users can instantiate models with a single function call and switch between CPU and GPU execution with minimal code changes.
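At the heart of the DLRM architecture mentioned above is a pairwise dot‑product interaction between the bottom‑MLP output and the sparse‑feature embeddings. The NumPy sketch below illustrates just that interaction step; it is not Merlin's implementation, and all shapes and names are assumptions for illustration.

```python
import numpy as np

def dot_interaction(dense_out, embeddings):
    """Pairwise dot-product interaction, as in DLRM.

    dense_out:  (batch, d)              output of the bottom MLP
    embeddings: (batch, num_sparse, d)  one embedding per sparse feature
    Returns:    (batch, num_pairs)      unique pairwise dot products
    """
    # Stack the dense output alongside the embeddings: (batch, F, d)
    feats = np.concatenate([dense_out[:, None, :], embeddings], axis=1)
    # All pairwise dot products via a batched Gram matrix: (batch, F, F)
    gram = feats @ feats.transpose(0, 2, 1)
    # Keep only strictly lower-triangular entries (each pair once)
    f = feats.shape[1]
    rows, cols = np.tril_indices(f, k=-1)
    return gram[:, rows, cols]

rng = np.random.default_rng(0)
out = dot_interaction(rng.normal(size=(2, 4)),        # batch=2, d=4
                      rng.normal(size=(2, 3, 4)))     # 3 sparse features
print(out.shape)  # (2, 6): C(4, 2) = 6 feature pairs per example
```

The interaction output is then concatenated with the dense features and fed to the top MLP, which Merlin Models wires up for you behind its single‑call model constructors.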
The TensorFlow Distributed Embedding (TFDE) plugin accelerates training by moving both the dense network and the sparse embedding lookups onto GPUs; reported speedups reach up to 600× over CPU‑only embedding implementations, and the plugin benefits large and small models alike.
At the lowest level, Merlin Hierarchical‑KV (HKV) is a C++ library that offers a hierarchical, GPU‑backed key‑value store with eviction policies (LRU, LFU, custom). It provides unified memory across CPU/GPU, high performance, and easy integration into existing frameworks.
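To make the eviction behaviour concrete, here is a minimal pure‑Python sketch of an LRU‑evicting key‑value store. HKV itself is a GPU‑resident C++ hash table; the capacity, class name, and policy below are illustrative assumptions only.

```python
from collections import OrderedDict

class LRUStore:
    """Toy key-value store with LRU eviction, mimicking one HKV policy."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()  # insertion order tracks recency

    def put(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)       # refresh recency on update
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)    # evict least recently used

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)           # a read also refreshes recency
        return self._data[key]

store = LRUStore(capacity=2)
store.put("a", 1)
store.put("b", 2)
store.get("a")          # "a" becomes most recently used
store.put("c", 3)       # evicts "b", the least recently used key
print(store.get("b"))   # None
print(store.get("a"))   # 1
```

An LFU or custom policy would change only the eviction choice in `put`; HKV exposes that choice as a pluggable scoring function rather than a fixed ordering.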
For inference, the Hierarchical Parameter Server (HPS) caches hot embeddings on each GPU, falling back to CPU memory, SSD, or external backends (e.g., RocksDB) when needed. HPS integrates with Triton, TensorFlow, and PyTorch plugins, delivering low‑latency serving and supporting continuous training pipelines.
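The tiered lookup path can be sketched in a few lines of Python: check a small "GPU" hot cache first, then a larger host‑memory tier, then the backing store, promoting hits upward. The tier names, capacities, and dict‑backed stores are assumptions for illustration, not the HPS API.

```python
class TieredLookup:
    """Toy HPS-style hierarchy: GPU hot cache -> CPU memory -> backend."""

    def __init__(self, hot_capacity, backend):
        self.hot = {}              # stands in for the per-GPU embedding cache
        self.hot_capacity = hot_capacity
        self.cpu = {}              # stands in for the host-memory cache
        self.backend = backend     # e.g. a RocksDB-like store; here a dict

    def lookup(self, key):
        if key in self.hot:                    # fastest tier
            return self.hot[key]
        value = self.cpu.get(key)
        if value is None:
            value = self.backend.get(key)      # slowest tier
            if value is not None:
                self.cpu[key] = value          # stage in host memory
        if value is not None and len(self.hot) < self.hot_capacity:
            self.hot[key] = value              # promote a hot embedding
        return value

backend = {10: [0.1, 0.2], 11: [0.3, 0.4]}
hps = TieredLookup(hot_capacity=1, backend=backend)
print(hps.lookup(10))  # fetched from backend, then promoted
print(10 in hps.hot)   # True
print(hps.lookup(11))  # hot cache full, so it stays in the CPU tier
print(11 in hps.cpu)   # True
```

The real HPS additionally refreshes these caches asynchronously, which is what lets it serve continuously trained models without blocking inference.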
Performance benchmarks on DLRM models demonstrate that GPU‑accelerated dense and embedding paths significantly reduce latency across batch sizes, and the system’s modular plugins make it straightforward to adopt in production environments.
DataFunSummit