
DSP Advertising System Architecture and Key Technologies

This article presents a comprehensive overview of DSP advertising system architecture, covering real‑time bidding infrastructure, audience targeting, data processing pipelines using Hadoop, Spark and Storm, click‑through‑rate prediction models, and anti‑fraud mechanisms, offering practical insights for engineers building high‑performance ad platforms.

High Availability Architecture

Introduction

The speaker, a technical director experienced in mobile ad platforms and large‑scale data processing, shares practical experiences from building a Demand‑Side Platform (DSP) over several years, aiming to provide a reference for engineers interested in ad‑system development.

Requirements of a DSP

A robust DSP must support high‑throughput Real‑Time Bidding (RTB) capabilities and advanced audience‑targeting techniques, handling millions of page views and billions of users within strict latency constraints (typically under 100 ms).
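To make the latency requirement concrete, here is a minimal Python sketch of a bid handler (the engine itself is C++). It runs its stages under a fixed time budget and falls back to a no‑bid once that budget is spent. The 100 ms default comes from the text. The stage functions, the `bid_with_deadline` name, and the injectable clock are hypothetical.

```python
import time
from typing import Callable, List, Optional

# Latency budget in milliseconds; the text cites "typically under 100 ms".
DEFAULT_BUDGET_MS = 100.0

Stage = Callable[[dict], Optional[dict]]


def bid_with_deadline(
    request: dict,
    stages: List[Stage],
    budget_ms: float = DEFAULT_BUDGET_MS,
    clock: Callable[[], float] = time.monotonic,
) -> Optional[dict]:
    """Run bidding stages in order; return None (a no-bid) once the budget is spent.

    Each hypothetical stage (e.g. targeting, CTR scoring, pricing) receives the
    working context and returns it, possibly enriched, or None to abandon the bid.
    """
    start = clock()
    ctx: Optional[dict] = dict(request)
    for stage in stages:
        if (clock() - start) * 1000.0 >= budget_ms:
            return None  # out of time: skip the remaining stages
        ctx = stage(ctx)
        if ctx is None:
            return None  # a stage decided not to bid
    # A stage may itself have overrun the budget, so check once more.
    if (clock() - start) * 1000.0 >= budget_ms:
        return None
    return ctx
```

Passing the clock in as a parameter keeps the budget logic testable without real sleeps. A production engine would more likely enforce the deadline with timers on its event loop.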

System Architecture

The overall architecture comprises advertisers, media sites, ad exchanges, and the DSP's core modules: the RTB engine, business platform, logging system, DMP, campaign management, and anti‑fraud components. The workflow runs from pre‑campaign setup through user‑behavior collection, request routing, the bidding decision, and price encryption, to impression and click billing.
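Price encryption protects the clearing price that the exchange reports back in the win notice, so that it can be neither read nor forged in transit. The sketch below shows one common pattern under stated assumptions:

- a 16‑byte initialization vector (IV);
- an HMAC‑SHA1 pad XORed with an 8‑byte price in micros;
- a 4‑byte HMAC signature.

The keys and function names are hypothetical, and each exchange specifies its own exact format.

```python
import hashlib
import hmac
import struct

# Hypothetical keys shared by the exchange and the DSP; real exchanges issue
# their own keys and document their own token layout.
ENC_KEY = b"example-encryption-key"
INT_KEY = b"example-integrity-key"


def encrypt_price(price_micros: int, iv: bytes) -> bytes:
    """Exchange side: build iv || (price XOR pad) || signature."""
    price_bytes = struct.pack(">Q", price_micros)  # 8-byte big-endian price
    pad = hmac.new(ENC_KEY, iv, hashlib.sha1).digest()[:8]
    enc = bytes(p ^ k for p, k in zip(price_bytes, pad))
    sig = hmac.new(INT_KEY, price_bytes + iv, hashlib.sha1).digest()[:4]
    return iv + enc + sig


def decrypt_price(token: bytes) -> int:
    """DSP side: recover the clearing price and verify its integrity."""
    if len(token) != 16 + 8 + 4:
        raise ValueError("unexpected token length")
    iv, enc, sig = token[:16], token[16:24], token[24:]
    pad = hmac.new(ENC_KEY, iv, hashlib.sha1).digest()[:8]
    price_bytes = bytes(c ^ k for c, k in zip(enc, pad))
    expected = hmac.new(INT_KEY, price_bytes + iv, hashlib.sha1).digest()[:4]
    if not hmac.compare_digest(sig, expected):
        raise ValueError("integrity check failed")
    return struct.unpack(">Q", price_bytes)[0]
```

The exchange would generate a fresh random IV for each win notice (for example with `os.urandom(16)`), so that identical prices never produce identical tokens. `hmac.compare_digest` avoids leaking timing information during verification.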

RTB Engine

The RTB engine, implemented in C++ on Linux, exposes an HTTP interface and processes protobuf‑encoded requests. A modular adapter layer normalizes traffic from the various SSPs/ADXs into one internal format. A pluggable algorithm pool supports A/B testing and real‑time algorithm updates. Algorithms are built as dynamic libraries that can be hot‑swapped: they are loaded with dlopen, resolved with dlsym, and unloaded with dlclose. inotify watches the configuration files, so configuration changes are reloaded without restarting the engine.
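The adapter layer and algorithm pool can be sketched in Python as follows. Everything here is illustrative: the exchange names and payload fields, the `BidRequest` shape, the two strategies, and the 80/20 split are all assumptions. Dictionaries of functions stand in for the dynamic libraries that the C++ engine loads with dlopen/dlsym.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class BidRequest:
    """Hypothetical normalized request used inside the engine."""
    request_id: str
    user_id: str
    floor_price: float  # CPM in the exchange's currency


# Adapter layer: one function per SSP/ADX converts its native payload.
# The payload field names of these two "exchanges" are invented.
ADAPTERS: Dict[str, Callable[[dict], BidRequest]] = {
    "exchange_a": lambda p: BidRequest(p["id"], p["uid"], p["bidfloor"]),
    "exchange_b": lambda p: BidRequest(p["reqId"], p["user"]["id"], p["minCpmMicros"] / 1e6),
}

# Algorithm pool: named bidding strategies (stand-ins for dlopen'ed libraries).
Strategy = Callable[[BidRequest], float]
POOL: Dict[str, Strategy] = {
    "baseline": lambda r: r.floor_price * 1.1,
    "aggressive": lambda r: r.floor_price * 1.5,
}

# A/B traffic split: (strategy name, share of users); shares sum to 1.
SPLIT: List[Tuple[str, float]] = [("baseline", 0.8), ("aggressive", 0.2)]


def pick_strategy(user_id: str) -> str:
    """Hash the user id into [0, 1) so that each user sticks to one bucket."""
    h = int.from_bytes(hashlib.md5(user_id.encode()).digest()[:8], "big") / 2**64
    acc = 0.0
    for name, share in SPLIT:
        acc += share
        if h < acc:
            return name
    return SPLIT[-1][0]  # guard against floating-point rounding at the top end


def handle(exchange: str, payload: dict) -> Tuple[str, float]:
    """Normalize the payload, choose a strategy, and compute a bid price."""
    req = ADAPTERS[exchange](payload)
    name = pick_strategy(req.user_id)
    return name, POOL[name](req)
```

Hashing the user id, rather than drawing a strategy at random for each request, keeps every user in a single A/B bucket. Replacing an entry in `POOL` or editing `SPLIT` at runtime loosely mirrors the engine's hot‑swap and inotify‑driven reload.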

DMP Data Processing

Data processing combines Hadoop offline jobs, Spark batch jobs, and Storm stream processing. Results are stored in HBase/MySQL for querying, in Redis for real‑time lookup, and in Elasticsearch for near‑real‑time indexing. Spark's MLlib library is used for classification, clustering, and collaborative filtering, with algorithms such as decision trees and logistic regression.
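The article does not say exactly how offline and streaming results are combined. One common approach, often called a lambda architecture, is assumed here:

- keep a batch view that the offline jobs recompute;
- add a streaming delta for events that arrived since the last batch run;
- merge the two at lookup time.

Plain dictionaries stand in for HBase and Redis, and all names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

Event = Tuple[str, str]  # (user_id, tag) observed in a behavior log


def batch_job(events: Iterable[Event]) -> Dict[str, Counter]:
    """Offline pass (the role Hadoop/Spark play): full recount over history."""
    view: Dict[str, Counter] = defaultdict(Counter)
    for user, tag in events:
        view[user][tag] += 1
    return dict(view)


class SpeedLayer:
    """Streaming pass (the role Storm plays): counts since the last batch run."""

    def __init__(self) -> None:
        self.delta: Dict[str, Counter] = defaultdict(Counter)

    def on_event(self, user: str, tag: str) -> None:
        self.delta[user][tag] += 1

    def reset(self) -> None:
        # Called once a fresh batch view has absorbed these events.
        self.delta.clear()


def lookup(user: str, batch_view: Dict[str, Counter], speed: SpeedLayer) -> Counter:
    """Serving-time merge (the role of the Redis lookup in the text)."""
    merged = Counter(batch_view.get(user, Counter()))
    merged.update(speed.delta.get(user, Counter()))
    return merged
```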

User Profiling and Targeting

User profiles are represented as {user_id: {tag: weight, …}} using three tag types: user‑behavior tags t(u), context tags t(c), and advertiser‑custom tags t(a,u). These tags support demographic, behavioral, contextual, and look‑alike targeting. Tag weights decay over time so that profiles stay fresh.
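Here is a minimal sketch of the {user_id: {tag: weight}} profile with time decay and a simple tag‑based targeting check. The text says only that profiles use time decay. Exponential decay with a 7‑day half‑life, the pruning cutoff, and the 0.5 match threshold are all assumptions.

```python
import math
from typing import Dict, Set

Profile = Dict[str, float]  # tag -> weight: the inner map of {user_id: {tag: weight}}

# Assumed half-life for tag weights; the text gives no decay rate.
HALF_LIFE_DAYS = 7.0
DECAY = math.log(2) / HALF_LIFE_DAYS
MIN_WEIGHT = 0.01  # hypothetical cutoff below which a tag is dropped


def decay(profile: Profile, days_elapsed: float) -> Profile:
    """Exponentially down-weight every tag and prune tags that have faded out."""
    factor = math.exp(-DECAY * days_elapsed)
    return {t: w * factor for t, w in profile.items() if w * factor >= MIN_WEIGHT}


def observe(profile: Profile, tag: str, weight: float = 1.0) -> Profile:
    """Add fresh evidence for a tag (behavior t(u), context t(c), or advertiser t(a,u))."""
    updated = dict(profile)
    updated[tag] = updated.get(tag, 0.0) + weight
    return updated


def matches(profile: Profile, required: Set[str], threshold: float = 0.5) -> bool:
    """Targeting check: every required tag must be present with enough weight."""
    return all(profile.get(tag, 0.0) >= threshold for tag in required)
```

Applying `decay` lazily, using the time elapsed since the profile was last updated, avoids rewriting every profile on a schedule.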

CTR Prediction

Click‑through‑rate (CTR) prediction is critical for ranking ads. Simple historical click statistics are supplemented with logistic‑regression models. The approach can be extended with feature engineering, collaborative filtering, and learning‑to‑rank approaches (point‑wise, pair‑wise, list‑wise).
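The two CTR approaches named in the text, historical statistics and logistic regression, can each be sketched in a few lines of Python. The additive prior in `smoothed_ctr`, the sparse string features, and the SGD hyperparameters are illustrative assumptions, not values from the talk.

```python
import math
from typing import Dict, List, Tuple

Features = Dict[str, float]  # sparse features, e.g. {"ad=42": 1.0, "slot=top": 1.0}


def smoothed_ctr(clicks: int, impressions: int,
                 prior_ctr: float = 0.01, prior_weight: float = 100.0) -> float:
    """Historical CTR with an additive prior, so low-traffic ads are not misjudged."""
    return (clicks + prior_ctr * prior_weight) / (impressions + prior_weight)


def sigmoid(z: float) -> float:
    # Numerically stable form of 1 / (1 + e^-z).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class LogisticCTR:
    """Logistic regression trained with plain SGD on sparse features."""

    def __init__(self, lr: float = 0.1, l2: float = 1e-4) -> None:
        self.w: Dict[str, float] = {}
        self.bias = 0.0
        self.lr, self.l2 = lr, l2

    def predict(self, x: Features) -> float:
        z = self.bias + sum(self.w.get(f, 0.0) * v for f, v in x.items())
        return sigmoid(z)

    def update(self, x: Features, clicked: int) -> None:
        # The gradient of log-loss with respect to z is (p - y).
        g = self.predict(x) - clicked
        self.bias -= self.lr * g
        for f, v in x.items():
            w = self.w.get(f, 0.0)
            self.w[f] = w - self.lr * (g * v + self.l2 * w)  # L2-regularized step

    def fit(self, data: List[Tuple[Features, int]], epochs: int = 20) -> None:
        for _ in range(epochs):
            for x, y in data:
                self.update(x, y)
```

The smoothed statistic is a reasonable cold‑start fallback. The model generalizes across ads through shared features, such as slot or user tags, which raw per‑ad counts cannot do.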

Anti‑Fraud Measures

Anti‑fraud strategies include detecting P2P traffic‑swapping tools, CPS traffic‑steering fraud, and domain‑level analysis using WHOIS information. Honey‑pot servers capture malicious behavior, while logs are aggregated via Logstash, indexed in Elasticsearch, and visualized in Kibana to identify suspicious IPs, domains, and patterns.
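Once logs are aggregated, the simplest IP‑level checks are per‑address counts and ratios. The sketch below computes them in memory. In the setup described, the same grouping could instead be expressed as an aggregation over the logs indexed in Elasticsearch. The `LogEvent` shape and all thresholds are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Set


@dataclass
class LogEvent:
    ip: str
    kind: str  # "impression" or "click"


def suspicious_ips(
    events: Iterable[LogEvent],
    max_clicks: int = 100,
    max_ctr: float = 0.2,
    min_impressions: int = 20,
) -> Set[str]:
    """Flag IPs whose click volume or click-through rate is implausibly high.

    The default thresholds are hypothetical; in practice they would be tuned
    per placement and reviewed (for example in Kibana) before blocking anything.
    """
    clicks: Dict[str, int] = defaultdict(int)
    imps: Dict[str, int] = defaultdict(int)
    for e in events:
        if e.kind == "click":
            clicks[e.ip] += 1
        elif e.kind == "impression":
            imps[e.ip] += 1

    flagged: Set[str] = set()
    for ip in set(clicks) | set(imps):
        c, i = clicks[ip], imps[ip]
        if c > max_clicks:
            flagged.add(ip)  # raw click volume alone is suspicious
        elif i >= min_impressions and c / i > max_ctr:
            flagged.add(ip)  # enough traffic to trust the ratio, and it is too high
        elif i == 0 and c > 0:
            flagged.add(ip)  # clicks with no recorded impression
    return flagged
```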

Q&A Highlights

Answers cover DMP storage (HDFS, Hive, HBase, Redis, MySQL), hot‑swap implementation of RTB algorithms (dlopen, inotify), online vs. offline cookie mapping, and user identification methods across PC (cookies, Flash, evercookie) and mobile (Android ID, IMEI, IDFA) platforms.
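Cookie mapping and device identifiers come together when the engine decides which user a request belongs to. The sketch below keeps a table from (exchange, exchange‑side user id) to the DSP's own id. The table is fed either online, for example when a mapping‑pixel redirect returns both ids, or offline from batch files. The class, method, and field names are assumptions. So is the preference order of IDFA, then Android ID, then IMEI, since the summary above does not rank these identifiers.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical preference order for mobile identifiers.
MOBILE_ID_FIELDS = ("idfa", "android_id", "imei")


class CookieMapper:
    """Maps (exchange, exchange-side user id) to the DSP's own user id."""

    def __init__(self) -> None:
        self._to_dsp: Dict[Tuple[str, str], str] = {}

    def record(self, exchange: str, exchange_uid: str, dsp_uid: str) -> None:
        """Online mapping: a mapping-pixel redirect delivered both ids together."""
        self._to_dsp[(exchange, exchange_uid)] = dsp_uid

    def bulk_load(self, exchange: str, rows: Iterable[Tuple[str, str]]) -> int:
        """Offline mapping from (exchange_uid, dsp_uid) rows; returns rows read."""
        count = 0
        for exchange_uid, dsp_uid in rows:
            # Design choice (assumed): an existing online mapping wins.
            self._to_dsp.setdefault((exchange, exchange_uid), dsp_uid)
            count += 1
        return count

    def resolve(self, exchange: str, exchange_uid: str) -> Optional[str]:
        return self._to_dsp.get((exchange, exchange_uid))


def user_key(exchange: str, request: dict, mapper: CookieMapper) -> Optional[str]:
    """Pick a stable user key: a mobile device id if present, else the mapped cookie."""
    for field in MOBILE_ID_FIELDS:
        value = request.get(field)
        if value:
            return f"{field}:{value}"
    cookie = request.get("cookie")
    return mapper.resolve(exchange, cookie) if cookie else None
```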

Tags: big data, machine learning, anti-fraud, Real-Time Bidding, ad tech, DSP
Written by

High Availability Architecture

Official account for High Availability Architecture.