Meituan's Exploration and Practice in Advertising Algorithm: Information Flow Ad Estimation

This article details Meituan Waimai's feed advertising system, covering business characteristics, the evolution of estimation models, and practical implementations such as decision‑path modeling, ultra‑long/wide user modeling, full‑reconstruction techniques, and the integration of large language models for CTR prediction.

Meituan Technology Team

Feed Advertising Business and Estimation Technology Status

Meituan Waimai feed ads exhibit strong user‑behavior continuity (ordering intent usually completed within ten minutes), rich card information (ratings, discounts, delivery details), and abundant textual cues (merchant name, popular dishes).

Technical Overview and Evolution

The ad serving pipeline follows the classic recall → coarse ranking → fine ranking flow, but recall is constrained by location‑based services (LBS), leading to heavier compute allocation to fine ranking and mechanism layers.
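To make the LBS constraint concrete, here is a minimal sketch of a location-bounded recall step followed by a coarse cut. The `Merchant` fields, the 5 km radius, and the `coarse_score` values are assumptions made for illustration, not details from the Meituan system.

```python
import math
from dataclasses import dataclass


@dataclass
class Merchant:
    merchant_id: str
    lat: float
    lon: float
    coarse_score: float  # hypothetical output of a cheap coarse-ranking model


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def lbs_recall(user_lat, user_lon, merchants, radius_km=5.0):
    """Recall: keep only merchants inside the assumed delivery radius."""
    return [
        m for m in merchants
        if haversine_km(user_lat, user_lon, m.lat, m.lon) <= radius_km
    ]


def coarse_rank(candidates, keep=2):
    """Coarse ranking: keep the top `keep` candidates by the cheap score."""
    return sorted(candidates, key=lambda m: m.coarse_score, reverse=True)[:keep]


merchants = [
    Merchant("near_a", 39.9100, 116.4000, 0.30),
    Merchant("near_b", 39.9150, 116.4050, 0.80),
    Merchant("near_c", 39.9050, 116.3950, 0.55),
    Merchant("far_d", 40.2000, 116.9000, 0.99),  # high score, but out of range
]

recalled = lbs_recall(39.9100, 116.4000, merchants)
shortlist = coarse_rank(recalled)
print([m.merchant_id for m in shortlist])
```

Because the radius already removes most merchants, the candidate set entering later stages is small, which is consistent with spending more of the compute budget on fine ranking.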

Over the past six to seven years, the estimation algorithms have evolved through three stages:

Tree models with continuous features and cross‑statistics (limited fitting capacity).

2017‑2020: Deep neural network (DNN) models with richer features, aligning with industry trends.

2021‑present: Sparse large models combined with ultra‑long sequences.

Current Modeling Directions

Current work focuses on three dimensions: the user side, the link side, and the NLP side. Cross‑domain and multi‑scene modeling are deprioritized because they offer only shallow context learning and cannot sustain long‑term gains in a single large food‑delivery scenario.

User side: three phases (timeline, spatial line, and combined time‑space behavior patterns) that leverage session modeling, ultra‑long behavior modeling, and multi‑level short‑term/long‑term fusion.

Link side: page reconstruction and card reconstruction to recover the exact user‑visible content.

NLP side: LLM‑in‑CTR replaces the earlier multimodal direction.

Practical Implementations at Meituan

User Modeling Overview

User‑side modeling is broken down into three lines:

Timeline: multi‑level fusion of short‑term, medium‑term (day/week), and long‑term interests; more pages and paths are used to connect short‑ and long‑term signals, and end‑to‑end modules automatically extract patterns.

Spatial line: distinguishes real physical locations (e.g., office vs. home) and virtual entry points (Meituan app vs. Dianping app); for example, discount attention differs between the homepage and the membership entry.

Behavior‑impact line: models how explicit actions (e.g., receiving a red‑packet) affect subsequent ordering behavior.
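The multi-level fusion on the timeline can be pictured as a gated combination of per-horizon interest vectors. The sketch below is only an illustration under stated assumptions: mean pooling per horizon and externally supplied `gate_logits` are placeholders, whereas a production model would learn both the encoders and the gate end to end.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def mean_pool(vectors):
    """Average a non-empty list of equal-length embedding vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def fuse_interests(short_term, medium_term, long_term, gate_logits):
    """Combine three interest horizons with softmax gate weights.

    gate_logits is a hypothetical 3-element input; in a trained model it
    would come from a small network conditioned on the request context.
    """
    weights = softmax(gate_logits)
    levels = [mean_pool(short_term), mean_pool(medium_term), mean_pool(long_term)]
    dim = len(levels[0])
    fused = [sum(w * lvl[i] for w, lvl in zip(weights, levels)) for i in range(dim)]
    return fused, weights


# Toy 2-dimensional behavior embeddings for each horizon.
short_term = [[1.0, 0.0], [1.0, 0.0]]
medium_term = [[0.0, 1.0]]
long_term = [[0.0, 0.0]]

fused, weights = fuse_interests(short_term, medium_term, long_term, [0.0, 0.0, 0.0])
print(fused, weights)
```

With equal gate logits each horizon contributes one third; raising the first logit would shift the fused vector toward the short-term session.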

Decision‑Path Modeling

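DIN (Deep Interest Network) matches the candidate against each historical behavior independently, so this single‑point matching ignores the behaviors that preceded each click. The decision‑path solution consists of three modules: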

Path Enhance Module (PEM): assesses correlation between prior paths and candidate clicks, then uses a fully‑connected MLP with Softmax‑Top‑K to generate path representations.

Path Augment Module (PAM): applies contrastive learning where a user’s augmented path is a positive sample and other users’ paths are negatives, improving path representation.

Path Matching Module (PMM): builds attention over PEM representations, selects top‑K historical paths, and adds item‑level matching for a two‑layer match.
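Two ideas from the modules above can be sketched in a few lines: a contrastive loss of the kind PAM describes, and top-K selection with softmax weighting of the kind PEM and PMM describe. The code uses the standard InfoNCE form as a stand-in; the article does not specify the actual loss, temperature, or scoring function, so every name and constant here is an assumption.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE: negative log-softmax of the positive among all candidates."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[0]


def topk_path_match(candidate, path_vectors, k=2):
    """Score paths against the candidate, keep the top k, softmax-pool them."""
    scores = [dot(candidate, p) for p in path_vectors]
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    m = max(scores[i] for i in top)
    exps = {i: math.exp(scores[i] - m) for i in top}
    z = sum(exps.values())
    dim = len(candidate)
    pooled = [sum(exps[i] / z * path_vectors[i][d] for i in top) for d in range(dim)]
    return top, pooled


# Contrastive part: the user's augmented path vs. other users' paths.
user_path = [1.0, 0.0]
augmented_same_user = [0.9, 0.1]
other_users = [[0.0, 1.0], [-1.0, 0.0]]
good = info_nce(user_path, augmented_same_user, other_users)
bad = info_nce(user_path, [0.0, 1.0], [[0.9, 0.1]])  # mislabeled positive

# Matching part: keep the two historical paths closest to the candidate.
top, pooled = topk_path_match([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]], k=2)
print(good, bad, top, pooled)
```

The loss is small when the augmented path sits close to its source and large when the supposed positive looks like another user's path, which is the pressure that shapes the path representation.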

Ultra‑Long / Ultra‑Wide User Modeling

Ultra‑long modeling clusters and hashes massive behavior sequences; ultra‑wide modeling attempts to place all features together. A practical compromise uses a sequence length of 1000 and a feature width greater than 10. Offline experiments show significant gains at the ten‑thousand scale, but online latency and iteration efficiency become the limiting factors.

Two observed issues:

SIM/ETA filtering, effective in e‑commerce, harms food‑delivery because it discards useful cross‑category signals.

Linear scaling of DIN scores to ultra‑long sequences does not continuously improve AUC; the DIN network’s denoising capacity is insufficient for very long inputs.
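The first issue is easy to see on a toy history. The sketch below applies a SIM-style hard search, which keeps only behaviors in the candidate's category. The merchants and categories are invented for illustration; whether the dropped behaviors actually carry signal is the article's empirical claim, not something this code demonstrates.

```python
from dataclasses import dataclass


@dataclass
class Behavior:
    merchant: str
    category: str


def hard_search(history, candidate_category):
    """Keep only historical behaviors that share the candidate's category."""
    return [b for b in history if b.category == candidate_category]


# Hypothetical order history spanning several food categories.
history = [
    Behavior("spicy_hotpot_house", "hotpot"),
    Behavior("bubble_tea_corner", "drinks"),
    Behavior("late_night_bbq", "barbecue"),
    Behavior("mango_dessert_bar", "dessert"),
    Behavior("iced_coffee_stand", "drinks"),
]

kept = hard_search(history, "dessert")
dropped = len(history) - len(kept)
print(f"kept {len(kept)} of {len(history)} behaviors, dropped {dropped}")
```

In this example four of five behaviors never reach the matching network, including the drink orders that might plausibly accompany a dessert purchase.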

Experiments indicate that a strong predictor network combined with a weak matching network yields the best CTR performance. A multi‑layer decoder that iteratively aggregates effective matrices improves AUC layer by layer (scaling‑law experiment).
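The layer-by-layer aggregation can be illustrated with a stack of cross-attention steps in which the candidate acts as the query and is refined by a residual update after each layer. This is a deliberately stripped-down sketch: it has no learned projections, feed-forward blocks, or normalization, and it is not the architecture used in the experiments.

```python
import math


def attention(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    dim = len(values[0])
    out = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return out, weights


def decode(candidate, sequence, num_layers=3):
    """Refine the candidate query by attending to the sequence repeatedly."""
    query = list(candidate)
    per_layer_weights = []
    for _ in range(num_layers):
        context, weights = attention(query, sequence, sequence)
        query = [q + c for q, c in zip(query, context)]  # residual update
        per_layer_weights.append(weights)
    return query, per_layer_weights


# Toy behavior sequence: two behaviors aligned with the candidate, one not.
sequence = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
final_query, per_layer_weights = decode([1.0, 0.0], sequence, num_layers=3)
for layer, weights in enumerate(per_layer_weights, start=1):
    print(layer, [round(w, 3) for w in weights])
```

In this toy run the weight on the unrelated behavior shrinks with each layer, a rough picture of how stacking layers can sharpen the aggregation over a noisy sequence.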

Full‑Reconstruction Modeling

Full‑reconstruction aims to model everything the user actually sees. Traditional ID‑based models miss contextual and display information, creating a large information gap.

Challenges:

Missing contextual cards: each module must learn contextual signals, which collectively boost performance.

Limited compute on the estimation side: higher compute allows broader impact.

Algorithmic solutions:

Context Simulation Center (CSC): an exposure network predicts which items are likely to be shown (thousands of candidates); a ranking network orders them using NDCG.

Context Modeling Transformer (CMT): encodes position‑aware context with a Transformer, then decodes with candidate embeddings to produce a final context vector.

Distillation from a “real page” teacher to a “simulated page” student reduces noise.
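Since the CSC ranking network is described in terms of NDCG, a short reference implementation of the metric may help. This sketch uses the linear-gain form of DCG and assumes a binary relevance label per position (1 if the simulated item really appeared on the page); how the authors define relevance is not stated in the source.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k positions (linear gain)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k):
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Relevance of each simulated slot, in the order the ranking network produced.
perfect_order = [1, 1, 0]
swapped_order = [0, 1]

print(ndcg_at_k(perfect_order, 3))
print(ndcg_at_k(swapped_order, 2))
```

A simulated page that places truly exposed items first scores 1.0, and pushing them down the list lowers the score through the logarithmic position discount.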

Card reconstruction follows three steps: acquire card information, compose the user‑visible card, and match historical interests using the card.

Matrix‑based representation and patch‑level modeling capture intra‑card element interactions. Experiments show that shorter stride values and a mix of local and global patch matching improve results. A permutation‑based trick enumerates all possible patch orderings, allowing the model to learn the most informative sequence.
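The effect of stride on patch extraction, and the cost of enumerating patch orderings, can be sketched as follows. The 4x4 card matrix, the 2x2 patch size, and the row-major flattening are assumptions chosen for illustration; the source does not give the real card layout or patch dimensions.

```python
from itertools import permutations


def extract_patches(card, patch_h, patch_w, stride):
    """Slide a patch_h x patch_w window over the card and flatten each patch."""
    rows, cols = len(card), len(card[0])
    patches = []
    for r in range(0, rows - patch_h + 1, stride):
        for c in range(0, cols - patch_w + 1, stride):
            patch = [card[r + i][c + j] for i in range(patch_h) for j in range(patch_w)]
            patches.append(patch)
    return patches


# Hypothetical card laid out as a 4x4 grid of element ids
# (for example title, rating, discount tag, delivery time, ...).
card = [
    [0, 1, 2, 3],
    [4, 5, 6, 7],
    [8, 9, 10, 11],
    [12, 13, 14, 15],
]

coarse = extract_patches(card, 2, 2, stride=2)  # non-overlapping local patches
fine = extract_patches(card, 2, 2, stride=1)    # overlapping local patches
global_patch = [x for row in card for x in row]  # whole card as one patch

orderings = list(permutations(range(len(coarse))))
print(len(coarse), len(fine), len(orderings))
```

A shorter stride yields overlapping patches, so neighbouring elements appear together more often. Note that the number of orderings grows factorially with the patch count, so exhaustive enumeration is only practical for a handful of patches.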

LLM in CTR

Large models are introduced to supplement CTR’s missing capabilities: knowledge, generalization, and reasoning. The work is divided into three layers:

Knowledge injection: prompt engineering injects external knowledge into the CTR model; post‑processing aligns high‑frequency and low‑frequency tokens.

Thinking injection: incorporates the structural reasoning ability of large models.

Paradigm iteration: uses a smaller token vocabulary (tens of thousands) to address large‑scale softmax; semantic aggregation is applied before feeding into a Transformer, improving performance on noisy inputs.
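As a rough picture of the knowledge-injection step, the sketch below assembles a prompt from the textual cues mentioned earlier (merchant name and popular dishes). The template wording, the function name, and the sample merchant are all hypothetical; the article does not publish its prompts, and no LLM is called here.

```python
def build_knowledge_prompt(merchant_name, popular_dishes, max_dishes=3):
    """Assemble a hypothetical prompt asking an LLM for merchant knowledge."""
    dishes = ", ".join(popular_dishes[:max_dishes])
    return (
        f"Merchant: {merchant_name}\n"
        f"Popular dishes: {dishes}\n"
        "Describe the cuisine type, typical price level, and dining occasion "
        "in one sentence."
    )


prompt = build_knowledge_prompt(
    "Lao Wang Noodles",
    ["beef noodle soup", "cold sesame noodles", "pickled cucumber", "soy milk"],
)
print(prompt)
```

One plausible use, stated here as an assumption, is to encode the LLM's answer and feed the resulting embedding to the CTR model as an additional feature.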

Overall, LLM techniques compensate for CTR’s lack of knowledge, generalization, and reasoning.

Summary and Outlook

Estimating feed ads seeks to uncover genuine user demand by combining industry best practices with deep business insights, exploring richer behavior patterns and more automated solutions. Gains from full‑reconstruction stem from tight algorithm‑engineering collaboration.

Large models for recommendation remain a long‑term effort; sustainable progress requires scalable input sizes, sufficient compute, and a strong hardware‑software partnership.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
