Tag

dimensional aggregation

2 views collected around this technical thread.

Baidu Tech Salon
Baidu Tech Salon
Nov 20, 2024 · Big Data

Optimizing Multi‑Dimensional User Count Computation in Feed Using Data Tagging

By deduplicating logs and assigning compact numeric tags to each user‑dimension combination, the data‑tagging method replaces costly lateral‑view expansions with a user‑level aggregation, cutting shuffle volume from terabytes to gigabytes and reducing runtime from 49 minutes to 14 minutes, enabling scalable multi‑dimensional user‑count analysis for Baidu Feed.

Big DataHivePerformance Tuning
0 likes · 14 min read
Optimizing Multi‑Dimensional User Count Computation in Feed Using Data Tagging