
Data Map: Background, Definition, and Youzan’s Practical Implementation

This article introduces the concept of a data map, explains its background and goals, and describes Youzan’s end‑to‑end data‑map practice, including full data lineage, search, management, link analysis, impact estimation, and optimization. It concludes with a summary and an outlook.

DataFunSummit

01 Data Map Background

Every enterprise deals with data collection, development, search, management, analysis, and troubleshooting, often encountering pain points such as unclear data flow, difficulty locating data, inefficient data management, slow fault diagnosis, and complex link optimization. The data map was created to address these issues.

02 Data Map Goals

Enable users to quickly find the data they need.

Provide comprehensive data lineage visibility.

Facilitate efficient data management.

Improve fault‑handling speed by assessing impact and recovery time.

Offer different link perspectives for various user scenarios.

03 What Is a Data Map?

A data map borrows the familiar capabilities of a geographic map (search, route planning, information about the surrounding area, and management) and adapts them to the distinctive characteristics of data: its variety, its flow between systems, and its long lifecycle. It provides data search, efficient management, and lineage analysis such as lineage viewing and key‑path tracing.

04 Youzan’s Data‑Map Practice

The practice is divided into four parts:

Full Data Linkage – Collects data types, task types, and platforms together with their metadata (basic, technical, trend, and lineage), and builds complete lineage at three levels: table‑to‑table, table‑to‑task, and field‑level. Abstracting tables and tasks into common node types closes the loop from the business systems that produce data to the business consumers that use it.
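To make the table‑to‑task, table‑to‑table, and field lineage relationships concrete, here is a minimal Python sketch. It is not Youzan’s implementation: the `Task` and `LineageGraph` classes, the `table.column` naming for fields, and the choice to derive table‑to‑table edges from each task’s inputs and outputs are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Task:
    """Hypothetical task node: reads input tables and writes output tables."""
    name: str
    inputs: list[str]
    outputs: list[str]
    # Hypothetical field lineage: "out_table.column" -> ["in_table.column", ...]
    field_map: dict[str, list[str]] = field(default_factory=dict)


class LineageGraph:
    """Minimal in-memory store for table-to-task, table-to-table and field lineage."""

    def __init__(self) -> None:
        self.table_to_task: dict[str, set[str]] = defaultdict(set)  # table -> tasks reading it
        self.task_to_table: dict[str, set[str]] = defaultdict(set)  # task -> tables it writes
        self.table_edges: dict[str, set[str]] = defaultdict(set)    # upstream -> downstream tables
        self.field_edges: dict[str, set[str]] = defaultdict(set)    # upstream -> downstream fields

    def add_task(self, task: Task) -> None:
        for src in task.inputs:
            self.table_to_task[src].add(task.name)
            # Table-to-table lineage is derived from input -> task -> output.
            self.table_edges[src].update(task.outputs)
        self.task_to_table[task.name].update(task.outputs)
        for dst_field, src_fields in task.field_map.items():
            for src_field in src_fields:
                self.field_edges[src_field].add(dst_field)

    def downstream_tables(self, table: str) -> set[str]:
        """Every table transitively downstream of `table` (iterative DFS)."""
        seen: set[str] = set()
        stack = [table]
        while stack:
            for nxt in self.table_edges.get(stack.pop(), ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen


# Usage with hypothetical table and task names.
g = LineageGraph()
g.add_task(Task("t_clean", ["ods.orders"], ["dwd.orders"],
                {"dwd.orders.amount": ["ods.orders.amt"]}))
g.add_task(Task("t_agg", ["dwd.orders"], ["dws.shop_gmv"]))
print(sorted(g.downstream_tables("ods.orders")))  # ['dwd.orders', 'dws.shop_gmv']
```

Deriving table edges from tasks keeps a single source of truth in this sketch: registering a task updates both the task view and the table view of the lineage.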

Data Search – After data collection and linkage, a scoring system ranks results based on ownership, downstream count, quality, access frequency, and other factors, ensuring the most relevant data appears first.
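The article names the ranking signals but not how they are combined. A weighted sum of normalized signals is one simple way to do it; the Python sketch below assumes hypothetical weights, a log scale for counts, and a naive substring match for the query, none of which come from the source.

```python
import math
from dataclasses import dataclass


@dataclass
class TableStats:
    name: str
    owner: str
    downstream_count: int  # downstream nodes in the lineage graph
    quality: float         # hypothetical quality score in [0, 1]
    access_count: int      # hypothetical access count over a recent window


# Hypothetical weights; in practice they would be tuned against search behaviour.
WEIGHTS = {"owned": 0.30, "downstream": 0.25, "quality": 0.25, "access": 0.20}


def log_norm(count: int, cap: int) -> float:
    """Map a non-negative count onto [0, 1] on a log scale, saturating at `cap`."""
    return min(1.0, math.log1p(count) / math.log1p(cap))


def score(t: TableStats, user: str) -> float:
    """Weighted sum of ranking signals for one candidate table."""
    return (
        WEIGHTS["owned"] * (1.0 if t.owner == user else 0.0)
        + WEIGHTS["downstream"] * log_norm(t.downstream_count, 1_000)
        + WEIGHTS["quality"] * t.quality
        + WEIGHTS["access"] * log_norm(t.access_count, 10_000)
    )


def rank(candidates: list[TableStats], query: str, user: str) -> list[TableStats]:
    """Filter by a naive substring match, then sort by descending score."""
    hits = [t for t in candidates if query.lower() in t.name.lower()]
    return sorted(hits, key=lambda t: score(t, user), reverse=True)


tables = [
    TableStats("dw.orders_tmp", "bob", 0, 0.40, 3),
    TableStats("dw.orders", "alice", 120, 0.95, 5_000),
    TableStats("dw.orders_bak", "carol", 2, 0.60, 10),
]
print([t.name for t in rank(tables, "orders", "dave")])
# ['dw.orders', 'dw.orders_bak', 'dw.orders_tmp']
```

Log scaling keeps a heavily accessed table from drowning out every other signal; in this example the ownership bonus lifts `dw.orders_tmp` for its owner `bob` but not above the widely used, high‑quality `dw.orders`.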

Data Management – Uses “data albums” (similar to music albums) to categorize data across dimensions, allowing structured management, collaboration, multi‑dimensional classification, batch permission handling, and table splitting.
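A data album can be modeled as a tagged set of tables, where one table may appear in several albums. The following Python sketch shows multi‑dimensional classification and batch permission grants; `Album`, `AlbumCatalog`, the tag format, and the grant model are hypothetical and only mirror the capabilities the article lists.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Album:
    """Hypothetical data album: a named, tagged collection of tables."""
    name: str
    tags: set[str] = field(default_factory=set)    # e.g. domain, team, SLA tier
    tables: set[str] = field(default_factory=set)


class AlbumCatalog:
    def __init__(self) -> None:
        self.albums: dict[str, Album] = {}
        self.grants: dict[str, set[str]] = defaultdict(set)  # user -> readable tables

    def add(self, album: Album) -> None:
        self.albums[album.name] = album

    def albums_for_table(self, table: str) -> list[str]:
        """One table can sit in several albums, one per classification dimension."""
        return sorted(a.name for a in self.albums.values() if table in a.tables)

    def albums_with_tag(self, tag: str) -> list[str]:
        return sorted(a.name for a in self.albums.values() if tag in a.tags)

    def grant_album(self, user: str, album_name: str) -> int:
        """Batch-grant read access to every table in an album; returns how many were new."""
        before = len(self.grants[user])
        self.grants[user] |= self.albums[album_name].tables
        return len(self.grants[user]) - before


catalog = AlbumCatalog()
catalog.add(Album("trade-core", {"domain:trade", "tier:P0"}, {"dw.orders", "dw.payments"}))
catalog.add(Album("finance-reports", {"domain:finance"}, {"dw.payments", "dm.revenue_daily"}))
print(catalog.albums_for_table("dw.payments"))         # ['finance-reports', 'trade-core']
print(catalog.grant_album("dave", "trade-core"))       # 2
print(catalog.grant_album("dave", "finance-reports"))  # 1 (dw.payments was already granted)
```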

Link Analysis – Covers five capabilities:

Lineage viewing – Shortcuts to jump to the topmost upstream and bottommost downstream nodes, aggregation by database, owner, or type, node search, and alphabetical sorting.

Abnormal analysis – Prunes large lineage graphs to highlight the critical fault paths; a two‑step pruning algorithm reduces the node count from thousands to dozens.

Impact analysis and production‑time estimation – Predicts when outputs will be ready from each task’s historical median runtime, its current status, and its upstream dependencies.

Link optimization – Identifies critical paths, then shortens them by adjusting the start times of upstream tasks or by replacing slow tables, guided by field lineage.

Data monitoring and assurance – Scheduled scans check task syntax, the existence of input tables, and field usage; manual triggers support safe table and field replacements.
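The article does not spell out its two‑step pruning algorithm. One plausible two‑step scheme, sketched here in Python as an assumption rather than Youzan’s method, is: (1) walk upstream from the failed or late node through abnormal nodes only, and (2) reduce that subgraph to its root causes plus one shortest path from each root cause to the target. The `upstream` adjacency map, the `abnormal` set, and all node names are hypothetical.

```python
from collections import deque
from typing import Optional


def prune_fault_graph(
    upstream: dict[str, list[str]], abnormal: set[str], target: str
) -> dict[str, list[str]]:
    """Reduce a lineage graph to root-cause paths that end at `target`.

    Step 1: breadth-first walk upstream from the target, following only
    abnormal nodes; toward[n] records the next hop from n toward the target.
    Step 2: keep only root causes (kept nodes with no abnormal upstream) and
    one shortest path from each root cause down to the target.
    """
    toward: dict[str, Optional[str]] = {target: None}
    queue = deque([target])
    while queue:
        node = queue.popleft()
        for up in upstream.get(node, []):
            if up in abnormal and up not in toward:
                toward[up] = node
                queue.append(up)

    roots = sorted(
        n for n in toward
        if n != target and not any(u in abnormal for u in upstream.get(n, []))
    )
    if not roots:
        # No abnormal upstream at all: the target is its own root cause.
        return {target: [target]}

    paths: dict[str, list[str]] = {}
    for root in roots:
        path: list[str] = []
        hop: Optional[str] = root
        while hop is not None:
            path.append(hop)
            hop = toward[hop]
        paths[root] = path
    return paths


upstream = {
    "report": ["dws_gmv", "dws_uv"],
    "dws_gmv": ["dwd_orders", "dim_shop"],
    "dws_uv": ["dwd_logs"],
    "dwd_orders": ["ods_orders"],
    "dim_shop": ["ods_shop"],
    "dwd_logs": ["ods_logs"],
}
abnormal = {"report", "dws_gmv", "dwd_orders", "ods_orders", "dim_shop"}
fault_paths = prune_fault_graph(upstream, abnormal, "report")
for path in fault_paths.values():
    print(" -> ".join(path))
# dim_shop -> dws_gmv -> report
# ods_orders -> dwd_orders -> dws_gmv -> report
```

Healthy branches (`dws_uv`, `dwd_logs`, `ods_logs`, `ods_shop`) are dropped entirely; in a real graph, this kind of pruning is likely where most of the reduction from thousands of nodes to dozens would come from.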

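Production‑time estimation and critical‑path identification can share one pass over the dependency graph. The Python sketch below assumes a simple rule (a pending job starts at the latest of its scheduled time, the current time, and its upstream finish times, then runs for its historical median runtime) and records which upstream gated each start so the critical path can be read back. `Job`, `estimate`, and `critical_path` are hypothetical names; times are minutes after midnight and the graph is assumed to be acyclic.

```python
from dataclasses import dataclass, field
from statistics import median
from typing import Optional


@dataclass
class Job:
    name: str
    scheduled_start: int                  # minutes after midnight
    history: list[int]                    # past runtimes in minutes
    upstream: list[str] = field(default_factory=list)
    status: str = "pending"               # "done", "running" or "pending"
    started_at: Optional[int] = None
    finished_at: Optional[int] = None


def estimate(jobs: dict[str, Job], now: int):
    """Return (estimated finish per job, upstream job that gated each start)."""
    finish: dict[str, float] = {}
    gate: dict[str, Optional[str]] = {}

    def visit(name: str) -> float:
        if name in finish:
            return finish[name]
        job = jobs[name]
        typical = median(job.history)
        if job.status == "done":
            finish[name], gate[name] = job.finished_at, None
        elif job.status == "running":
            # An overrunning job is assumed to finish no earlier than now.
            finish[name], gate[name] = max(now, job.started_at + typical), None
        else:
            # Pending: start at the latest of schedule, now, and upstream finishes.
            start, blocker = max(job.scheduled_start, now), None
            for up in job.upstream:
                up_finish = visit(up)
                if up_finish > start:
                    start, blocker = up_finish, up
            finish[name], gate[name] = start + typical, blocker
        return finish[name]

    for name in jobs:
        visit(name)
    return finish, gate


def critical_path(gate: dict[str, Optional[str]], target: str) -> list[str]:
    """Follow the gating upstream links back from `target`."""
    path = [target]
    while gate[path[-1]] is not None:
        path.append(gate[path[-1]])
    return path[::-1]


jobs = {
    "ods_sync": Job("ods_sync", 60, [30, 35, 40], status="done", started_at=60, finished_at=100),
    "dwd_orders": Job("dwd_orders", 90, [50, 60, 55], ["ods_sync"], status="running", started_at=100),
    "dim_shop": Job("dim_shop", 120, [10, 12, 11], ["ods_sync"]),
    "dws_gmv": Job("dws_gmv", 150, [20, 25, 30], ["dwd_orders", "dim_shop"]),
}
finish, gate = estimate(jobs, now=130)
print(finish["dws_gmv"])               # 180
print(critical_path(gate, "dws_gmv"))  # ['dwd_orders', 'dws_gmv']
```

Here `dws_gmv` is held back by the still‑running `dwd_orders`, so that is the task whose start time or runtime link optimization would look at first.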
05 Summary and Outlook

Summary – The data map improved UI responsiveness (99% of API calls return in under 1 s), raised unique visitors (UV) from 90 to 130 and page views (PV) from 2,000 to 3,500, saves 1–3 hours of work per incident, and now supports 29 data types, 16 task types, and more than four platforms.

Outlook – Re‑architect lineage storage from relational databases to a graph database for faster, more accurate lineage reasoning; expand into scenarios such as cost reduction, quality and stability optimization, and batch quality assurance through data albums; and introduce richer visual components for data and business‑model visualization, so that processes and schemas are easier to grasp intuitively.

Q&A

Q: How is field‑level lineage parsed in Youzan? A: The data‑foundation team parses it, and the data‑map team retrieves it via offline services.

Q: Is the “night‑shift rate” of data engineers a formal KPI? A: It is a self‑defined metric to measure the effectiveness of data governance.

Q: How do platform users trace errors back through lineage? A: Data lineage spans four platforms (offline, real‑time, BI, metric library), enabling rapid impact assessment and coordinated recovery.
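Cross‑platform impact assessment amounts to a downstream traversal of the unified lineage graph, with affected nodes grouped by the platform that owns them. In this Python sketch the node names and the `platform_of` mapping are hypothetical; only the platform categories come from the answer above.

```python
from collections import defaultdict, deque


def impact_by_platform(
    downstream: dict[str, list[str]], platform_of: dict[str, str], source: str
) -> dict[str, list[str]]:
    """Breadth-first walk downstream of `source`, grouping affected nodes by platform."""
    seen = {source}
    queue = deque([source])
    affected: dict[str, list[str]] = defaultdict(list)
    while queue:
        node = queue.popleft()
        for nxt in downstream.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
                affected[platform_of[nxt]].append(nxt)
    return {p: sorted(nodes) for p, nodes in affected.items()}


downstream = {
    "dwd_orders": ["dws_gmv"],
    "dws_gmv": ["metric_gmv_daily", "dash_sales"],
    "metric_gmv_daily": ["dash_exec"],
}
platform_of = {
    "dws_gmv": "offline",
    "metric_gmv_daily": "metric library",
    "dash_sales": "BI",
    "dash_exec": "BI",
}
print(impact_by_platform(downstream, platform_of, "dwd_orders"))
# {'offline': ['dws_gmv'], 'metric library': ['metric_gmv_daily'], 'BI': ['dash_exec', 'dash_sales']}
```

Grouping by platform tells each owning team (offline, metric library, BI) exactly which of its assets to check or rerun once the upstream fault is fixed.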

