Unlocking Real‑Time Data Quality: ByteDance’s Dynamic Exploration Solution
This article explains how ByteDance’s dynamic data exploration tool improves data quality assurance by replacing time‑consuming SQL validation with real‑time, sample‑based profiling, detailing its problem background, core features, technical architecture, front‑end rendering techniques, operation‑stack management, and future enhancements.
Data exploration is a crucial step for ensuring data quality and forms the foundation of data development; without it, projects face repeated issues, operational difficulties, and extended timelines.
Problem Background
Traditional data validation relies on writing SQL queries, which is time‑consuming, resource‑intensive, and does not provide row‑level details or seamless integration with quality monitoring.
The main pain points are:
Inability to view detailed data rows and perform preprocessing.
Resource scheduling leads to minute‑level wait times.
Lack of integration with data quality monitoring, making downstream usage unclear.
Dynamic Exploration Solution
ByteDance’s dynamic exploration addresses these issues by offering:
Big‑data preview‑based profiling that supports function‑level preprocessing.
Second‑level updates of exploration results for real‑time response.
Integration with data monitoring and automatic SQL generation.
Application Scenarios
The solution is used in metadata management, data R&D, data‑warehouse development, and data governance, serving both SQL‑centric developers and non‑SQL users such as modelers and data miners.
It closes three loops:
Metadata Management → Exploration → Data Preview (quality report).
Data Monitoring ↔ Exploration.
Dynamic Exploration → SQL → Data Development → Debug → Exploration Report.
Terminology
Full‑table Exploration: Executes on the backend and shows statistical distributions for all columns.
Dynamic Exploration: Samples a subset of data, displays field details, allows front‑end preprocessing, and updates statistics in real time.
Technical Implementation
Most of the logic runs on the front end, while sampling is performed on the backend.
Sampling Capability
Currently uses random sampling; future work will explore feature‑based sampling.
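The article does not describe the sampler's internals, but uniform random sampling over a large table is commonly done with reservoir sampling, which picks k rows in a single pass without knowing the total row count. The sketch below is illustrative only; the function name and shape are assumptions, not ByteDance's actual backend code.

```typescript
// Reservoir sampling: select k rows uniformly at random from a stream
// of unknown length in one pass. Illustrative sketch only.
function reservoirSample<T>(rows: Iterable<T>, k: number): T[] {
  const reservoir: T[] = [];
  let seen = 0;
  for (const row of rows) {
    seen++;
    if (reservoir.length < k) {
      // Fill the reservoir with the first k rows.
      reservoir.push(row);
    } else {
      // Keep each later row with probability k / seen by replacing
      // a random slot when the drawn index falls inside the reservoir.
      const j = Math.floor(Math.random() * seen);
      if (j < k) reservoir[j] = row;
    }
  }
  return reservoir;
}
```

Every row has an equal k/n chance of ending up in the sample, which is what makes the front-end statistics representative of the full table.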
Big‑Data Rendering
The front end must render up to 5,000 rows, handling both exploration cards and data preview tables.
Exploration cards summarize key column metrics (e.g., zero values, nulls, enumerations) and are rendered with a virtual list to support collapse/expand states.
Data preview uses an internal canvas‑based table for high‑performance scrolling.
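A virtual list only mounts the rows currently in view plus a small overscan buffer, which is what makes 5,000 cards scrollable. The windowing math can be sketched as below, simplified to fixed-height rows (the real cards have collapse/expand states, so heights vary); the function is a hypothetical illustration, not the internal component.

```typescript
// Compute which rows of a fixed-height virtual list are visible,
// with a few extra "overscan" rows above and below to hide pop-in.
function visibleRange(
  scrollTop: number,      // current scroll offset in px
  viewportHeight: number, // visible height of the list in px
  rowHeight: number,      // fixed height of each row in px
  total: number,          // total row count (e.g. 5,000)
  overscan = 3
): { start: number; end: number } {
  const start = Math.max(0, Math.floor(scrollTop / rowHeight) - overscan);
  const end = Math.min(total, Math.ceil((scrollTop + viewportHeight) / rowHeight) + overscan);
  return { start, end }; // render only rows [start, end)
}
```

Only `end - start` DOM nodes exist at any time, so render cost stays constant regardless of total row count.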
Card Linking
To align cards with the data preview columns, an automatic positioning feature calculates the midpoint of a selected card and scrolls the table to keep the view centered.
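The centering logic described above can be sketched as a small scroll-offset calculation: find the midpoint of the selected card's column, then scroll so that midpoint sits in the middle of the viewport, clamped to the scrollable range. Function name and inputs are illustrative assumptions; the actual table is canvas-based and internal.

```typescript
// Compute the horizontal scroll offset that centers a given column
// in the data-preview viewport.
function centerScrollLeft(
  colWidths: number[],   // pixel width of each column
  colIndex: number,      // column the selected card maps to
  viewportWidth: number  // visible width of the preview table
): number {
  // Midpoint of the target column in content coordinates.
  const left = colWidths.slice(0, colIndex).reduce((a, b) => a + b, 0);
  const mid = left + colWidths[colIndex] / 2;
  const contentWidth = colWidths.reduce((a, b) => a + b, 0);
  // Offset that centers the midpoint, clamped to [0, maxScroll].
  const target = mid - viewportWidth / 2;
  const maxScroll = Math.max(0, contentWidth - viewportWidth);
  return Math.max(0, Math.min(target, maxScroll));
}
```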
Operation Stack
Each user action (e.g., column deletion, filtering, sorting) is recorded as an operation; a stack of operations can be edited, replayed, and the results are updated in real time.
The operation engine abstracts each operation as Input + Logic = Output. For example, a column‑deletion operation runs a method that filters out the specified columns and returns the updated column list and data map.
<code>class ColDelOpt {
  // Operation-specific params carry the names of the fields to delete.
  constructor(private params: { fields?: string[] }) {}

  run = (params: IOptEngineMetaInfo) => {
    const { columns = [], dataSourceMap = {} } = params;
    const { fields = [] } = this.params;
    // Drop every column whose name appears in the deletion list.
    const nextColumns = columns.filter(item => !fields.includes(item.name));
    return { columns: nextColumns, dataSourceMap };
  };
}</code>

The engine iterates over the operation list, applying each operation sequentially and handling errors gracefully.
<code>class OptEngine {
  private optList: IOptEngineItem[] = [];
  private metaData: IOptEngineMetaInfo = { columns: [], dataSourceMap: {} };

  optRun = () => {
    let { columns, dataSourceMap } = this.metaData;
    if (!this.optList.length) return { columns, dataSourceMap };
    // Apply each operation in order; the output of one feeds the next.
    for (let i = 0; i < this.optList.length; i++) {
      const optItem = this.optList[i];
      try {
        const result = optItem.run({ columns, dataSourceMap });
        columns = result.columns || [];
        dataSourceMap = result.dataSourceMap || {};
      } catch (e) {
        // On failure, report the offending operation and return the last good state.
        return {
          columns,
          dataSourceMap,
          errorInfo: { key: optItem.key || '', message: (e as Error).message },
        };
      }
    }
    return { columns, dataSourceMap };
  };
}</code>

Practical Example
During front‑end development, a team needed to locate users of a specific vertical‑screen device (1080×1920). Using dynamic exploration, they quickly filtered and visualized the relevant data distribution.
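In the same Input + Logic = Output style, that device lookup can be sketched as a filter step over the sampled rows. The row shape, helper, and sample data below are hypothetical, purely to illustrate the kind of predicate a filter operation would apply.

```typescript
// A sampled row is a flat map of column name to value (illustrative shape).
interface Row {
  [col: string]: string | number;
}

// Filter step: keep only rows matching a predicate, as a filter
// operation in the stack would.
function filterRows(rows: Row[], predicate: (r: Row) => boolean): Row[] {
  return rows.filter(predicate);
}

// Hypothetical sample: find vertical-screen 1080x1920 users.
const sample: Row[] = [
  { user: 'a', width: 1080, height: 1920 },
  { user: 'b', width: 1920, height: 1080 },
  { user: 'c', width: 1080, height: 1920 },
];
const verticalUsers = filterRows(sample, r => r.width === 1080 && r.height === 1920);
```

Because the filter runs on the front end over the sampled rows, the matching distribution updates immediately instead of waiting on a scheduled SQL job.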
Future Plans
Support more exploration types (e.g., map, JSON, time, SQL) and richer chart visualizations.
Introduce an editor‑style operation stack with HSQL support and multi‑table joins.
Complete SQL generation from operation flows, leveraging lexical analysis and AST techniques.
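The planned operation-flow-to-SQL translation could, in its simplest form, map each stack entry onto a clause of a SELECT statement. The sketch below is naive string assembly under assumed operation shapes; the article indicates the real plan relies on lexical analysis and AST techniques, which this does not attempt.

```typescript
// Assumed operation variants for illustration: column deletion,
// row filtering, and sorting.
type Op =
  | { kind: 'dropCols'; fields: string[] }
  | { kind: 'filter'; expr: string }
  | { kind: 'sort'; field: string; desc?: boolean };

// Fold an operation stack into a single SELECT statement.
function opsToSql(table: string, allColumns: string[], ops: Op[]): string {
  let cols = [...allColumns];
  const where: string[] = [];
  const order: string[] = [];
  for (const op of ops) {
    if (op.kind === 'dropCols') cols = cols.filter(c => !op.fields.includes(c));
    else if (op.kind === 'filter') where.push(op.expr);
    else order.push(`${op.field}${op.desc ? ' DESC' : ''}`);
  }
  let sql = `SELECT ${cols.join(', ')} FROM ${table}`;
  if (where.length) sql += ` WHERE ${where.join(' AND ')}`;
  if (order.length) sql += ` ORDER BY ${order.join(', ')}`;
  return sql;
}
```

An AST-based generator would instead build a typed query tree per operation and serialize it, which handles quoting, dialects (e.g. HSQL), and multi-table joins far more robustly than string concatenation.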
ByteDance Data Platform
The ByteDance Data Platform team empowers all ByteDance business lines by lowering data‑application barriers, aiming to build data‑driven intelligent enterprises, enable digital transformation across industries, and create greater social value. Internally it supports most ByteDance units; externally it delivers data‑intelligence products under the Volcano Engine brand to enterprise customers.