
Data Quality Governance in the IData Data Platform: Concepts, Metrics, and Implementation

This article explains the background and importance of data governance, defines data quality and its key metrics, describes the monitoring and alerting modules of the IData data platform, outlines built‑in and custom rule templates, and discusses baseline management and future enhancements.


Preface

In earlier articles we introduced the background of data middle platforms and why they are needed, as well as the implementation of Zhengcai Cloud's data middle platform and its indicator library. This article dives deeper into data governance, focusing on data quality and its role in the current Zhengcai Cloud data middle platform.

Background of Data Governance

As the business expands, the volume of stored data and the number of online jobs grow dramatically, producing large amounts of redundant data that waste cluster resources and slow down core jobs. Data governance therefore becomes increasingly critical.

The core components of data governance are:

Data Quality: accuracy, compliance, timeliness, effectiveness, development standards, and metric consistency.

Data Security: compliance and protection of sensitive fields.

Standardization: legal compliance during development and data exposure.

R&D Efficiency: improving development efficiency and reducing labor costs.

Cost Control: storage, compute, and labor costs.

This article concentrates on data quality and its implementation in the IData platform.

Causes of Data Quality Issues

Data quality problems can be grouped into four categories:

Requirements: poor management and processes during design, development, testing, and release.

Data Sources: inherent issues in upstream data that surface downstream.

Statistical Definitions: inconsistent metric definitions across departments.

Data Platform Issues: problems in development, operation, or scheduling.

Common Data Quality Metrics

Data quality is evaluated using five metric groups:

Normativity: conformity to standards and detection of anomalies.

Completeness: presence or absence of missing records.

Accuracy: correctness of data values.

Consistency: uniformity of data across systems.

Timeliness: promptness of data production.

IData Data Quality Architecture

The IData data quality module consists of a monitoring component and an alerting component. Monitoring gathers basic metadata of big‑data assets, while alerts are triggered based on configured rules.

Monitoring Management

The monitoring management page allows configuration of quality rules for tables in the data warehouse. Monitoring and alerting should be decoupled; the produced quality metrics serve as metadata that can be used for efficiency analysis, scale changes, and metadata evolution.

Configuration of Monitoring and Alert Rules

Each job outputs a single table, enabling rule configuration at table and column levels.

Both default and custom rules are supported.

Rules can be disabled without deletion, offering flexibility.
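The configuration model above (table-level and column-level rules, each individually switchable) can be sketched as a simple data structure. The field names and example rules here are illustrative assumptions, not the actual IData schema.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class QualityRule:
    table: str                    # target warehouse table
    rule_type: str                # e.g. "row_count_fluctuation"
    column: Optional[str] = None  # None for table-level rules
    params: dict = field(default_factory=dict)
    enabled: bool = True          # rules can be disabled without deletion

def active_rules(rules):
    """Return only the rules that are currently enabled."""
    return [r for r in rules if r.enabled]

rules = [
    QualityRule("dws_order_daily", "row_count_fluctuation",
                params={"max_pct": 0.2}),
    QualityRule("dws_order_daily", "field_value_range", column="amount",
                params={"min": 0, "max": 1_000_000}, enabled=False),
]
print([r.rule_type for r in active_rules(rules)])  # -> ['row_count_fluctuation']
```

Keeping `enabled` as a flag rather than deleting the rule preserves its history, which matches the flexibility described above.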

Monitoring and Alert Logs

Logs display monitoring and alert status for all jobs, including current and historical records. Each log entry contains the data generation timestamp, quality metric values, alert flag, and alert level.

Rule Library

The rule library provides built‑in templates and user‑defined rules. Produced metrics fall into three categories: completeness, accuracy, and timeliness.

Built‑in Rules

Nine built‑in rules are classified as follows:

Completeness

Table row count fluctuation: triggers an alert when row count change exceeds a configured range.

Table row count reduction: alerts when the latest row count is lower than the previous one.

Field enumeration content: alerts when a field contains values outside the allowed set.

Field enumeration count: alerts when the number of distinct values exceeds the configured limit.

Table non‑empty: alerts when the latest row count is zero.
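Two of the completeness checks above can be sketched in a few lines; the function names and the 20% default threshold are assumptions for illustration, not IData internals.

```python
def row_count_fluctuation_alert(prev_count, curr_count, max_pct=0.2):
    """Alert when the relative row-count change exceeds max_pct (0.2 = 20%)."""
    if prev_count == 0:
        return curr_count != 0  # any appearance of rows counts as a change
    return abs(curr_count - prev_count) / prev_count > max_pct

def enum_content_alert(values, allowed):
    """Alert when a field contains values outside the allowed set."""
    return bool(set(values) - set(allowed))

print(row_count_fluctuation_alert(1000, 1350))  # 35% jump -> True
print(enum_content_alert(["paid", "refund", "??"], {"paid", "refund"}))  # -> True
```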

Accuracy

Table primary key uniqueness: alerts when primary key uniqueness check fails.

Field value range: alerts when any value falls outside the configured min‑max range.

Field non‑null count: alerts when the number of null or empty values exceeds a threshold.
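The three accuracy rules map directly onto small predicate functions; again, the names and thresholds are illustrative assumptions.

```python
def pk_uniqueness_alert(keys):
    """Alert when primary-key values are not unique."""
    return len(keys) != len(set(keys))

def value_range_alert(values, lo, hi):
    """Alert when any value falls outside the configured [lo, hi] range."""
    return any(v < lo or v > hi for v in values)

def null_count_alert(values, max_nulls=0):
    """Alert when the count of null/empty values exceeds max_nulls."""
    return sum(1 for v in values if v is None or v == "") > max_nulls

print(pk_uniqueness_alert([1, 2, 2]))            # duplicate key -> True
print(value_range_alert([10, 55, 120], 0, 100))  # 120 out of range -> True
print(null_count_alert(["a", None, ""]))         # 2 nulls > 0 -> True
```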

Timeliness

Table production time: alerts when a table is not produced by its expected deadline.
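A timeliness check reduces to comparing the actual production timestamp against a configured deadline. Treating a missing timestamp as "not yet produced" is an assumption of this sketch, as is the 09:00 default deadline.

```python
from datetime import datetime, time

def production_time_alert(produced_at, deadline=time(9, 0)):
    """Alert when the table is missing or was produced after its daily deadline."""
    return produced_at is None or produced_at.time() > deadline

print(production_time_alert(datetime(2024, 5, 1, 10, 30)))  # late -> True
print(production_time_alert(datetime(2024, 5, 1, 8, 15)))   # on time -> False
print(production_time_alert(None))                          # missing -> True
```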

Custom Rules

Users can define custom SQL statements to collect quality metrics and configure corresponding alert rules.
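A custom rule pairs a user-supplied SQL statement (the metric collector) with a threshold-based alert rule. The sketch below runs against an in-memory SQLite table; the table name, SQL, and threshold are all hypothetical examples, not part of the platform.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ods_order (id INTEGER, amount REAL)")
conn.executemany("INSERT INTO ods_order VALUES (?, ?)",
                 [(1, 99.0), (2, -5.0), (3, 120.0)])

# Metric: count of rows violating a business invariant (negative amounts).
custom_sql = "SELECT COUNT(*) FROM ods_order WHERE amount < 0"
metric = conn.execute(custom_sql).fetchone()[0]

# Alert rule: fire when the collected metric exceeds zero.
alert = metric > 0
print(metric, alert)  # -> 1 True
```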

Baseline Management

Baseline management applies a unified set of rules to core foundational tables across the data platform, ensuring consistent quality standards.
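Conceptually, baseline management amounts to expanding one shared rule set across every core table; the table and rule names below are made up for illustration.

```python
BASELINE_RULES = ["table_non_empty", "pk_uniqueness", "production_time"]
CORE_TABLES = ["dim_supplier", "dim_buyer", "dwd_order_detail"]

def baseline_plan(tables, rules):
    """Expand the baseline into one (table, rule) monitoring task per pair."""
    return [(table, rule) for table in tables for rule in rules]

plan = baseline_plan(CORE_TABLES, BASELINE_RULES)
print(len(plan))  # 3 tables x 3 rules -> 9 tasks
```

Centralizing the rule set means a new core table picks up the full baseline by being added to one list, rather than having each rule configured by hand.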

Conclusion

Data governance is a vital part of a big‑data ecosystem. Data quality, as a core component, has become increasingly important in the Zhengcai Cloud data middle platform. All teams that generate data should participate in ensuring accuracy and effectiveness, as data quality depends on source‑side governance.

Existing teams already rely on IData’s quality capabilities to trace source accuracy. Future work will extend governance to include data security, standardization, R&D efficiency, and cost control, further enhancing the data middle platform.

Written by

政采云技术

ZCY Technology Team (Zero), based in Hangzhou, is a growth-oriented team passionate about technology and craftsmanship. With around 500 members, we are building comprehensive engineering, project management, and talent development systems. We are committed to innovation and creating a cloud service ecosystem for government and enterprise procurement. We look forward to your joining us.
