Tag

Data Quality

1 view collected around this technical thread.

Continuous Delivery 2.0
May 30, 2025 · Artificial Intelligence

Data Quality and Diversity: The Critical Battlefield Beyond AI Models

The article explains why high‑quality, diverse data—rather than just advanced models—has become the decisive factor for enterprise AI success, outlining key dimensions of data quality, strategies for building diverse datasets, and practical steps for establishing a data‑first AI strategy.

AI · Data Quality · Data Strategy
0 likes · 12 min read
Ctrip Technology
Jan 3, 2025 · Big Data

Design and Implementation of a Kafka Gatekeeper for FinOps Billing Data Quality Governance

This article describes the challenges of data quality in Ctrip’s hybrid‑cloud FinOps billing system and presents the design, implementation, and high‑availability deployment of a custom Kafka Gatekeeper proxy that performs pre‑validation, configurable rules, self‑service dashboards, and automated alerts to improve coverage, timeliness, and responsibility attribution.

Big Data · Data Quality · FinOps
0 likes · 17 min read
DataFunSummit
Jan 1, 2025 · Big Data

Douyin Group Data Asset Management Platform: Full‑Stack Data Lineage Evolution and Applications

This article introduces Douyin Group’s end‑to‑end data asset management platform, explains the evolution and architecture of its large‑scale data lineage system, presents quality metrics and ecosystem components, and outlines practical applications and future directions for data governance, development, and security.

Big Data · Data Quality · Douyin
0 likes · 16 min read
Model Perspective
Dec 23, 2024 · Fundamentals

Mastering Mathematical Modeling: 5 Stages & Common Pitfalls to Avoid

From the excitement of first encountering mathematical modeling to becoming a seasoned practitioner, this guide outlines five progressive stages, reveals typical misconceptions at each level, and offers practical advice to help learners avoid common traps and develop both technical and soft skills.

Common Pitfalls · Data Quality · learning stages
0 likes · 8 min read
DataFunSummit
Dec 15, 2024 · Big Data

Ant Group Data Technology’s Thoughts and Practices on Data Governance

This article shares Ant Group Data Technology’s comprehensive view on data governance, covering its concepts and framework, practical strategies such as architecture, standards, platforms and digital operations, real‑world implementations like distributed warehouses and the OneData system, and future trends involving AI and automation.

AI · Automation · Big Data
0 likes · 14 min read
ByteDance Data Platform
Nov 6, 2024 · Big Data

How Douyin’s Data Platform Overcomes EB‑Scale Metric Challenges

This article explains how Douyin Group tackles massive data volume, quality, and efficiency issues by building a four‑layer intelligent platform, standardizing metric management, automating metric decomposition, and creating reusable metric services that boost agility, stability, and cross‑team collaboration.

Big Data · Data Engineering · Data Platform
0 likes · 20 min read
DataFunSummit
Sep 1, 2024 · Artificial Intelligence

Data Management in Large Language Model Training: Overview, Pre‑training, SFT, and Future Challenges

This article surveys data management for large language model training, covering an overview, pre‑training data composition, scaling‑law‑driven quantity control, quality filtering, deduplication, harmful‑content removal, instruction fine‑tuning strategies, dynamic data selection, and emerging research challenges such as bias mitigation, multimodal data handling, and synthetic‑data filtering.

Artificial Intelligence · Data Quality · Instruction Fine-Tuning
0 likes · 18 min read
DataFunTalk
Aug 8, 2024 · Big Data

Building a User Profile Data Warehouse at 58.com: Architecture, Modeling, and Practices

This article details the design and implementation of a user‑profile data warehouse at 58.com, covering data‑warehouse fundamentals, user‑profile tag generation, layered architecture, dimensional modeling choices, ETL migration from Hive to Spark, data‑quality safeguards, and the resulting scale of tables, metrics and tags.

Big Data · Data Quality · ETL
0 likes · 20 min read
DataFunSummit
Aug 7, 2024 · Big Data

Ant Group Real-Time Data Warehouse: Architecture, Solutions, and Data Lake Outlook

This article presents Ant Group's recent explorations and practices in real-time data warehousing, detailing its architecture, data quality assurance, stream‑batch integration, and future data lake implementation, while highlighting the use of Flink, ODPS, and Paimon for scalable, low‑latency analytics.

Big Data · Data Lake · Data Quality
0 likes · 15 min read
DataFunTalk
Jul 18, 2024 · Big Data

Ant Group's Real-Time Data Warehouse Architecture, Solutions, and Data Lake Outlook

This article presents Ant Group's recent exploration of real-time data warehouse architecture, covering its six-module design, data quality assurance mechanisms, stream‑batch unified processing with Flink and ODPS, and a forward‑looking data lake solution built on Paimon, offering practical insights for large‑scale streaming analytics.

Big Data · Data Lake · Data Quality
0 likes · 15 min read
DataFunTalk
Jun 23, 2024 · Big Data

Building Full-Chain Data Lineage for E‑commerce Scenarios

This article explains how to construct a full‑chain data lineage system for e‑commerce, covering the concepts of data lineage, the design of a lineage foundation, quality measurement, application‑level lineage, and practical use cases such as table migration, field‑level tracing, and automated metric decomposition.

Big Data · Data Quality · data governance
0 likes · 12 min read
Baidu Tech Salon
Jun 12, 2024 · Big Data

Event Tracking Governance: Concepts, Challenges, and Platform Solutions

Event‑tracking governance ensures accurate, consistent user‑behavior data by managing the full lifecycle of logging points through defined quality standards, a digitized workflow, and supporting tools such as rule editors, real‑time testing, and compliance monitoring, while the platform’s page‑scene tree model and metrics improve visibility, reduce duplication, and drive business insight.

Data Quality · Governance · analytics
0 likes · 13 min read
Baidu Geek Talk
Jun 12, 2024 · Big Data

Event Tracking Governance and Logging Platform Solutions

The article explains event tracking, its data‑quality challenges, and presents a logging platform that enforces quality standards, an end‑to‑end online workflow, and specialized design, testing, and validation tools—including extended field types—to govern, monitor, and improve tracking point compliance across applications.

Data Quality · Governance · Metrics
0 likes · 13 min read
DataFunTalk
Jun 1, 2024 · Big Data

Ant Group's Real-Time Data Warehouse Architecture, Solutions, and Data Lake Outlook

This article presents Ant Group's recent explorations and practices in real-time data warehousing, covering the system architecture, streaming data quality assurance, stream‑batch integrated applications, and future data lake integration, while sharing technical details and operational insights for large‑scale data processing.

Data Lake · Data Quality · Flink
0 likes · 16 min read
Beijing SF i-TECH City Technology Team
May 30, 2024 · Big Data

Data Lineage System Design and Implementation for Big Data Platforms

This article presents a comprehensive data lineage system (Data-Lineage) for big data platforms, addressing the challenges of heterogeneous data sources, multiple execution engines, and complex dependencies through a hook-based architecture and modular design.

Big Data Architecture · Data Quality · SQL parsing
0 likes · 12 min read
DataFunTalk
May 25, 2024 · Big Data

Data Quality Governance: From Compliance to Reasonableness and the Quality Review Tool System

This article explains how to assess and improve data quality by moving from simple compliance checks to deeper reasonableness analysis, using visual dashboards, a comprehensive quality‑review tool suite, intelligent judgement rules, self‑diagnosis utilities, and key technical components such as sample libraries and a three‑layer architecture.

Big Data · Data Quality · data governance
0 likes · 25 min read
DataFunSummit
May 21, 2024 · Operations

Bilibili Data Governance Operational Framework Practice

This article presents Bilibili's practical data governance operational framework, introducing the DAMA‑Bok methodology, detailing two real‑world cases on storage‑level risk and data‑loss post‑mortem, and outlining the organizational, metadata, and embedded governance mechanisms that drive cost and quality improvements.

DAMA-Bok · Data Quality · cost governance
0 likes · 19 min read
DataFunSummit
May 3, 2024 · Big Data

Comprehensive Guide to Enterprise Data Governance: Vision, Framework, Organization, Standards, Quality, and Security

This article presents a detailed overview of enterprise data governance, covering its vision and goals, three‑layer framework, organizational structure, institutional policies, data standards, quality management, metadata handling, security controls, lifecycle protection, and practical implementation cases.

Big Data · Data Quality · Data Security
0 likes · 14 min read
DataFunTalk
Apr 28, 2024 · Big Data

Ant Group’s Data Governance Practices: Overview, Data Quality, and Data Storage Governance

This article shares Ant Group's extensive experience in big data governance, detailing the overall data governance framework, data quality management, data storage governance, and future considerations, illustrated with practical cases and strategies for ensuring compliance, reliability, and cost efficiency.

Ant Group · Big Data · Data Architecture
0 likes · 17 min read