Reflections on Data Governance Challenges and Approaches
The author shares a candid account of transitioning from a non‑data role to confronting data‑centric bottlenecks, describing the current state of data projects, common pitfalls, and practical thoughts on simplifying data governance within limited resources and budget constraints.
Regular readers may know that the author previously worked in non‑data roles and only became a "data person" in August 2016, when the company’s products hit performance bottlenecks on traditional relational databases and open‑source data‑warehouse solutions.
Starting from zero, the author built a Hadoop+Hive+Spark‑SQL+Elasticsearch cluster on modest PC servers, working through jobs that would not run, jobs that never finished, and other pain points, and achieved a modest success.
Over the years, data projects were often titled "XXX Data Quality Check", "XXX Data Analysis Platform", or "XXX Big Data Project", and later shifted to "XXX Data Governance Project". Regardless of the name, the work typically involved data collection, cleaning, processing, quality, modeling, mining, analysis, sharing, application, and visualization. Most were short‑cycle MVPs that rarely addressed long‑term value, which led to repeated rework.
Resource and budget constraints limit deep strategic planning such as data management strategy, framework, culture, organization, lifecycle, metadata, master data, reference data, and security. While many data‑governance products are based on the DAMA‑DMBOK2 knowledge framework, the author questions whether a single system can fully solve governance challenges.
Current problems are identified from four perspectives:
Data source issues: Standards exist but are hard to enforce; external sources are uncontrollable and quality is unpredictable.
Data processing issues: Lack of standards and processes leads to ad‑hoc handling, with quality problems discovered only after they occur.
Data usage issues: Delivered data often fails to meet expectations because it is either inaccurate or unsuitable, and downstream requirements are hard to confirm.
Other issues: Tight timelines, heavy workloads, insufficient stakeholder support, and the difficulty of gaining recognition for “dirty work” versus visible dashboards.
The author emphasizes that tools alone are insufficient; people are the most critical factor in data‑governance projects.
Proposed simplification approach: start with a small team, gradually clarify and explore data without launching large organizational changes or committees. If needed, consider a pure consulting project, but remain skeptical of external consultants.
Address root causes at the source: many problems stem from upstream system design flaws or bugs; fixing those directly is more effective than extensive governance processes.
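As a rough illustration of catching problems at the source rather than downstream, the sketch below validates records before they are handed on and keeps the reason for every rejection. It is not the author's implementation: the field names (customer_id, amount, country), the rule set, and the helper names are all hypothetical, chosen only to show the shape of such a check.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Rule:
    """A named check that returns True when a record passes."""
    name: str
    check: Callable[[dict], bool]


def _is_non_negative_number(value) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return (
        isinstance(value, (int, float))
        and not isinstance(value, bool)
        and value >= 0
    )


# Hypothetical rules; a real project would derive these from its own standards.
KNOWN_COUNTRIES = {"CN", "US", "DE"}
RULES = [
    Rule("customer_id present",
         lambda r: bool(str(r.get("customer_id") or "").strip())),
    Rule("amount is non-negative number",
         lambda r: _is_non_negative_number(r.get("amount"))),
    Rule("country is a known code",
         lambda r: r.get("country") in KNOWN_COUNTRIES),
]


def violations(record: dict) -> list:
    """Return the names of every rule the record fails."""
    return [rule.name for rule in RULES if not rule.check(record)]


def split_records(records):
    """Separate clean records from rejected ones, keeping the reasons."""
    accepted, rejected = [], []
    for record in records:
        failed = violations(record)
        if failed:
            rejected.append((record, failed))
        else:
            accepted.append(record)
    return accepted, rejected
```

Running such a check where the data is produced means the rejection reasons can go straight back to the owners of the upstream system, instead of surfacing later as a downstream quality report.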
Finally, the author invites readers to discuss the necessity of data‑governance, the balance between proactive planning and pragmatic implementation, and how to improve product, consulting, and implementation projects.
Images illustrating the DAMA‑DMBOK2 framework and its evolution are included throughout the article.
Big Data Technology Architecture
Exploring Open Source Big Data and AI Technologies