Databases 17 min read

Data Warehouse Fundamentals, Modeling Techniques, and the Evolution of Maoyan’s Warehouse

This article explains the origins and challenges of scattered enterprise data, defines the data warehouse concept, details its four core characteristics, compares entity, normalization, and dimensional modeling methods, and illustrates Maoyan’s three‑stage data‑warehouse evolution with practical examples and diagrams.

Architecture Digest
Architecture Digest
Architecture Digest
Data Warehouse Fundamentals, Modeling Techniques, and the Evolution of Maoyan’s Warehouse

In the early days of data warehousing, enterprises like Maoyan stored daily reports and data interfaces in disparate departmental systems, creating a complex web of data sources that were difficult to manage across business units.

Data sources are diverse and decision‑making data is fragmented.

Lack of standards makes integration hard.

Inconsistent data definitions reduce credibility.

Absence of data governance leads to poor data quality.

Extracting data from such raw storage eventually results in credibility loss, increasing data dispersion, and the erosion of standardized accuracy.

Definition of Data Warehouse

Bill Inmon’s widely accepted definition describes a data warehouse as a subject‑oriented, integrated, non‑volatile, time‑variant collection of data that supports decision‑making.

Subject‑oriented : Data is organized around business subjects (e.g., users, discounts, sales) rather than transactional processes.

Integrated : Data from heterogeneous sources is extracted, cleaned, transformed, and loaded with unified coding and summarization to eliminate redundancy.

Stable (non‑volatile) : Once loaded, data is retained for historical analysis and rarely updated, serving long‑term decision support.

Time‑variant : Historical timestamps allow trend analysis and forecasting.

Data Warehouse Modeling Methods

Entity Modeling

Entity modeling, derived from philosophy, treats business as a collection of entities, events, and descriptions. Any business fact can be broken into these three parts, making the method useful for early‑stage domain modeling.

For example, in the statement “Xiao Ming drives to school,” “Xiao Ming” and “school” are entities, “going to school” is an event, and “driving” is a description.

Entity: a specific concept or object involved in business.

Event: a process that connects entities.

Description: additional details about entities or events.

Entity modeling excels in business and domain modeling but is limited to those phases; logical and physical modeling later rely on normalization and dimensional techniques.

Normalization (Third‑Normal‑Form) Modeling

Promoted by Inmon, this method follows relational database theory, ensuring each attribute is atomic, fully dependent on the whole primary key, and free of transitive dependencies. It aligns data‑warehouse models with the enterprise’s subject‑area and logical models.

Each attribute value is unique and unambiguous.

Non‑key attributes fully depend on the entire primary key.

Non‑key attributes do not depend on attributes of other tables.

While this approach integrates well with relational systems, it can restrict flexibility and performance when aggregating data for data‑mart layers.

Dimensional Modeling

Introduced by Kimball, dimensional modeling separates data into fact tables and dimension tables, forming star or snowflake schemas. It pre‑processes dimensions (e.g., time, region, product) to accelerate analytical queries.

Dimension tables describe attributes of business entities; fact tables store measurable events.

Star schema: each dimension directly links to the fact table.

Snowflake schema: dimensions are normalized, reducing redundancy but potentially adding joins.

Typical dimensional design steps: define the analysis subject, select dimensions, decide grain of fact rows, and choose measurable metrics.

Example: analyzing recruitment for each McDonald’s store using dimensions such as time, region, brand, store, position, and channel, with metrics like applications, hires, and rejections.

Physical Layer Hierarchy in Data Warehousing

Physical modeling often splits data into three layers: detailed fact (ODS‑level), light aggregation, and topic‑wide tables.

Detail layer: fine‑grained, cleaned data similar to ODS.

Aggregation layer: business‑oriented summaries.

Topic‑wide layer: integrated views built on aggregation data.

Dimension tables: sourced directly from business or derived from facts.

Maoyan’s Data Warehouse Development History

Stage 1 (2015)

Initial warehouse focused on rapid business needs, using isolated “chimney‑style” development for each requirement, leading to high operational cost, frequent schema changes, and data inconsistency.

Stage 2

After recognizing Stage 1’s drawbacks, Maoyan built an offline warehouse organized by business themes (e.g., seat selection, discount cards, coupons), improving data reuse and consistency across similar business lines.

Stage 3

Model‑driven development addressed naming conflicts, data silos, and unified source data by constructing unified order, UGC, payment, and finance models, standardizing data definitions and export interfaces.

Comparison of the Three Stages

Maoyan Technical Team https://juejin.im/post/5d5be159f265da03970bbc3a

Recommended Reading

Tencent Data Director Tutorial: Build a Big Data Knowledge System in 5 Steps

Dada O2O Backend Architecture Evolution: From 0 to 4000 Concurrent Requests

iQIYI’s Database Selection Guide

Big DatamodelingData WarehouseETLDatabase Designdimensional modeling
Architecture Digest
Written by

Architecture Digest

Focusing on Java backend development, covering application architecture from top-tier internet companies (high availability, high performance, high stability), big data, machine learning, Java architecture, and other popular fields.

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.