Big Data 13 min read

Materialized Views in MaxCompute: Design, Implementation, and Best Practices

This article explains the concept, advantages, and drawbacks of materialized views, describes how MaxCompute implements them—including creation syntax, maintenance properties, automatic query rewrite, smart recommendation, and auto‑materialization—and shares performance results and future improvement plans.

DataFunSummit
DataFunSummit
DataFunSummit
Materialized Views in MaxCompute: Design, Implementation, and Best Practices

01 MaxCompute and Materialized Views

MaxCompute is a high‑performance, enterprise‑grade data‑warehouse service that has served Alibaba for over 14 years, handling billions of daily DML tasks and exabyte‑scale data. To reduce repeated computation, meet tighter latency requirements, and lower data‑governance costs, materialized views (MVs) are introduced as a query accelerator.

Unlike virtual views, a materialized view stores a physical copy of the query result, allowing fast reuse. The article outlines the evolution of MVs from early database systems to cloud‑native big‑data platforms.

Advantages of MVs

Improved query performance by reading pre‑computed results.

Data consistency through periodic refreshes.

Emergency query capability when the database is unavailable.

Drawbacks of MVs

Additional storage consumption.

Refresh overhead for frequently changing source data.

Increased complexity for certain queries.

02 Design and Implementation

MaxCompute supports ordinary, partitioned, clustered, and transparent materialized views. Creation is expressed via DDL, e.g.:

CREATE MATERIALIZED VIEW mv_name PARTITIONED BY (col) CLUSTERED BY (col) SORTED BY (col ASC) INTO 10 BUCKETS AS SELECT ...

Two key points during creation are: (1) building an index from the base table to the MV for fast lookup, and (2) storing the base table’s data version in MV metadata. Partitioned MVs must keep partition columns consistent with the base table to enable incremental updates.

Maintenance properties are set via TBLPROPERTIES in the DDL, for example:

"enable_auto_substitute"="true"

"enable_auto_refresh"="true"

"refresh_interval_minutes"="120"

"only_refresh_max_pt"="true"

03 Automatic Query Rewrite and Smart Recommendation

The optimizer automatically rewrites queries to use MVs through four steps: index lookup, hit pre‑judgment, MV registration, and rewrite validation. The rewrite handles differences in SELECT list order, missing columns, and compensates missing calculations.

Example original query:

SELECT ds, count(uid) as pv FROM users WHERE ds>='20230501' AND ds<='20230507' GROUP BY ds;

Corresponding MV:

SELECT ds, count(uid) as pv FROM users WHERE ds>='20230401' GROUP BY ds;

Rewritten query using the MV:

SELECT ds, pv FROM mv WHERE ds>='20230501' AND ds<='20230507';

Smart recommendation analyzes heavy, long‑running, high‑frequency SQL tasks, extracts common sub‑expressions, and suggests or automatically creates MVs based on match degree, table overlap, and storage size.

04 Automatic Materialization

Key techniques include online materialization (delayed computation until query time), quota limits to prevent uncontrolled MV creation, and feedback‑driven iteration to retain only valuable MVs. Users can manage MVs via the console.

05 Summary and Outlook

MaxCompute’s MV capabilities cover partitioning, clustering, query rewrite, delayed materialization, transparent access, auto‑generation, and auto‑refresh. In the Taobao business line, over 5,000 MVs have optimized more than 20,000 queries, reducing average resource consumption by over 20% and up to 80% for some workloads. Future work includes finer‑grained incremental updates, event‑driven refreshes, external table support, and extending rewrite operators for outer joins, GROUP BY, HAVING, ORDER BY, LIMIT, and aggregate functions.

Big DataSQL OptimizationMaxComputeQuery Accelerationmaterialized viewAutomatic Refresh
DataFunSummit
Written by

DataFunSummit

Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.