
Practical Use Cases of Materialized Views and Indexes in Doris

This article shares practical experiences with Doris, covering materialized view concepts, typical use cases, advantages, creation syntax, prefix index principles, performance‑boosting scenarios such as order analysis, PV/UV counting, detail queries, and operational tips for high‑throughput and low‑latency workloads.

DataFunTalk

Introduction

The article presents real‑world usage of Doris, focusing on materialized views (formerly called rollup) and built‑in indexes, illustrating typical scenarios, benefits, and practical tips.

1. Materialized View Basics

A materialized view in Doris is a special physical table that stores pre‑computed results. It involves two key ideas: pre‑computation of aggregates and the fact that the view occupies actual storage.

Typical use cases include:

Workloads that need both detail queries and fixed‑dimension aggregation (e.g., order details plus aggregation by category and shop).

Queries that touch only a small subset of rows or columns.

Expensive aggregation queries such as SUM or COUNT DISTINCT.

Queries that would benefit from a different prefix index than the one on the base table.

Advantages of Materialized Views

Significant query speed improvement for ad‑hoc queries.

Strong data consistency: writes to the base table and the materialized view are performed in a single transaction, ensuring the view is always up‑to‑date.

Support for common aggregate functions (SUM, MIN, MAX, COUNT) as well as BITMAP_UNION (exact deduplication) and HLL_UNION (approximate deduplication) for large‑scale PV/UV analysis.

2. Materialized View Creation

Typical CREATE statement:

CREATE MATERIALIZED VIEW k1_k2_sumk3 AS SELECT k1, k2, SUM(k3) FROM duplicate_table GROUP BY k1, k2;

Another example for deduplication:

CREATE MATERIALIZED VIEW deduplicate AS SELECT k1, k2, k3, k4 FROM duplicate_table GROUP BY k1, k2, k3, k4;
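Creating a view starts an asynchronous build job, so it helps to confirm that the build has finished and that queries are actually routed to the view. The following is a minimal sketch, assuming that duplicate_table from the statements above lives in a hypothetical database named demo_db:

```sql
-- Check the build job; the view becomes usable once its State is FINISHED.
-- demo_db is a hypothetical database name used only for illustration.
SHOW ALTER TABLE MATERIALIZED VIEW FROM demo_db;

-- The query is still written against the base table. If the optimizer
-- selects the view, the scan node in the plan should name k1_k2_sumk3
-- rather than the base table.
EXPLAIN
SELECT k1, k2, SUM(k3)
FROM duplicate_table
GROUP BY k1, k2;
```

Because view selection is automatic, application SQL does not need to reference the view name.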

3. Doris Index Principles

Doris stores data in an SSTable‑like structure sorted by key columns. On top of this ordering it builds a prefix index from the first 36 bytes of each row's leading key columns; if a VARCHAR column falls within those 36 bytes, the prefix is truncated at that column. Queries whose predicates match the indexed prefix can therefore locate data quickly.

Example queries:

SELECT * FROM table WHERE user_id=1829239 AND age=20;
SELECT * FROM table WHERE age=20;

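Assuming user_id is the leading key column and age follows it, the first query benefits from the prefix index because it filters on the leading column. The second query skips user_id, so the prefix index cannot narrow the scan.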
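To make the dependence on column order concrete, here is a sketch of a table definition under which the two queries behave as described. The table name user_message, its columns, the bucket count, and the replication setting are all assumptions chosen for illustration:

```sql
-- Hypothetical table: the DUPLICATE KEY column order is also the sort order,
-- and the prefix index is built from the leading key columns.
CREATE TABLE user_message (
    user_id    BIGINT,        -- 8 bytes, leading sort column
    age        INT,           -- 4 bytes
    message    VARCHAR(100),  -- the prefix is truncated at the first VARCHAR
    dwell_time INT
)
DUPLICATE KEY(user_id, age, message)
DISTRIBUTED BY HASH(user_id) BUCKETS 8
PROPERTIES ("replication_num" = "1");  -- single replica, for a test cluster only

-- Filters on the leading column, so the prefix index can narrow the scan.
SELECT * FROM user_message WHERE user_id = 1829239 AND age = 20;

-- Skips the leading column, so the prefix index does not help.
SELECT * FROM user_message WHERE age = 20;
```

The design consequence is that the most frequently filtered, most selective columns belong at the front of the key.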

4. Application Cases

Case 1 – Order Analysis: Create a materialized view that groups by warehouse ID and sums the sales amount. A per‑warehouse query then scans only the view, which holds one row per warehouse instead of one row per order, reducing the data scanned by orders of magnitude and achieving millisecond‑level response.
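Case 1 could be sketched as follows. The orders table, its columns, and the view name mv_warehouse_sales are hypothetical, since the original does not give the schema:

```sql
-- Hypothetical detail table: one row per order.
CREATE TABLE orders (
    order_id     BIGINT,
    warehouse_id INT,
    sale_amt     DECIMAL(18, 2)
)
DUPLICATE KEY(order_id)
DISTRIBUTED BY HASH(order_id) BUCKETS 8
PROPERTIES ("replication_num" = "1");  -- single replica, for a test cluster only

-- Pre-aggregate the sales amount per warehouse.
CREATE MATERIALIZED VIEW mv_warehouse_sales AS
SELECT warehouse_id, SUM(sale_amt)
FROM orders
GROUP BY warehouse_id;

-- Still written against the base table; once the view is built, the
-- optimizer can answer it from mv_warehouse_sales instead.
SELECT warehouse_id, SUM(sale_amt)
FROM orders
WHERE warehouse_id = 1001
GROUP BY warehouse_id;
```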

Case 2 – PV/UV Counting: Build a view that groups by advertiser and channel and stores bitmap_union(to_bitmap(user_id)) for precise UV. COUNT(DISTINCT user_id) queries on the base table are rewritten to a bitmap_union_count over the view, dramatically cutting execution time.
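A sketch of the UV part of Case 2, under an assumed schema. The table ad_view_record, its columns, and the view name mv_advertiser_uv are hypothetical. In the view definition the per‑group bitmaps are stored with BITMAP_UNION; the count is taken at query time:

```sql
-- Hypothetical detail table: one row per ad view.
CREATE TABLE ad_view_record (
    view_time  DATETIME,
    advertiser VARCHAR(32),
    channel    VARCHAR(32),
    user_id    INT            -- TO_BITMAP expects a non-negative integer id
)
DUPLICATE KEY(view_time, advertiser)
DISTRIBUTED BY HASH(view_time) BUCKETS 8
PROPERTIES ("replication_num" = "1");  -- single replica, for a test cluster only

-- Store one bitmap of user ids per (advertiser, channel).
CREATE MATERIALIZED VIEW mv_advertiser_uv AS
SELECT advertiser, channel, BITMAP_UNION(TO_BITMAP(user_id))
FROM ad_view_record
GROUP BY advertiser, channel;

-- Exact UV, written against the base table; the optimizer can rewrite the
-- COUNT(DISTINCT ...) to read the pre-built bitmaps from the view.
SELECT advertiser, channel, COUNT(DISTINCT user_id) AS uv
FROM ad_view_record
GROUP BY advertiser, channel;
```

If approximate counts are acceptable, the same pattern with HLL_UNION(HLL_HASH(user_id)) trades accuracy for smaller storage.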

Case 3 – Detail Query: Reorder columns in a view so that a frequently queried column becomes the leading key, allowing the prefix index to be used even when only that column is filtered.

CREATE MATERIALIZED VIEW mv_1 AS SELECT k3, k2, k1 FROM tableA ORDER BY k3;
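To check the effect of mv_1, one could inspect the plan of a query that filters only on k3. This sketch assumes that tableA is keyed on k1 first, so the base table's prefix index does not cover k3, and that k3 is a numeric column:

```sql
-- Filters only on k3. The base table is assumed to be sorted by k1 first,
-- so its prefix index cannot help; mv_1 is sorted by k3, so the plan
-- should show mv_1 as the scanned index once the view is built.
EXPLAIN
SELECT k1, k2, k3
FROM tableA
WHERE k3 = 10;
```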

5. Operational Insights

Control write task frequency to avoid excessive compaction that hurts IO and query performance.

Adjust compaction thresholds in high‑traffic scenarios.

Limit the number of materialized views in high‑throughput workloads to balance storage cost and write speed.

Tune tablet write speeds and choose performant disks (e.g., SSDs) to handle large ingest volumes.

For low‑latency needs, increase write concurrency and reduce write intervals, while being aware of small‑file proliferation.

When using Doris for high QPS workloads, leverage materialized views (rollup) and consider adding observer nodes for metadata read scaling.
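For the point about limiting the number of materialized views, a periodic review might look like the following sketch. It reuses the duplicate_table and deduplicate names from the creation examples; whether a given view is actually unused has to be judged from the workload itself:

```sql
-- List the base table schema together with every materialized view on it.
DESC duplicate_table ALL;

-- Drop a view that queries no longer hit; each extra view adds storage
-- and is maintained on every write to the base table.
DROP MATERIALIZED VIEW IF EXISTS deduplicate ON duplicate_table;
```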

6. Q&A Highlights

Creating a view for every dimension leads to view explosion; aim for views that cover multiple query patterns.

Doris supports micro‑batch processing; true streaming (one‑row‑per‑second) is not feasible.

Open‑source Doris performs well for medium‑size workloads; the commercial edition adds vectorized execution and some stability improvements.

For colocated joins, distribute data evenly and be cautious during cluster scaling.

Overall, materialized views and prefix indexes are powerful tools in Doris for accelerating both aggregation and detail queries, provided that storage cost and write overhead are carefully managed.
