Design and Implementation of Materialized Views in YouShu BI for Performance Optimization
This article presents a comprehensive overview of YouShu BI's materialized view product, detailing performance pain points of traditional BI, the fundamentals and architecture of database materialized views, the product's design, ETL generation, query rewriting, scheduling, intelligent recommendation, and future development directions.
The article begins by introducing YouShu BI and the performance issues commonly encountered in BI tools, such as slow multi‑table joins, full‑table scans for partial queries, and varying filter requirements that degrade query speed.
It then explains the basic principles of database materialized views, describing their types (single‑table vs. multi‑table, detail vs. aggregate, full vs. incremental) and providing a simple example that illustrates how a materialized view pre‑computes and stores query results for faster access.
Next, the design of YouShu BI's materialized view product is detailed, including the configuration entry in the model UI, the configuration page where users can select fields, enable aggregation, and set filters, as well as the execution‑plan page for defining build methods, schedules, and dependencies.
The article outlines the overall architecture, showing how data sources (Excel, MySQL, Spark, etc.) feed into data models, which are then materialized into an MPP warehouse (primarily ClickHouse) via ETL pipelines that generate extraction SQL and manage incremental or full materialization.
It describes the ETL generation process, transforming materialized view and model configurations into ETL nodes (join, cleanse, aggregate, output) and deriving metadata for view creation.
The query rewrite flow is explained: the BI layer parses user queries into an AST, matches them against materialized view configurations through field, predicate, and join validation, and rewrites the query to use the appropriate materialized view, applying model, field, and equivalence optimizations.
Scheduling and extraction management are covered, highlighting support for various data sources, Spark‑based high‑performance transfer, error monitoring, manual or timed scheduling, capacity management, status tracking, and historical execution metrics.
Finally, the article discusses intelligent recommendation of materialized views based on model usage and historical hit rates, and outlines future enhancements such as richer configuration capabilities, more precise hit validation, smarter recommendation engines, and faster query performance.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.