
Building and Practicing the Performance Assurance System of YouShu BI

This article gives an overview of the YouShu BI product, outlines the high‑concurrency performance challenges that enterprise BI faces, and describes the product's layered architecture (front end, back end, data engine, and data source). It then covers smart caching, MPP acceleration, materialized views, and the Data Doctor operations tooling, which together provide low‑latency, reliable analytics for large user bases.

DataFunSummit

Introduction

The presentation shares the construction and practice of the performance assurance system for YouShu BI, an enterprise‑level intelligent big‑data analytics platform developed by NetEase.

1. YouShu BI Product Overview

YouShu BI, launched in 2014, offers both private‑deployment and SaaS versions and provides a one‑stop data analysis and application workbench with zero‑code drag‑and‑drop capabilities. It includes sub‑products such as visual analysis, data dashboards, data entry, data preparation, data portals, and complex reports.

The front‑end drawing layer uses the self‑developed NEV chart engine. The back‑end business layer is built on Node.js and handles permissions, chart configuration, resource management, scheduling, and intelligent caching. The data engine layer is implemented in Java, Scala, and Clojure and performs graph inference and query processing. The data source layer integrates external data sources and NetEase’s MPP engines, ClickHouse (CK) and Greenplum (GP).

2. Enterprise High‑Concurrency Reporting Challenges

Enterprise BI deals with massive data volumes, thousands of charts, and daily access counts in the hundreds of thousands. The resulting challenges are handling peak‑hour concurrency, keeping flexible analytical queries fast, and ensuring stability for leadership dashboards.

Simply adding resources is not cost‑effective. The solution is to reduce the number of queries that actually reach the underlying database and to speed up the ones that do.

3. YouShu BI Performance System Construction and Practice

3.1 Overall Architecture

The performance system consists of three layers: chart layer (visual query with intelligent cache), data model layer (multidimensional analysis with materialized views), and data connection layer (SQL queries accelerated by MPP).
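The way the three layers could cooperate on a single chart request can be sketched as follows. This is a minimal illustration, not the product's implementation: the class `LayeredQueryRouter`, its fields, and the lookup order (cache first, then a materialized result, then the MPP engine) are all assumptions made for the example.

```python
from typing import Callable, Dict, List, Tuple


class LayeredQueryRouter:
    """Hypothetical router: chart cache -> materialized view -> MPP engine."""

    def __init__(self, run_on_mpp: Callable[[str], List[tuple]]):
        self.chart_cache: Dict[str, List[tuple]] = {}    # chart layer
        self.materialized: Dict[str, List[tuple]] = {}   # data model layer
        self.run_on_mpp = run_on_mpp                     # data connection layer

    def query(self, chart_id: str, model_key: str, sql: str) -> Tuple[str, List[tuple]]:
        # 1. Chart layer: serve a cached result if one exists.
        if chart_id in self.chart_cache:
            return "cache", self.chart_cache[chart_id]
        # 2. Data model layer: use a pre-computed result for this model.
        if model_key in self.materialized:
            source, rows = "materialized_view", self.materialized[model_key]
        else:
            # 3. Data connection layer: fall through to the MPP engine.
            source, rows = "mpp", self.run_on_mpp(sql)
        # Remember the result so the next request for this chart hits the cache.
        self.chart_cache[chart_id] = rows
        return source, rows
```

Each layer only receives the requests that the layer above could not answer, which is how the design reduces the number of queries reaching the database.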

The “Data Doctor” module provides automatic chart performance diagnosis and optimization recommendations.

3.2 Smart Cache

Smart cache predicts user behavior to pre‑load frequently accessed charts, achieving near‑100% cache coverage for charts accessed in the last 30 days, over 90% first‑visit cache hit rate, and about 65% analysis cache hit rate.
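A simplified version of the pre‑loading idea is sketched below. It assumes a plain frequency heuristic over a 30‑day access log, which stands in for whatever behavior prediction the product actually uses; the function names and the log format (pairs of chart id and timestamp) are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta


def charts_to_prewarm(access_log, now, window_days=30, limit=None):
    """Return chart ids accessed within the window, most frequent first."""
    cutoff = now - timedelta(days=window_days)
    counts = Counter(chart_id for chart_id, ts in access_log if ts >= cutoff)
    return [chart_id for chart_id, _ in counts.most_common(limit)]


def prewarm(cache, chart_ids, run_query):
    """Run each chart's query ahead of time and store the result in the cache."""
    for chart_id in chart_ids:
        cache[chart_id] = run_query(chart_id)


def hit_rate(hits, total):
    """Fraction of requests served from cache (0.0 when there were none)."""
    return hits / total if total else 0.0
```

Running `prewarm` on a schedule before peak hours means the first visit of the day can already be a cache hit, which is the effect the first‑visit hit rate above measures.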

3.3 High‑Performance MPP Query Engine

Data sources that are unsuitable for OLAP workloads (e.g., Excel, MySQL) are extracted into MPP engines (Greenplum and ClickHouse, with Doris planned). Extraction supports full, incremental, and rolling loads, and it gives many enterprise customers sub‑second query responses.
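The three load modes can be illustrated with in‑memory lists standing in for the source and the MPP table. This is a sketch under assumptions: rows are dicts with a `day` field, incremental loading is driven by a date watermark, and "rolling" is interpreted as refreshing a trailing window of recent days while keeping older rows; the product may define these modes differently.

```python
from datetime import date, timedelta


def full_load(target, source_rows):
    """Replace the target with a complete copy of the source."""
    target.clear()
    target.extend(source_rows)


def incremental_load(target, source_rows, watermark):
    """Append only rows newer than the watermark; return the new watermark."""
    new_rows = [r for r in source_rows if r["day"] > watermark]
    target.extend(new_rows)
    return max((r["day"] for r in new_rows), default=watermark)


def rolling_load(target, source_rows, today, window_days):
    """Re-extract the trailing window of days and keep older rows as loaded."""
    start = today - timedelta(days=window_days - 1)
    kept = [r for r in target if r["day"] < start]
    fresh = [r for r in source_rows if start <= r["day"] <= today]
    target[:] = kept + fresh
```

A full load is the simplest but most expensive option, an incremental load suits append‑only data, and a rolling load covers sources where recent rows may still change.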

3.4 Materialized Views

Cross‑engine materialized views are created at the model level to pre‑compute full tables, vertical slices, horizontal slices, or aggregated views, improving query speed for flexible analysis scenarios.
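The slice types can be illustrated with a small example. SQLite has no materialized views, so `CREATE TABLE ... AS SELECT` is used here as a stand‑in for pre‑computation; the `orders` table, its columns, and the `mv_*` names are invented for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (day TEXT, region TEXT, product TEXT, amount REAL, note TEXT);
INSERT INTO orders VALUES
  ('2024-01-01', 'east', 'a', 10.0, 'x'),
  ('2024-01-01', 'west', 'b', 20.0, 'y'),
  ('2024-01-02', 'east', 'a', 5.0,  'z');

-- Vertical slice: keep only the columns that charts actually use.
CREATE TABLE mv_vertical AS SELECT day, region, amount FROM orders;

-- Horizontal slice: keep only the rows in the range that charts query.
CREATE TABLE mv_horizontal AS SELECT * FROM orders WHERE day >= '2024-01-02';

-- Aggregated view: pre-compute a common group-by.
CREATE TABLE mv_daily_region AS
  SELECT day, region, SUM(amount) AS total FROM orders GROUP BY day, region;
""")


def east_total(connection):
    """Answer a chart query from the pre-aggregated table instead of the raw rows."""
    row = connection.execute(
        "SELECT SUM(total) FROM mv_daily_region WHERE region = 'east'"
    ).fetchone()
    return row[0]
```

The aggregated table has one row per day and region, so a chart grouped by those dimensions reads far fewer rows than it would from the raw table. A full‑table materialization would simply copy every row and column into the faster engine.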

Customers such as NetEase Cloud Music, Yanxuan, and external partners have seen a 50% increase in charts rendered within 5 seconds after materialized view optimization.

4. Data Doctor Operations

Data Doctor helps administrators monitor chart availability and performance, diagnose issues (e.g., slow table scans), and apply targeted optimizations. It also supports tiered governance for key reports, including resource isolation and priority caching.
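A rule‑based diagnosis of this kind might look like the sketch below. The `ChartStats` fields, the thresholds, and the recommendation strings are all hypothetical; the 5‑second default only echoes the render‑time figure quoted earlier, and the real Data Doctor rules are not described in the source.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ChartStats:
    chart_id: str
    p95_seconds: float       # 95th percentile render time
    rows_scanned: int
    cache_hit_rate: float    # between 0.0 and 1.0
    is_key_report: bool = False


def diagnose(stats: ChartStats, slow_seconds: float = 5.0,
             scan_limit: int = 10_000_000, min_hit_rate: float = 0.9) -> List[str]:
    """Return a list of findings with suggested optimizations for one chart."""
    findings = []
    if stats.p95_seconds > slow_seconds:
        if stats.rows_scanned > scan_limit:
            findings.append("slow table scan: consider MPP extraction or a materialized view")
        if stats.cache_hit_rate < min_hit_rate:
            findings.append("low cache hit rate: consider pre-caching this chart")
    # Tiered governance: key reports get extra protection regardless of speed.
    if stats.is_key_report:
        findings.append("key report: apply resource isolation and priority caching")
    return findings
```

Keeping the rules as plain threshold checks makes each recommendation traceable to the metric that triggered it.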

5. Future Plans

Upcoming work includes enhancing MPP multi‑cluster management, recommending materialized view creation and incremental materialization, and improving Data Doctor’s diagnostic accuracy and project‑level optimization.

6. Q&A Highlights

Cache hit rates exceed 90% for major customers. Pre‑caching improves performance by at least 10%, but it cannot cover every analytical scenario; flexible analysis relies on materialized views instead. Data freshness follows T+1 batch processing, and real‑time data is handled through minute‑level incremental loads.

performance optimization, data platform, Materialized Views, MPP, BI, Smart Caching
Written by

DataFunSummit

Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.
