
Building MVP: A Lightweight Big Data Analysis System for Product Growth

The article describes how a lightweight big-data analysis platform called MVP was built from scratch. MVP uses a User-Event-Config data model, an HDFS + ClickHouse + Spark stack, and four modules covering metric monitoring, root-cause alerts, deep growth analysis, and A/B testing. The result is real-time insight in seconds instead of days and much faster product-growth operations.

Tencent Cloud Developer

In the era of refined product operations, product-growth teams keep facing the same analytical questions: why a metric fluctuated, whether a version iteration worked, and how much an operational campaign moved the numbers. These needs are frequent and time-sensitive, but traditional analyst-driven methods cannot keep up because analyst capacity is limited. This article presents a solution: building a lightweight big data analysis system called MVP (Minimum Viable Product to Most Valuable Product) from scratch.

The MVP system consists of four modules:
- Product Business-Operation Metrics applies the AARRR model (Acquisition, Activation, Retention, Revenue, Referral) to analyze growth indicators.
- Indicator Anomaly-Root Cause Warning monitors growth-metric fluctuations and surfaces root-cause clues.
- Analysis Tools-Growth Analysis supports in-depth analysis of user behavior.
- AB-Test Experiment Evaluation runs experiments to check whether business decisions are sound.
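The article does not say how the Indicator Anomaly-Root Cause Warning module derives its root-cause clues. One common approach is sketched minimally below. It flags a metric value that deviates from its trailing history by more than k standard deviations. It then ranks the values of one dimension, such as an acquisition channel, by their share of the total change. The function names, the 3-sigma threshold, and the sample figures are hypothetical and are not taken from MVP.

```python
from statistics import mean, stdev


def is_anomalous(history, today, k=3.0):
    """Return True if `today` lies more than k sample standard deviations
    from the mean of the trailing `history` (needs at least 2 points)."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return today != mu
    return abs(today - mu) > k * sigma


def contributions(before, after):
    """Rank dimension values by their share of the total metric change.

    `before` and `after` map a dimension value (e.g. an acquisition channel)
    to its metric value in each period; a missing value counts as 0.
    Returns (value, delta, share_of_total_delta) tuples, largest |delta| first.
    """
    keys = set(before) | set(after)
    deltas = {key: after.get(key, 0) - before.get(key, 0) for key in keys}
    total = sum(deltas.values())
    if total == 0:
        return []  # no net change to attribute
    ranked = sorted(deltas.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return [(key, delta, delta / total) for key, delta in ranked]


# Hypothetical daily active users, split by channel.
history = [1000, 1020, 990, 1010, 1005, 995, 1015]
before = {"store": 600, "ads": 300, "referral": 105}
after = {"store": 590, "ads": 160, "referral": 100}
today = sum(after.values())  # 850

if is_anomalous(history, today):
    for channel, delta, share in contributions(before, after):
        print(f"{channel}: {delta:+d} ({share:.0%} of the change)")
```

A share close to 100% points at a single dimension value as the likely clue; in this sample, the ads channel explains about 90% of the drop. A fuller version would repeat the ranking across several dimensions, such as version, region, and channel.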

The technical implementation covers three aspects: data modeling, technology selection, and page interaction. For data modeling, the system abstracts and integrates product data with a "User + Event ID + Config" approach. This produces a user wide table built on a key-value model, in which each User_ID has exactly one record.

For technology selection, the team compared Druid, Impala, ClickHouse, and Spark, and chose an HDFS + ClickHouse + Spark stack. ClickHouse delivered the best performance: 2x faster than Presto, 3x faster than Impala, and 4x faster than SparkSQL. In actual tests on more than 220 million records (1.79 GB), a single-table aggregation completed in 0.095 s, a processing rate of 18.95 GB/s. Spark makes up for ClickHouse's weakness in large-scale join operations.
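The article gives no schema for the "User + Event ID + Config" wide table. The minimal sketch below shows how raw logs could be folded into one key-value record per User_ID. It assumes logs arrive as `(user_id, event_id, config)` rows, where `config` is a small dict of attributes such as an app version. The field names and the `event:`/`config:` key prefixes are hypothetical conventions, not MVP's actual layout.

```python
from collections import defaultdict


def build_wide_table(events):
    """Fold raw (user_id, event_id, config) rows into one key-value record per user.

    Event counts are stored under "event:<event_id>" keys; config attributes are
    stored under "config:<name>" keys, with later rows overwriting earlier ones.
    """
    table = defaultdict(dict)
    for user_id, event_id, config in events:
        record = table[user_id]  # exactly one record per user_id
        key = f"event:{event_id}"
        record[key] = record.get(key, 0) + 1
        for name, value in config.items():
            record[f"config:{name}"] = value
    return dict(table)


rows = [
    ("u1", "open_app", {"app_version": "1.0"}),
    ("u1", "click_banner", {"app_version": "1.0"}),
    ("u2", "open_app", {"app_version": "1.1"}),
    ("u1", "open_app", {"app_version": "1.1"}),
]
wide = build_wide_table(rows)
print(wide["u1"])
# {'event:open_app': 2, 'config:app_version': '1.1', 'event:click_banner': 1}
```

With one record per user, a per-user lookup reads a single row. A new event type adds a key rather than a new column, which is one plausible reason to choose a key-value layout for the wide table.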

For page interaction, 80% of analysis requests are served through real-time page analysis, which returns results in seconds. The remaining 20% are more complex tasks, handled by job submission and taking 5-15 minutes. Business metrics, event model analysis, funnel model analysis, and retention model analysis run as real-time analysis. User persona insights and interest preference analysis run as submitted jobs.
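Funnel analysis is one of the real-time models listed above. The article does not define MVP's funnel semantics, so the sketch below assumes a common definition. A user counts toward step k only if they performed steps 1 through k in order, within a fixed time window that starts at step 1. The `funnel` function, the window, and the sample events are hypothetical.

```python
from collections import defaultdict


def funnel(events, steps, window):
    """Count users who complete each funnel step in order within `window` seconds.

    `events` is an iterable of (user_id, event_id, timestamp) tuples.
    Returns a list whose element k is the number of users reaching step k.
    """
    by_user = defaultdict(list)
    for user_id, event_id, ts in events:
        by_user[user_id].append((ts, event_id))

    counts = [0] * len(steps)
    for user_events in by_user.values():
        user_events.sort()  # chronological order per user
        best = 0  # deepest step this user reached from any valid start
        for i, (start_ts, event_id) in enumerate(user_events):
            if event_id != steps[0]:
                continue
            depth = 1
            # Greedily match the next expected step inside the window.
            for ts, ev in user_events[i + 1:]:
                if ts - start_ts > window:
                    break
                if depth < len(steps) and ev == steps[depth]:
                    depth += 1
            best = max(best, depth)
        for k in range(best):
            counts[k] += 1
    return counts


events = [
    ("u1", "view", 0), ("u1", "add_to_cart", 30), ("u1", "pay", 100),
    ("u2", "view", 0), ("u2", "add_to_cart", 50),
    ("u3", "view", 0), ("u3", "pay", 20),  # skipped the cart step
    ("u4", "add_to_cart", 0),              # never entered the funnel
]
print(funnel(events, ["view", "add_to_cart", "pay"], window=3600))
# [3, 2, 1]
```

Matching each step at its earliest occurrence is enough here, because a later match can never let a user go deeper within the same window. The per-user loop is quadratic, which is acceptable for a sketch. In a store like ClickHouse, this kind of computation would normally be pushed into the query engine rather than done in application code.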

The results show significant improvement. A traditional analysis following the standard process took 3-5 days, whereas MVP completes analysis requests quickly, greatly shortening project timelines. To date, the system has processed more than 15,000 analysis tasks, with peaks exceeding 2,000 tasks. It has shifted data analysis from a manual process to a tool-based one, significantly improving efficiency and enabling data-driven, refined product operations.
