Building and Implementing a Big Data Platform: From Scripts to Services and Lambda Architecture
This article outlines a step‑by‑step approach to building a big data platform. It starts with turning scripts into tools, then moves through exposing tools as services, combining services into a platform, and packaging the platform as a product. It also compares the business‑scenario and generic‑component construction paths and describes how the Lambda architecture supports data collection, processing, and visualization to drive business operations.
Big Data Platform Construction Ideas
As enterprises grow, they inevitably accumulate data and want to let the "data speak": to collect, store, and analyze it until it yields useful business insight. Building a big data platform addresses this need. This article covers the thinking behind such a platform, the paths for building it, a widely used reference architecture, and ways to analyze and present the data.
Script Toolization
In the early stages of data collection and analysis, developers write scripts to meet specific business requirements. Scripts solve the immediate problem, but they tend to be ad hoc and hard to maintain, and similar scripts get rewritten for each new request.
Tool Serviceization
To reduce maintenance cost and improve reusability, scripts are packaged as command‑line or UI tools. This abstraction captures experience, makes tools more robust, and raises efficiency.
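As a rough illustration of this step, the sketch below wraps a one-off counting script in a small command-line tool. The log format (`timestamp,event_type` per line), the function names, and the `--top` option are all assumptions made for the example, not something the article specifies.

```python
import argparse
from collections import Counter


def count_events(lines):
    """Count events by type; each line is assumed to be 'timestamp,event_type'."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines instead of failing
        _, _, event_type = line.partition(",")
        counts[event_type] += 1
    return dict(counts)


def build_parser():
    """Describe the tool's interface once, so every user gets the same options."""
    parser = argparse.ArgumentParser(description="Count events per type in a log file.")
    parser.add_argument("path", help="log file with 'timestamp,event_type' lines")
    parser.add_argument("--top", type=int, default=10, help="number of event types to print")
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)
    with open(args.path, encoding="utf-8") as handle:
        counts = count_events(handle)
    ranked = sorted(counts.items(), key=lambda item: -item[1])
    for event_type, total in ranked[: args.top]:
        print(f"{event_type}\t{total}")
```

Calling `main(["events.log", "--top", "3"])` would print the three most frequent event types in a hypothetical `events.log`. Keeping the counting logic in `count_events` separates it from argument parsing, so the same function can later be reused behind a service.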
Service Platformization
Tools are further exposed as cloud services, allowing users to access data processing capabilities from anywhere with network connectivity, breaking geographic constraints.
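One way to picture this step is a tool's result exposed over HTTP. The following is a minimal sketch using only the Python standard library. The `EVENT_COUNTS` data, the `event_type` query parameter, and the port are hypothetical, and a real deployment would add authentication and a production-grade server.

```python
import json
from urllib.parse import parse_qs
from wsgiref.simple_server import make_server

# Hypothetical precomputed result that a command-line tool used to print.
EVENT_COUNTS = {"click": 2, "view": 1}


def app(environ, start_response):
    """WSGI application: return all counts, or one count if ?event_type= is given."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    event_type = query.get("event_type", [None])[0]
    if event_type is None:
        payload = EVENT_COUNTS
    else:
        payload = {event_type: EVENT_COUNTS.get(event_type, 0)}
    body = json.dumps(payload).encode("utf-8")
    headers = [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]


def serve(port=8000):
    """Run the service until interrupted (development server only)."""
    with make_server("", port, app) as server:
        server.serve_forever()
```

With `serve()` running, a request to `/?event_type=click` on the chosen port would return the count for that event type as JSON.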
Platform Productization
When services are aggregated into a unified platform, they form a SaaS‑style product that integrates data, services, and customer needs, providing a standardized solution for various industries.
Construction Paths
Enterprises can adopt two main approaches based on scale and maturity:
Business‑Scenario Construction
Close alignment with specific business logic, enabling rapid, tailored solutions.
Developers and business users collaborate closely, ensuring high usability.
Limited extensibility; risk of duplicated effort across scenarios.
Generic‑Component Construction
Extract common functions (data ingestion, storage, computation, search, visualization) as reusable components.
Facilitates long‑term expansion across multiple business scenarios and industries.
Higher architectural complexity and longer development cycles.
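The generic‑component idea can be sketched as a pipeline whose ingestion, computation, and output steps are interchangeable functions. The `Pipeline` class and the two sample steps below are illustrative names chosen for this example, under the assumption that records are plain dictionaries.

```python
from typing import Any, Callable, Dict, Iterable, List

Record = Dict[str, Any]
Source = Callable[[], Iterable[Record]]
Transform = Callable[[Iterable[Record]], Iterable[Record]]
Sink = Callable[[Iterable[Record]], Any]


class Pipeline:
    """Chains one source, any number of transforms, and one sink."""

    def __init__(self, source: Source, transforms: List[Transform], sink: Sink):
        self.source = source
        self.transforms = transforms
        self.sink = sink

    def run(self) -> Any:
        records = self.source()
        for transform in self.transforms:
            records = transform(records)
        return self.sink(records)


def drop_incomplete(records: Iterable[Record]) -> Iterable[Record]:
    """Reusable cleaning step: keep records that have a user and an amount."""
    return (r for r in records if r.get("user") and r.get("amount") is not None)


def total_by_user(records: Iterable[Record]) -> Dict[str, float]:
    """Reusable sink: sum the amount per user."""
    totals: Dict[str, float] = {}
    for record in records:
        totals[record["user"]] = totals.get(record["user"], 0) + record["amount"]
    return totals
```

A new business scenario would then be assembled from existing parts, for example `Pipeline(load_orders, [drop_incomplete], total_by_user).run()`, where `load_orders` is whatever source function that scenario needs. The cost is the extra up-front design of the shared interfaces.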
For startups, the business‑scenario path is recommended to iterate quickly; mature organizations can transition to the generic‑component approach.
Implementation Architecture
The Lambda Architecture, proposed by Nathan Marz based on his work at BackType and Twitter, runs a batch path and a real‑time path side by side so that queries can see data that is both complete and fresh.
The architecture consists of three layers:
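Batch Layer: Stores the master dataset as an immutable, append‑only record and recomputes batch views from it on a schedule.
Speed Layer: Processes only the recent data that the batch layer has not yet absorbed, producing low‑latency real‑time views that are discarded once the next batch run covers the same data.
Serving Layer: Indexes the batch views and answers queries by combining them with the real‑time views; results reach end users through reports, dashboards, or APIs.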
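To make the division of work between the three layers concrete, here is a small in‑memory sketch that counts page views. The event shape (`{"page": ...}`) and all function and class names are assumptions for illustration. A real system would use distributed storage and processing engines rather than Python objects.

```python
from collections import Counter


def compute_batch_view(master_dataset):
    """Batch layer: recompute the view from the full master dataset."""
    return Counter(event["page"] for event in master_dataset)


class SpeedLayer:
    """Speed layer: incrementally count events not yet covered by a batch run."""

    def __init__(self):
        self.view = Counter()

    def ingest(self, event):
        self.view[event["page"]] += 1

    def reset(self):
        # Called once a batch run has absorbed the events counted here.
        self.view.clear()


def query(batch_view, realtime_view, page):
    """Serving layer: merge the batch and real-time views at query time."""
    return batch_view.get(page, 0) + realtime_view.get(page, 0)


master = [{"page": "/home"}, {"page": "/home"}, {"page": "/cart"}]
batch_view = compute_batch_view(master)

speed = SpeedLayer()
speed.ingest({"page": "/home"})  # arrived after the last batch run

print(query(batch_view, speed.view, "/home"))  # 3
```

The query result is the batch count plus the recent count. When the next batch run includes the new event, the speed layer's view is reset so the event is not counted twice.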
Data Flow
Data Collection
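Data is gathered from browsers, mobile devices, server logs, and similar sources, converted to a common format, and then ingested with tools such as Flume (log collection), Kafka (message streams), or Sqoop (bulk transfer from relational databases).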
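The format‑conversion step might look like the sketch below, which maps two differently shaped inputs onto one record schema before they are handed to an ingestion tool. Both input formats and the target schema (`source`, `ts`, `action`, `target`) are invented for this example.

```python
import json
from datetime import datetime, timezone


def from_server_log(line):
    """Parse an assumed log line such as '2024-01-01T00:00:00Z GET /home'."""
    ts, method, path = line.strip().split(" ", 2)
    return {"source": "server", "ts": ts, "action": method, "target": path}


def from_mobile_event(payload):
    """Convert an assumed mobile payload such as {'t': 1704067200, 'evt': 'tap', 'screen': 'home'}."""
    ts = datetime.fromtimestamp(payload["t"], tz=timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return {"source": "mobile", "ts": ts, "action": payload["evt"], "target": payload["screen"]}


def to_message(record):
    """Serialize a normalized record to bytes, ready for a message queue or log collector."""
    return json.dumps(record, sort_keys=True).encode("utf-8")
```

Once every source is reduced to the same fields, downstream storage and processing code does not need to know where a record came from.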
Data Processing
Collected data resides in distributed storage (e.g., HDFS). It is processed offline in batches with MapReduce, Hive, or Spark, and online as streams with engines such as Storm.
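The difference between the offline and online modes can be shown with plain Python standing in for the real engines. In this sketch the event fields (`ts` in seconds, `key`, `value`) and the 60‑second tumbling window are assumptions, and the streaming function assumes events arrive in time order.

```python
from collections import defaultdict


def batch_totals(events):
    """Offline: one pass over the complete dataset, one final answer."""
    totals = defaultdict(int)
    for event in events:
        totals[event["key"]] += event["value"]
    return dict(totals)


def windowed_totals(event_stream, window_seconds=60):
    """Online: emit totals per tumbling window as soon as each window closes."""
    current_window = None
    totals = defaultdict(int)
    for event in event_stream:
        window = event["ts"] - event["ts"] % window_seconds
        if current_window is not None and window != current_window:
            yield current_window, dict(totals)
            totals = defaultdict(int)
        current_window = window
        totals[event["key"]] += event["value"]
    if current_window is not None:
        yield current_window, dict(totals)  # flush the last open window
```

The batch function cannot answer until it has seen everything, while the windowed generator produces partial results continuously. That trade‑off is why the two modes are usually run together.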
Data Output & Visualization
Processed results are served to applications, dashboards, or APIs. Different user groups (operational, managerial, executive) receive tailored aggregations and visualizations.
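Tailoring aggregations to an audience often comes down to choosing the grouping level. The sketch below assumes a hypothetical mapping in which operational users see per‑store figures, managers see per‑region figures, and executives see a single total. The row fields and the mapping are illustrative only.

```python
def aggregate(rows, group_by):
    """Sum revenue per combination of the group_by fields."""
    totals = {}
    for row in rows:
        key = tuple(row[field] for field in group_by)
        totals[key] = totals.get(key, 0) + row["revenue"]
    return totals


# Hypothetical mapping from audience to level of detail.
AUDIENCE_GROUPING = {
    "operational": ("region", "store"),
    "managerial": ("region",),
    "executive": (),
}


def view_for(audience, rows):
    """Return the aggregation appropriate for the given audience."""
    return aggregate(rows, AUDIENCE_GROUPING[audience])
```

All three views come from the same processed rows, so the figures stay consistent across audiences while the level of detail changes.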
Data Visualization Platform Practices
User Management & Permissions
Role‑based access control and permission management.
Business grouping to segment users by department or function.
Security levels tied to data sensitivity and workflow.
Support for raw data search and browsing.
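The permission rules listed above can be combined into a single access check. This is a minimal sketch: the role names, the numeric clearance and sensitivity levels, and the rule that administrators may cross business groups are all assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical role-to-permission table.
ROLE_PERMISSIONS = {
    "viewer": {"read"},
    "analyst": {"read", "export"},
    "admin": {"read", "export", "manage"},
}


@dataclass(frozen=True)
class User:
    name: str
    role: str
    group: str       # business group, e.g. a department
    clearance: int   # highest sensitivity level the user may see


@dataclass(frozen=True)
class Dataset:
    name: str
    group: str
    sensitivity: int


def can_access(user, dataset, action):
    """Apply role, business-group, and security-level checks in turn."""
    if action not in ROLE_PERMISSIONS.get(user.role, set()):
        return False  # the role does not grant this action
    if user.role != "admin" and user.group != dataset.group:
        return False  # non-admins stay inside their own business group
    return user.clearance >= dataset.sensitivity
```

Raw data search and browsing would call the same check with the `read` action before returning any rows.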
Diverse Product Functions
Multiple chart and report types.
Customizable fields and filters for each visualization.
Organizational and personal views for different perspectives.
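Customizable fields, filters, and layered views can be modeled as small configuration dictionaries applied to the same rows. In this sketch the view structure (`fields` and `filters` keys) and the rule that a personal view overrides the organizational one key by key are assumptions, not a description of any specific product.

```python
def apply_view(rows, view):
    """Keep rows matching every filter, then project the chosen fields."""
    fields = view.get("fields", [])
    filters = view.get("filters", {})
    selected = []
    for row in rows:
        if all(row.get(key) == value for key, value in filters.items()):
            selected.append({field: row.get(field) for field in fields})
    return selected


def merge_views(org_view, personal_view):
    """Start from the organizational view and let personal settings override it."""
    merged = dict(org_view)
    merged.update(personal_view)
    return merged
```

For example, an organizational view that shows `region` and `revenue` for all rows can be narrowed by a personal view of `{"filters": {"region": "north"}}` without changing what other users see.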
Integration with Other Systems
Integration with ERP, supply‑chain, and upstream/downstream systems.
Correlation with industry data and national economic indicators.
Connection to email, notification, and productivity tools.
Conclusion
The article presents a progressive roadmap for building a big data platform that stays aligned with business needs: from scripts to tools, then services, a platform, and finally a product. It contrasts the business‑scenario and generic‑component construction methods, recommends adopting them in phases, and describes how the Lambda architecture supports data collection, processing, and visualization to drive business operations and create commercial value.
High Availability Architecture
Official account for High Availability Architecture.