AutoStream Real‑Time Computing Platform: Architecture, Resource Management, Scaling, Lakehouse Integration, and PyFlink Practices
This article describes how AutoStream, the real‑time computing platform at Autohome (汽车之家, literally "Car Home"), evolved from Storm to successive Flink‑based versions. It covers real‑time application scenarios, strict budget‑controlled resource management, automatic scaling, a lake‑house architecture built on Iceberg, PyFlink integration, and future plans for resource optimisation and batch‑stream unification.
AutoStream began on Storm and moved to Flink in versions 1.0 through 3.0. It supports SQL + UDF development as well as PyFlink.
Key application scenarios include real‑time metric dashboards, monitoring and alerting, data processing, user behaviour profiling, real‑time data lake ingestion, and data transmission.
To curb resource waste, a strict budget control mechanism ties task execution to team budgets, enforces low‑utilisation thresholds, and provides automated rescaling based on CPU, memory and slot usage.
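The budget gate and the low‑utilisation check can be pictured with a minimal sketch. Everything here is an assumption made for illustration: the `TaskUsage` type, the threshold values, the use of CPU cores as the budget unit, and the rule that a job is flagged only when all three dimensions are low. The article does not publish the platform's actual rules.

```python
from dataclasses import dataclass


@dataclass
class TaskUsage:
    """Average utilisation of one job over an observation window (0.0 to 1.0)."""
    cpu: float
    memory: float
    slots: float


# Hypothetical thresholds; the real values are not given in the article.
LOW_UTILISATION = TaskUsage(cpu=0.2, memory=0.3, slots=0.5)


def can_start(team_budget_cores: int, team_used_cores: int, requested_cores: int) -> bool:
    """Budget gate: a task may start only if the team's budget covers the request."""
    return team_used_cores + requested_cores <= team_budget_cores


def is_low_utilisation(usage: TaskUsage, limit: TaskUsage = LOW_UTILISATION) -> bool:
    """Flag a job only when CPU, memory and slot usage are all below their limits."""
    return (
        usage.cpu < limit.cpu
        and usage.memory < limit.memory
        and usage.slots < limit.slots
    )
```

A job flagged by `is_low_utilisation` would be a candidate for automated downscaling, while `can_start` would be evaluated before a submission is accepted.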
Automatic scaling (RescaleCoordinator) runs on the JobManager, periodically checks configured policies, and triggers container replacement without service disruption, notifying owners via DingTalk and email.
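One cycle of such a policy check might look like the sketch below. It is not the RescaleCoordinator's code: the `RescalePolicy` fields, the proportional sizing formula, and the `notify` callback (standing in for DingTalk and email) are all hypothetical. A scheduler would call `check_once` at the configured interval.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class RescalePolicy:
    """Hypothetical policy: act when a metric leaves the [low, high] band."""
    metric: str    # e.g. "cpu", "memory" or "slots"
    low: float     # below this, the job is over-provisioned
    high: float    # above this, the job is under-provisioned
    target: float  # utilisation to aim for after rescaling


def decide_parallelism(current: int, metrics: Dict[str, float],
                       policies: List[RescalePolicy],
                       max_parallelism: int = 128) -> Optional[int]:
    """Return a new parallelism, or None when no policy fires."""
    proposals = []
    for policy in policies:
        value = metrics.get(policy.metric)
        if value is None or policy.low <= value <= policy.high:
            continue
        # Proportional sizing: keep total load constant at the target utilisation.
        proposals.append(math.ceil(current * value / policy.target))
    if not proposals:
        return None
    # Take the largest proposal so that no dimension is starved.
    new = max(1, min(max(proposals), max_parallelism))
    return new if new != current else None


def check_once(job: str, current: int, metrics: Dict[str, float],
               policies: List[RescalePolicy],
               notify: Callable[[str], None]) -> Optional[int]:
    """Run one periodic check and notify the owner if a rescale is decided."""
    new = decide_parallelism(current, metrics, policies)
    if new is not None:
        notify(f"{job}: parallelism {current} -> {new}")
    return new
```

Taking the maximum across proposals is a conservative choice: when policies disagree, the job is sized for its most constrained resource.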
The platform adopts a lake‑house architecture built on Apache Iceberg, replacing Hive‑based warehouses to achieve near‑real‑time data freshness, upsert support, and schema evolution.
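To make the upsert support concrete, the sketch below holds the Flink SQL that would declare an Iceberg catalog and an upsert‑enabled table. The catalog name, metastore URI, warehouse path, database, table and columns are all hypothetical. The table properties `format-version` and `write.upsert.enabled` come from Iceberg's Flink integration; the platform's actual settings are not given in the article.

```python
# Flink SQL kept as Python strings so it can be passed to a TableEnvironment.
# All names, the URI and the warehouse path below are hypothetical.
CREATE_CATALOG = """
CREATE CATALOG iceberg_catalog WITH (
  'type' = 'iceberg',
  'catalog-type' = 'hive',
  'uri' = 'thrift://metastore.example:9083',
  'warehouse' = 'hdfs:///warehouse/iceberg'
)
"""

# Upserts need a primary key and Iceberg format version 2 (row-level deletes).
# The database `ods` is assumed to exist already.
CREATE_TABLE = """
CREATE TABLE iceberg_catalog.ods.user_profile (
  user_id BIGINT,
  city STRING,
  updated_at TIMESTAMP(3),
  PRIMARY KEY (user_id) NOT ENFORCED
) WITH (
  'format-version' = '2',
  'write.upsert.enabled' = 'true'
)
"""


def statements():
    """Return the DDL statements in execution order, stripped of blank lines."""
    return [sql.strip() for sql in (CREATE_CATALOG, CREATE_TABLE)]
```

Each statement would be run in order, for example with `table_env.execute_sql(sql)` on a Flink `TableEnvironment` that has the Iceberg runtime on its classpath.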
PyFlink integration enables Python developers to write Flink jobs via SQL + UDF, with dependency management through the platform’s file service and seamless catalog sharing.
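A minimal sketch of the SQL + UDF pattern in PyFlink follows. The function `mask_phone` and the table and column names in the comments are hypothetical, and the platform's file service for dependencies is not shown. The PyFlink import is guarded so that the plain function can still be tested where PyFlink is not installed.

```python
def mask_phone(phone):
    """Keep the first 3 and last 4 characters and mask the rest.

    None and values shorter than 8 characters are returned unchanged.
    """
    if phone is None or len(phone) < 8:
        return phone
    return phone[:3] + "*" * (len(phone) - 7) + phone[-4:]


try:
    from pyflink.table import DataTypes
    from pyflink.table.udf import udf

    # Wrap the plain function as a scalar UDF that returns a STRING.
    mask_phone_udf = udf(mask_phone, result_type=DataTypes.STRING())
except ImportError:
    # PyFlink is not available; only the plain Python function can be used.
    mask_phone_udf = None

# With a TableEnvironment `table_env`, the UDF would then be registered and
# called from SQL, for example:
#   table_env.create_temporary_function("mask_phone", mask_phone_udf)
#   table_env.execute_sql("SELECT user_id, mask_phone(phone) FROM users")
```

Keeping the logic in a plain function and wrapping it separately means it can be unit tested without starting a Flink cluster.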
Future work focuses on further resource optimisation, mixed real‑time/offline workloads, finer‑grained YARN scheduling, and exploring Flink’s batch‑stream unification proposals such as FLIP‑188.
HomeTech
HomeTech tech sharing