Multi‑Tenant Architecture in Public‑Cloud Big Data Platforms: Design, Challenges, and MaxCompute Implementation

This article explains the different multi‑tenant models used by public‑cloud big data platforms, analyzes their advantages and challenges, and details how Alibaba Cloud's MaxCompute realizes strong multi‑tenancy through storage design, resource scheduling, security containers, virtual networking, and future evolution directions.

DataFunTalk

Speaker & Organizer

Guest: Dong Guoping, Senior Technical Expert at Alibaba Cloud

Editor: Liyao (DataFunTalk)

Introduction

Public‑cloud big data platforms adopt different multi‑tenant designs. This article outlines the key technical points and challenges of multi‑tenant implementations, focusing on MaxCompute’s characteristics.

Outline

Big‑Data Platform Multi‑Tenant Forms

Advantages & Challenges of Strong Multi‑Tenant

MaxCompute Multi‑Tenant Implementation

Why & Future Evolution

01 – Big‑Data Platform Multi‑Tenant Forms

Three typical tenant models are described:

1) Each tenant owns an exclusive database instance (traditional cloud DB with role‑based access).

2) Multi‑tenant control plane (metadata & permissions are shared, compute resources are isolated).

3) Full‑scale strong multi‑tenant (share everything: control, compute, storage).

The more the tenants share, the better the platform's elasticity and scalability, but the higher the system complexity and the harder it becomes to guarantee stability and security.

Focus of the talk: compute and storage multi‑tenant implementation.

Typical Model – Single‑Tenant Compute + Open Storage

Examples: AWS EMR, Databricks.

The control layer is multi‑tenant, compute is per‑tenant, and storage is shared (e.g., S3).

Advantages: supports complex UDFs, easy cross‑cloud migration.

Challenges: tenant‑level resource granularity, storage read/write latency, need for intermediate data handling.

BigQuery vs. MaxCompute

Both use multi‑tenant compute + internal storage.

Advantages: extreme elasticity, high‑bandwidth internal storage.

Challenges: UDF support (MaxCompute runs UDFs in security containers, while BigQuery offers limited JavaScript UDFs), and limitations of cloud hosts that push deployments toward bare‑metal or physical machines.

02 – Advantages & Challenges of Strong Multi‑Tenant

Benefits: zero‑setup resource pools, seconds‑level scaling, pay‑per‑use billing, higher utilization through workload sharing, cost efficiency.

Challenges: storage openness, fair scheduling across tenants, runtime isolation for UDFs or third‑party engines, network isolation for tenant‑specific requirements.

03 – MaxCompute Multi‑Tenant Implementation

MaxCompute is Alibaba Cloud’s serverless, enterprise‑grade data warehouse offering strong multi‑tenant capabilities.

Storage

MaxCompute stores data in Pangu, the proprietary distributed storage system of Alibaba's Feitian (Apsara) platform, with a capability‑based permission model, distributed access, and internal handling of temporary data.
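To make the idea of a capability‑based permission model concrete, here is a minimal sketch in Python. It is not Pangu's actual token format: the key, field names, and helper functions are all hypothetical. It only illustrates the general pattern, where an issuer signs a token that names a path and the allowed rights, and a storage node can verify the token on its own without consulting a central access list.

```python
import hashlib
import hmac
import json

# Hypothetical key shared by the issuer and the storage nodes (demo only).
SECRET = b"demo-only-shared-secret"


def issue_capability(tenant: str, path: str, rights: str, expires_at: int) -> str:
    """Return a signed token granting `rights` (e.g. "r", "rw") under `path`."""
    payload = json.dumps(
        {"tenant": tenant, "path": path, "rights": rights, "exp": expires_at},
        sort_keys=True,
    )
    sig = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    return payload + "." + sig


def check_capability(token: str, path: str, right: str, now: int) -> bool:
    """Verify the signature, expiry, requested right, and path prefix."""
    payload, _, sig = token.rpartition(".")
    expected = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig.encode(), expected.encode()):
        return False  # forged or tampered token
    claims = json.loads(payload)
    prefix = claims["path"].rstrip("/") + "/"
    return (
        now < claims["exp"]
        and right in claims["rights"]
        and (path == claims["path"] or path.startswith(prefix))
    )
```

In this sketch a read‑only token for `/tenant_a/tables` allows reads of `/tenant_a/tables/t1` but rejects writes, paths of other tenants, and any use after the expiry time.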

Resource Scheduling

A scalable scheduler guarantees fair allocation for diverse tenant workloads, supports both prepaid and postpaid resource models, and provides fail‑over handling.
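The talk does not describe the scheduler's algorithm. As one common way to define "fair allocation" across tenants, the sketch below implements max‑min fairness over a single resource: small demands are satisfied in full, and the remaining capacity is split evenly among the tenants that want more. The function name and the single‑resource simplification are assumptions made for illustration.

```python
from typing import Dict


def max_min_fair(capacity: float, demands: Dict[str, float]) -> Dict[str, float]:
    """Split `capacity` among tenants using max-min fairness."""
    alloc = {tenant: 0.0 for tenant in demands}
    remaining = capacity
    # Visit tenants from the smallest demand to the largest.
    unsatisfied = sorted(demands, key=demands.get)
    while unsatisfied and remaining > 1e-9:
        share = remaining / len(unsatisfied)
        tenant = unsatisfied[0]
        if demands[tenant] <= share:
            # This demand fits within an equal share: satisfy it fully.
            alloc[tenant] = demands[tenant]
            remaining -= demands[tenant]
            unsatisfied.pop(0)
        else:
            # Every remaining tenant wants more than an equal share.
            for t in unsatisfied:
                alloc[t] = share
            remaining = 0.0
            unsatisfied = []
    return alloc
```

For example, with 100 units of capacity and demands of 10, 40, and 80, the two smaller tenants receive their full demand and the largest tenant receives the remaining 50.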

Host‑Level Isolation

Jobs are isolated with cgroups and can run either as plain processes or inside containers, with CPU/GPU resource control.
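As a rough illustration of cgroup‑based limits, the sketch below renders CPU, memory, and process‑count limits in the format used by the Linux cgroup v2 interface files `cpu.max`, `memory.max`, and `pids.max`. Whether MaxCompute uses cgroup v1 or v2 is not stated in the talk, so cgroup v2 is an assumption here, and the helper names are hypothetical. GPU control is not covered.

```python
from pathlib import Path
from typing import Dict

CPU_PERIOD_US = 100_000  # 100 ms scheduling period, the cgroup v2 default


def cgroup_v2_limits(cpu_cores: float, memory_bytes: int, max_pids: int) -> Dict[str, str]:
    """Map a job's resource request to cgroup v2 interface file contents."""
    quota_us = int(cpu_cores * CPU_PERIOD_US)
    return {
        "cpu.max": f"{quota_us} {CPU_PERIOD_US}",  # "<quota> <period>" in microseconds
        "memory.max": str(memory_bytes),           # hard memory limit in bytes
        "pids.max": str(max_pids),                 # cap on tasks in the group
    }


def apply_limits(cgroup_dir: Path, limits: Dict[str, str]) -> None:
    """Write each limit into the job's cgroup directory.

    On a real host this directory lives under /sys/fs/cgroup, writing to it
    needs root, and the controllers must be enabled in the parent group.
    """
    cgroup_dir.mkdir(parents=True, exist_ok=True)
    for filename, value in limits.items():
        (cgroup_dir / filename).write_text(value + "\n")
```

Keeping the rendering step separate from the write step makes the limits easy to inspect or test without touching the real cgroup filesystem.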

UDFs are fully supported (SQL, Java, Python) and run inside lightweight security containers that provide process‑level isolation, trimmed kernels, restricted network access, and fast startup.

Virtual networking (VXLAN) isolates task‑level nodes; task‑level network tunneling enables controlled external access.
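VXLAN separates networks by tagging each encapsulated frame with a 24‑bit VXLAN Network Identifier (VNI). The sketch below packs and parses the 8‑byte VXLAN header defined in RFC 7348. How MaxCompute assigns VNIs to tenants or tasks is not described in the talk, so the VNI used in any example is purely illustrative.

```python
import struct

VXLAN_FLAG_VNI_VALID = 0x08   # the "I" flag: the VNI field is valid
MAX_VNI = (1 << 24) - 1       # the VNI is a 24-bit value


def pack_vxlan_header(vni: int) -> bytes:
    """Build the 8-byte header: flags(8) reserved(24) VNI(24) reserved(8)."""
    if not 0 <= vni <= MAX_VNI:
        raise ValueError("VNI must fit in 24 bits")
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def unpack_vni(header: bytes) -> int:
    """Extract the VNI from a VXLAN header, checking the valid flag."""
    flags_word, vni_word = struct.unpack("!II", header[:8])
    if not (flags_word >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI flag not set")
    return vni_word >> 8
```

Because the VNI travels in every encapsulated packet, a receiving host can drop traffic whose VNI does not match the network the destination node belongs to.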

Performance considerations: frequent task start/stop creates many nodes, demanding efficient VPC/VXLAN handling.

Result: a single strong multi‑tenant cluster can support diverse workloads (SQL, UDF, PAI ML, Spark).

04 – Why & Future Evolution

Motivation: MaxCompute serves >90% of Alibaba’s offline data, requiring strong security and cost‑effective elasticity.

Future directions:

Open storage – increase openness while retaining internal performance.

Single‑tenant compute – provide dedicated compute for large customers to avoid resource contention.

Multi‑cloud – extend open storage and single‑tenant compute to support multi‑cloud deployments.

Thank you for listening.
