Multi‑Cloud Management Platform ARES: Architecture, Features and Practices
ARES, Bilibili’s multi‑cloud management platform, unifies resource provisioning, asset inventory, user access, and cost optimization across public clouds. It combines a layered architecture, project‑centric tagging, Terraform‑based orchestration, and centralized security to address manual provisioning, fragmented permissions, and poor cost visibility, and the team plans to extend it into hybrid‑cloud automation.
This document introduces the ARES multi‑cloud management platform built by Bilibili’s System Department. It explains why multi‑cloud is needed, the problems it solves, and the platform’s overall design and capabilities.
Why use multi‑cloud?
Public clouds offer elasticity, pay‑as‑you‑go pricing and global coverage, which makes them attractive for fast‑growing enterprises.
Adopting multiple cloud providers improves stability, reduces vendor lock‑in and enhances bargaining power.
Different providers have varying regions and product capabilities; multi‑cloud lets a company leverage the best of each.
Problems caused by multi‑cloud
Resource provisioning is manual and slow, especially for large‑scale batch deployments.
Resources are scattered across many accounts and clouds, making inventory and ownership unclear.
Permission management is fragmented; users must log into each cloud console with separate credentials.
Network configuration is complex and requires deep expertise.
Rising cloud costs lack visibility and optimization tools.
2. Platform Overview
ARES addresses these issues by providing a unified platform for resource management, orchestration, user management, cost control and security. The platform has been iteratively developed since April 2022, with the first version released in July 2022.
2.1 Platform Architecture
The architecture follows a layered model:
Top layer – a unified front‑end portal that offers a single entry point for all resource lifecycle operations.
Business logic layer – handles project management, asset management, user management, resource orchestration and cost management. It also aggregates multi‑cloud billing and account information.
Engine layer – integrates with cloud providers via IaC (Terraform) and provider APIs to execute actions.
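One way to picture the layering is a thin dispatch from the business logic layer down to interchangeable engines. The sketch below is only an illustration of that idea, not ARES's actual code: the class names, the `apply` method, the provider keys `cloud_a` and `cloud_b`, and the choice of which provider uses which engine are all assumptions.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Request:
    """A provisioning request as the business logic layer might hand it down."""
    project: str
    provider: str       # hypothetical provider key, e.g. "cloud_a"
    resource_type: str  # unified product name, e.g. "cloud_server"
    count: int


class Engine(Protocol):
    def apply(self, request: Request) -> str: ...


class TerraformEngine:
    """Stand-in for an IaC-backed engine; it only describes what it would do."""

    def apply(self, request: Request) -> str:
        return f"terraform apply: {request.count} x {request.resource_type} for {request.project}"


class ApiEngine:
    """Stand-in for an engine that would call a provider API directly."""

    def apply(self, request: Request) -> str:
        return f"api call: {request.count} x {request.resource_type} for {request.project}"


def dispatch(request: Request, engines: dict[str, Engine]) -> str:
    """Route a request to the engine registered for its provider."""
    if request.provider not in engines:
        raise ValueError(f"no engine registered for provider {request.provider!r}")
    return engines[request.provider].apply(request)


# Hypothetical registry: which engine serves which provider is an assumption.
ENGINES: dict[str, Engine] = {"cloud_a": TerraformEngine(), "cloud_b": ApiEngine()}
```

Keeping the engines behind one small interface means the portal and business logic never need to know whether a request is fulfilled through Terraform or through a direct API call.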
3. Platform Functions
3.1 Project‑Centric Global Management
Projects are the core entity. Each project is linked to a single organizational unit, can own multiple cloud accounts, and maps to a cloud‑specific project construct (e.g., Alibaba Cloud Resource Group, AWS Project). Because cloud‑side projects cannot cover every asset, resources are also tagged with bili_project to close the gap. The following read‑only policy is scoped by that tag:
{
  "Version": "1",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "*:Describe*",
        "*:List*",
        "*:Get*",
        "*:BatchGet*",
        "*:Query*",
        "*:BatchQuery*",
        "actiontrail:LookupEvents",
        "actiontrail:Check*",
        "dm:Desc*",
        "dm:SenderStatistics*",
        "ram:GenerateCredentialReport",
        "cloudsso:Check*",
        "notifications:Read*"
      ],
      "Resource": "*",
      "Condition": {
        "StringEqualsIgnoreCase": {
          "*:tag/bili_project": ["项目A"]
        }
      }
    }
  ]
}
Roles defined per project: R&D leader, R&D member, Ops leader, Ops member. Permissions are enforced through custom cloud policies that reference the bili_project tag.
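Since every project needs its own copy of such a policy, generating the document from the project name is a natural fit. The sketch below assumes that approach; the function name `build_project_policy` and the shortened action list are hypothetical, and only the policy shape is taken from the example above.

```python
import json

# A shortened, read-only subset of the actions in the policy above.
READ_ONLY_ACTIONS = ("*:Describe*", "*:List*", "*:Get*", "*:Query*")


def build_project_policy(project: str, actions: tuple = READ_ONLY_ACTIONS) -> dict:
    """Return a policy that allows `actions` only on resources whose
    bili_project tag matches `project`."""
    if not project:
        raise ValueError("project name must be non-empty")
    return {
        "Version": "1",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": list(actions),
                "Resource": "*",
                "Condition": {
                    "StringEqualsIgnoreCase": {"*:tag/bili_project": [project]}
                },
            }
        ],
    }


if __name__ == "__main__":
    # ensure_ascii=False keeps non-ASCII project names readable in the output.
    print(json.dumps(build_project_policy("demo-project"), indent=2, ensure_ascii=False))
```

A per-role variant could pass a different `actions` tuple for each of the four roles, but which actions each role receives is not stated in the article.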
3.2 Unified Asset Management
ARES standardizes product names across clouds (e.g., “cloud server” maps to ECS, CVM, EC2, etc.) and aligns attributes such as ID, name, region, zone, image, and specification. Tables illustrate the mapping for cloud servers and attribute standardization.
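The normalization described here amounts to two lookup tables: one from provider product to unified product name, and one from raw fields to unified attributes. The following is a minimal sketch under that assumption. The product names ECS, CVM and EC2 come from the text; the raw field names are illustrative only and have not been checked against any provider's API responses.

```python
# Provider product -> unified product name (from the text above).
PRODUCT_NAMES = {"ECS": "cloud_server", "CVM": "cloud_server", "EC2": "cloud_server"}

# The unified attributes listed in the text.
UNIFIED_ATTRIBUTES = ("id", "name", "region", "zone", "image", "specification")

# Raw field -> unified attribute. Field names here are illustrative assumptions.
FIELD_MAPS = {
    "ECS": {
        "InstanceId": "id",
        "InstanceName": "name",
        "RegionId": "region",
        "ZoneId": "zone",
        "ImageId": "image",
        "InstanceType": "specification",
    },
    "EC2": {
        "InstanceId": "id",
        "Name": "name",
        "Region": "region",
        "AvailabilityZone": "zone",
        "ImageId": "image",
        "InstanceType": "specification",
    },
}


def normalize(product: str, raw: dict) -> dict:
    """Convert one raw provider record into the unified asset shape."""
    if product not in FIELD_MAPS:
        raise ValueError(f"no field map for product {product!r}")
    asset = {"product": PRODUCT_NAMES[product], "source_product": product}
    # Start with every unified attribute present so all assets share one schema.
    asset.update({attr: None for attr in UNIFIED_ATTRIBUTES})
    for raw_field, attr in FIELD_MAPS[product].items():
        if raw_field in raw:
            asset[attr] = raw[raw_field]
    return asset
```

Filling missing attributes with `None` keeps every asset on the same schema, so inventory queries do not need per-provider special cases.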
3.3 IaC‑Based Resource Orchestration
Terraform is the primary IaC tool, supplemented by direct API calls. ARES abstracts provider‑specific fields into unified variables, allowing users to author a single set of inputs regardless of the underlying cloud.
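The abstraction can be thought of as a translation table from unified variable names to each provider's argument names, followed by rendering. The sketch below is an assumption about how that could work, not the platform's implementation. The Acloud argument names are taken from the Terraform example in this section; the provider "Bcloud", its argument names, and the helper functions are hypothetical.

```python
import json

# Unified variable -> provider-specific argument name.
UNIFIED_TO_PROVIDER = {
    "Acloud": {
        "image": "image_id",
        "specification": "instance_type",
        "name": "instance_name",
        "zone": "availability_zone",
    },
    # Hypothetical second provider, for contrast only.
    "Bcloud": {"image": "image", "specification": "flavor", "name": "name", "zone": "zone"},
}


def translate(provider: str, unified: dict) -> dict:
    """Rename unified variables to the argument names one provider expects."""
    mapping = UNIFIED_TO_PROVIDER[provider]
    unknown = sorted(set(unified) - set(mapping))
    if unknown:
        raise KeyError(f"unmapped unified variables for {provider}: {unknown}")
    return {mapping[key]: value for key, value in unified.items()}


def render_resource(resource_type: str, name: str, args: dict) -> str:
    """Render a flat argument dict as a Terraform resource block.

    Values are JSON-encoded, which is adequate for plain strings and numbers
    but does not escape Terraform's "${" interpolation syntax.
    """
    width = max((len(key) for key in args), default=0)
    lines = [f'resource "{resource_type}" "{name}" {{']
    for key, value in args.items():
        lines.append(f"  {key.ljust(width)} = {json.dumps(value)}")
    lines.append("}")
    return "\n".join(lines)
```

For example, `render_resource("Acloud_instance", "server", translate("Acloud", {"name": "test"}))` yields a block whose single argument is `instance_name = "test"`.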
# Create backend server
resource "Acloud_instance" "server_attachment" {
  count = 1
  image_id = "ubuntu_18_04_64_20G_alibase_20190624.vhd"
  instance_type = "ecs.n4.large"
  instance_name = "test"
  security_groups = "sg-mj7itgyeohjw2ebvyrl"
  internet_charge_type = "PayByTraffic"
  internet_max_bandwidth_out = "10"
  availability_zone = "cn-hangzhou-A"
  instance_charge_type = "PostPaid"
  system_disk_category = "cloud_efficiency"
  vswitch_id = "vsw-uf66iu2bxce23ityocdx"
}
Resource templates are generated automatically: a variable.tf file declares all unified variables, and a main.tf file instantiates the provider‑specific resources. Multi‑resource scenarios (e.g., attaching a load‑balancer to backend servers) are expressed by referencing IDs across resources.
resource "Acloud_slb_server_group_server_attachment" "server_attachment" {
  ...
  server_group_id = Acloud_slb_server_group.server_attachment.id
  server_id = Acloud_instance.server_attachment[count.index].id
  ...
}
3.4 Secure User Management
Cloud accounts are bound to internal identities through SSO. The account lifecycle covers application, SSO‑based login via the internal IdP, and automated reclamation when employees leave: the platform disables console access immediately after a user departs and deletes the account later.
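The reclamation flow can be modeled as a small state machine with three states. The sketch below is one possible reading of it: the state names, the `reconcile` function and especially the 30-day retention period are assumptions, since the article only says the account is deleted "later".

```python
from dataclasses import dataclass
from datetime import date, timedelta
from enum import Enum
from typing import Optional


class AccountState(Enum):
    ACTIVE = "active"
    CONSOLE_DISABLED = "console_disabled"
    DELETED = "deleted"


# Hypothetical grace period between disabling console access and deletion.
RETENTION = timedelta(days=30)


@dataclass
class CloudAccount:
    sso_user: str
    state: AccountState = AccountState.ACTIVE
    left_on: Optional[date] = None  # day the departure was first observed


def reconcile(account: CloudAccount, departed: bool, today: date) -> CloudAccount:
    """Advance one account through the lifecycle based on employment status."""
    if account.state is AccountState.DELETED or not departed:
        return account
    if account.left_on is None:
        account.left_on = today  # first time the departure is seen
    if today - account.left_on >= RETENTION:
        account.state = AccountState.DELETED
    else:
        account.state = AccountState.CONSOLE_DISABLED
    return account
```

Running `reconcile` on a schedule against the internal directory keeps the function idempotent: repeated calls on the same day never change the outcome.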
3.5 Multi‑Dimensional Cost Management
Cost management spans the entire resource lifecycle: demand assessment, vendor selection, resource provisioning, runtime monitoring, and bill analysis. ARES provides:
Cost visualization by time, project and organization.
Cost composition breakdown.
Usage confirmation using standardized metrics (e.g., CPU cores for servers).
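The visualization and breakdown views listed above reduce to grouping bill lines by different keys. Here is a minimal sketch of that grouping; the bill-line fields (`month`, `project`, `product`, `amount`) and the function name are hypothetical, as the article does not describe the billing schema.

```python
from collections import defaultdict
from decimal import Decimal


def aggregate(bill_lines: list, *keys: str) -> dict:
    """Sum bill amounts grouped by the given keys.

    Each bill line is a dict with an "amount" string plus the grouping fields.
    Decimal is used so that currency sums do not pick up float rounding error.
    """
    totals = defaultdict(Decimal)  # Decimal() is zero
    for line in bill_lines:
        group = tuple(line[key] for key in keys)
        totals[group] += Decimal(line["amount"])
    return dict(totals)
```

Calling `aggregate(lines, "month", "project")` gives the per-project trend over time, while `aggregate(lines, "product")` gives the cost composition.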
Runtime cost optimization includes low‑utilization down‑scaling and idle‑resource reclamation (unused disks, floating IPs, idle load‑balancers, abnormal servers).
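These optimization rules can be expressed as a simple classifier over the asset inventory. The sketch below assumes two signals per resource, whether it is in use and its average CPU utilization; the field names, the `recommend` function and the 10% threshold are hypothetical, and detection of abnormal servers is left out because the article does not define it.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical threshold below which a server counts as low-utilization.
LOW_CPU_THRESHOLD = 10.0

# Resource kinds that can simply be reclaimed when nothing uses them.
RECLAIMABLE_KINDS = {"disk", "floating_ip", "load_balancer"}


@dataclass(frozen=True)
class Resource:
    id: str
    kind: str      # unified product name, e.g. "cloud_server" or "disk"
    in_use: bool   # attached, bound, or serving backends
    avg_cpu_percent: Optional[float] = None  # only meaningful for servers


def recommend(resource: Resource) -> str:
    """Return "reclaim", "downscale" or "keep" for one resource."""
    if resource.kind in RECLAIMABLE_KINDS and not resource.in_use:
        return "reclaim"
    if (
        resource.kind == "cloud_server"
        and resource.avg_cpu_percent is not None
        and resource.avg_cpu_percent < LOW_CPU_THRESHOLD
    ):
        return "downscale"
    return "keep"
```

A server with no utilization data is deliberately kept rather than flagged, so missing monitoring never triggers a downscale.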
4. Outlook
Future directions include deeper cost‑optimization features, container‑native and cloud‑native capabilities for automatic migration and scaling, and extending the platform to manage private clouds, forming a unified hybrid‑cloud management solution.
Bilibili Tech
Provides introductions and tutorials on Bilibili-related technologies.