Multi‑Cloud Management Platform ARES: Architecture, Features and Practices

ARES is Bilibili’s multi‑cloud management platform. It unifies resource provisioning, asset inventory, user access, and cost optimization across public clouds through a layered architecture, project‑centric tagging, Terraform‑based orchestration, and centralized security. It targets manual provisioning, fragmented permissions, and poor visibility into resources and costs, and the team plans to extend it toward hybrid‑cloud automation.

Bilibili Tech
This document introduces the ARES multi‑cloud management platform built by Bilibili’s System Department. It explains why multi‑cloud is needed, the problems multi‑cloud introduces, and how the platform’s overall design and capabilities address them.

Why use multi‑cloud?

Public clouds offer elasticity, pay‑as‑you‑go pricing and global coverage, which makes them attractive for fast‑growing enterprises.

Adopting multiple cloud providers improves stability, reduces vendor lock‑in and enhances bargaining power.

Different providers have varying regions and product capabilities; multi‑cloud lets a company leverage the best of each.

Problems caused by multi‑cloud

Resource provisioning is manual and slow, especially for large‑scale batch deployments.

Resources are scattered across many accounts and clouds, making inventory and ownership unclear.

Permission management is fragmented; users must log into each cloud console with separate credentials.

Network configuration is complex and requires deep expertise.

Cloud costs keep rising, but there is little visibility into spending and few tools for optimizing it.

2. Platform Overview

ARES addresses these issues by providing a unified platform for resource management, orchestration, user management, cost control and security. The platform has been iteratively developed since April 2022, with the first version released in July 2022.

2.1 Platform Architecture

The architecture follows a layered model:

Top layer – a unified front‑end portal that offers a single entry point for all resource lifecycle operations.

Business logic layer – handles project management, asset management, user management, resource orchestration and cost management. It also aggregates multi‑cloud billing and account information.

Engine layer – integrates with cloud providers via IaC (Terraform) and provider APIs to execute actions.
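
As a rough illustration of how the engine layer might choose between IaC and direct API calls, here is a minimal Python sketch. The `Action` shape, the operation names, and the routing set are all hypothetical; the article does not describe how ARES actually routes requests.

```python
from dataclasses import dataclass, field


@dataclass
class Action:
    """A request handed down from the business logic layer (hypothetical shape)."""
    provider: str                      # e.g. "cloud_a"
    operation: str                     # e.g. "create_server"
    params: dict = field(default_factory=dict)


def run_with_terraform(action: Action) -> str:
    # Placeholder: a real engine would render templates and invoke Terraform here.
    return f"terraform:{action.provider}:{action.operation}"


def run_with_api(action: Action) -> str:
    # Placeholder: a real engine would call the provider's API or SDK here.
    return f"api:{action.provider}:{action.operation}"


# Assumption: provisioning-style operations go through IaC,
# everything else falls back to direct API calls.
TERRAFORM_OPERATIONS = {"create_server", "create_load_balancer"}


def execute(action: Action) -> str:
    """Engine-layer entry point: pick an executor for the action."""
    if action.operation in TERRAFORM_OPERATIONS:
        return run_with_terraform(action)
    return run_with_api(action)
```

Keeping the routing decision in one place means the layers above only describe what should happen, not which mechanism carries it out.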

3. Platform Functions

3.1 Project‑Centric Global Management

Projects are the core entity. Each project is linked to a single organization unit, can own multiple cloud accounts, and maps to a cloud‑specific project (e.g., Alibaba Cloud Resource Group, AWS Project). Resources are tagged with bili_project to bridge gaps where cloud projects cannot cover all assets.

{
  "Version": "1",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "*:Describe*",
        "*:List*",
        "*:Get*",
        "*:BatchGet*",
        "*:Query*",
        "*:BatchQuery*",
        "actiontrail:LookupEvents",
        "actiontrail:Check*",
        "dm:Desc*",
        "dm:SenderStatistics*",
        "ram:GenerateCredentialReport",
        "cloudsso:Check*",
        "notifications:Read*"
      ],
      "Resource": "*",
      "Condition": {
        "StringEqualsIgnoreCase": {
          "*:tag/bili_project": ["Project A"]
        }
      }
    }
  ]
}

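Each project defines four roles: R&D leader, R&D member, Ops leader, and Ops member. Permissions are enforced through custom cloud policies that reference the bili_project tag, as in the policy above, which allows mostly read‑type actions and only on resources tagged with the given project.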
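
A tag‑scoped policy like the one in this section could be generated per project rather than written by hand. The sketch below is only an assumption about how that might look: the function name and the shortened action list are hypothetical, while the policy structure and the condition key are copied from the sample policy.

```python
import json
from typing import List

# Shortened action list for illustration; the sample policy contains more entries.
READ_ACTIONS = ["*:Describe*", "*:List*", "*:Get*"]


def build_project_policy(project: str, actions: List[str]) -> dict:
    """Return a policy document limited to resources tagged with `project`."""
    if not project:
        raise ValueError("project must be a non-empty string")
    return {
        "Version": "1",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": list(actions),
                "Resource": "*",
                "Condition": {
                    "StringEqualsIgnoreCase": {
                        # Condition key taken verbatim from the sample policy.
                        "*:tag/bili_project": [project],
                    }
                },
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_project_policy("project-a", READ_ACTIONS), indent=2))
```

Generating the document from a single function keeps the tag condition identical across roles, so only the action list needs to vary.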

3.2 Unified Asset Management

ARES standardizes product names across clouds (e.g., “cloud server” maps to ECS, CVM, EC2, etc.) and aligns attributes such as ID, name, region, zone, image, and specification. Tables illustrate the mapping for cloud servers and attribute standardization.
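
The standardization described above can be sketched as a lookup from unified names to provider names plus a per‑provider field map. In the Python sketch below, the product mapping comes from the text, but the field names and the flat record shape are illustrative assumptions; real provider API responses are shaped and nested differently.

```python
from dataclasses import dataclass
from typing import Dict

# Unified product name -> provider product name (from the text).
PRODUCT_NAMES = {
    "cloud_server": {"alibaba": "ECS", "tencent": "CVM", "aws": "EC2"},
}

# Unified attribute -> field name in a flattened provider record.
# These field names are illustrative, not actual API response keys.
FIELD_MAPS = {
    "alibaba": {"id": "InstanceId", "name": "InstanceName", "region": "RegionId",
                "zone": "ZoneId", "image": "ImageId", "spec": "InstanceType"},
    "aws": {"id": "InstanceId", "name": "NameTag", "region": "Region",
            "zone": "AvailabilityZone", "image": "ImageId", "spec": "InstanceType"},
}


@dataclass(frozen=True)
class CloudServer:
    """A cloud server described with the unified attribute set."""
    provider: str
    product: str
    id: str
    name: str
    region: str
    zone: str
    image: str
    spec: str


def normalize_server(provider: str, raw: Dict[str, str]) -> CloudServer:
    """Convert a flattened provider record into the unified representation."""
    fields = FIELD_MAPS[provider]
    unified = {attr: raw[source] for attr, source in fields.items()}
    product = PRODUCT_NAMES["cloud_server"][provider]
    return CloudServer(provider=provider, product=product, **unified)
```

With records in one shape, inventory queries such as "all servers in a region" no longer need per‑provider branches.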

3.3 IaC‑Based Resource Orchestration

Terraform is the primary IaC tool, supplemented by direct API calls. ARES abstracts provider‑specific fields into unified variables, allowing users to author a single set of inputs regardless of the underlying cloud.

# Create a backend server (IDs below are sample values)
resource "Acloud_instance" "server_attachment" {
  count                      = 1
  image_id                   = "ubuntu_18_04_64_20G_alibase_20190624.vhd"
  instance_type              = "ecs.n4.large"
  instance_name              = "test"
  security_groups            = ["sg-mj7itgyeohjw2ebvyrl"] # list of security group IDs
  internet_charge_type       = "PayByTraffic"
  internet_max_bandwidth_out = 10
  availability_zone          = "cn-hangzhou-A"
  instance_charge_type       = "PostPaid"
  system_disk_category       = "cloud_efficiency"
  vswitch_id                 = "vsw-uf66iu2bxce23ityocdx"
}
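
To make the idea of unified variables more concrete, the following Python sketch translates provider‑neutral inputs into provider‑specific Terraform arguments and renders a resource block. The argument names for `Acloud_instance` are taken from the block above; the unified variable names, the `Bcloud_instance` entry, and both helper functions are hypothetical.

```python
import json
from typing import Any, Dict

# Unified variable -> provider-specific argument name.
ARGUMENT_MAPS = {
    "Acloud_instance": {"name": "instance_name", "image": "image_id",
                        "spec": "instance_type", "zone": "availability_zone"},
    # Hypothetical second provider, for illustration only.
    "Bcloud_instance": {"name": "display_name", "image": "image",
                        "spec": "flavor", "zone": "zone"},
}


def to_provider_arguments(resource_type: str, unified: Dict[str, Any]) -> Dict[str, Any]:
    """Rename unified variables to the arguments a given resource type expects."""
    mapping = ARGUMENT_MAPS[resource_type]
    unknown = set(unified) - set(mapping)
    if unknown:
        raise KeyError(f"unsupported unified variables: {sorted(unknown)}")
    return {mapping[key]: value for key, value in unified.items()}


def render_resource(resource_type: str, name: str, unified: Dict[str, Any]) -> str:
    """Render a Terraform resource block with aligned arguments."""
    args = to_provider_arguments(resource_type, unified)
    width = max(len(key) for key in args)
    lines = [f'resource "{resource_type}" "{name}" {{']
    for key, value in args.items():
        # json.dumps yields quoted strings, bare numbers and [..] lists,
        # which are also valid HCL literals for these simple cases.
        lines.append(f"  {key.ljust(width)} = {json.dumps(value)}")
    lines.append("}")
    return "\n".join(lines)
```

The same unified input, for example `{"spec": "ecs.n4.large", "zone": "cn-hangzhou-A"}`, can then be rendered for either resource type by changing only the first argument.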

Resource templates are generated automatically: a variable.tf file declares all unified variables, and a main.tf file instantiates the provider‑specific resources. Multi‑resource scenarios (e.g., attaching a load‑balancer to backend servers) are expressed by referencing IDs across resources.

resource "Acloud_slb_server_group_server_attachment" "server_attachment" {
  ...
  server_group_id = Acloud_slb_server_group.server_attachment.id
  server_id       = Acloud_instance.server_attachment[count.index].id
  ...
}
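
The automatic template generation mentioned earlier in this section might look roughly like the sketch below, which renders the contents of a variable.tf file from a schema of unified variables. The schema, its variable names, and the `render_variables` helper are assumptions made for illustration; only the Terraform `variable` block syntax is standard.

```python
from typing import Dict, Tuple

# Unified variable name -> (Terraform type, description). Hypothetical schema.
# "instance_count" is used because "count" is a reserved variable name in Terraform.
UNIFIED_VARIABLES: Dict[str, Tuple[str, str]] = {
    "name": ("string", "Instance name"),
    "image": ("string", "Image identifier"),
    "spec": ("string", "Instance specification"),
    "zone": ("string", "Availability zone"),
    "instance_count": ("number", "Number of servers to create"),
}


def render_variables(schema: Dict[str, Tuple[str, str]]) -> str:
    """Render one Terraform variable block per unified variable."""
    blocks = []
    for name, (type_name, description) in schema.items():
        blocks.append(
            f'variable "{name}" {{\n'
            f"  type        = {type_name}\n"
            f'  description = "{description}"\n'
            "}"
        )
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    print(render_variables(UNIFIED_VARIABLES))
```

A generated main.tf would then refer to these declarations as `var.spec`, `var.zone`, and so on, so the user‑facing inputs stay the same when the provider‑specific resource changes.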

3.4 Secure User Management

Cloud accounts are bound to internal identities through SSO. The account lifecycle covers application, SSO‑based login via the internal IdP, and automated reclamation when employees leave: the platform disables console access as soon as a user departs and deletes the account later.
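
The reclamation flow described above can be read as a small state machine. The Python sketch below is one possible shape for it; the state names, the `reconcile` function, and the 30‑day grace period are assumptions, since the article only says that deletion happens later.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from enum import Enum
from typing import Optional


class AccountState(Enum):
    ACTIVE = "active"
    CONSOLE_DISABLED = "console_disabled"
    DELETED = "deleted"


# Assumption: how long a disabled account is kept before deletion.
DELETION_GRACE = timedelta(days=30)


@dataclass
class CloudAccount:
    employee_id: str
    state: AccountState = AccountState.ACTIVE
    disabled_on: Optional[date] = None


def reconcile(account: CloudAccount, employed: bool, today: date) -> CloudAccount:
    """Advance the account one step based on the owner's employment status."""
    if employed or account.state is AccountState.DELETED:
        return account  # nothing to reclaim
    if account.state is AccountState.ACTIVE:
        # First run after departure: cut console access immediately.
        account.state = AccountState.CONSOLE_DISABLED
        account.disabled_on = today
    elif today - account.disabled_on >= DELETION_GRACE:
        # Grace period is over: remove the account.
        account.state = AccountState.DELETED
    return account
```

Running `reconcile` on a schedule against the internal employee directory would keep cloud accounts in step with HR status without manual cleanup.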

3.5 Multi‑Dimensional Cost Management

Cost management spans the entire resource lifecycle: demand assessment, vendor selection, resource provisioning, runtime monitoring, and bill analysis. ARES provides:

Cost visualization by time, project and organization.

Cost composition breakdown.

Usage confirmation using standardized metrics (e.g., CPU cores for servers).
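
The three capabilities above amount to grouping bill items along different dimensions and dividing by a standardized usage metric. A minimal Python sketch follows, assuming a flattened bill item with hypothetical field names; it is not the ARES data model.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class BillItem:
    """One flattened line of a multi-cloud bill (hypothetical shape)."""
    month: str
    project: str
    organization: str
    product: str
    amount: float
    cpu_cores: int = 0   # standardized usage metric for servers


def cost_by(items: Iterable[BillItem], *dimensions: str) -> Dict[Tuple[str, ...], float]:
    """Sum amounts grouped by any combination of BillItem fields."""
    totals: Dict[Tuple[str, ...], float] = defaultdict(float)
    for item in items:
        key = tuple(getattr(item, dim) for dim in dimensions)
        totals[key] += item.amount
    return dict(totals)


def cost_per_core(items: Iterable[BillItem]) -> float:
    """Unit cost of compute: total amount divided by total CPU cores."""
    items = list(items)
    cores = sum(item.cpu_cores for item in items)
    if cores == 0:
        raise ValueError("no CPU cores recorded for these items")
    return sum(item.amount for item in items) / cores
```

Calling `cost_by(items, "month", "project")` gives a time and project view, while `cost_by(items, "product")` gives the cost composition; a per‑core unit cost makes servers from different providers comparable.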

Runtime cost optimization includes low‑utilization down‑scaling and idle‑resource reclamation (unused disks, floating IPs, idle load‑balancers, abnormal servers).
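
As a sketch of how such runtime checks might be expressed, the Python below flags low‑utilization servers and unattached disks. The record shapes, the function names, and the 10% CPU threshold are assumptions chosen for illustration; the article does not state the actual criteria.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class ServerUsage:
    server_id: str
    avg_cpu_percent: float       # average CPU utilization over the observation window


@dataclass(frozen=True)
class Disk:
    disk_id: str
    attached_to: Optional[str]   # server ID, or None when the disk is unattached


# Assumption: servers averaging below this CPU utilization are down-scaling candidates.
LOW_CPU_THRESHOLD = 10.0


def downscale_candidates(servers: Iterable[ServerUsage],
                         threshold: float = LOW_CPU_THRESHOLD) -> List[str]:
    """Return IDs of servers whose average CPU utilization is below the threshold."""
    return [s.server_id for s in servers if s.avg_cpu_percent < threshold]


def unattached_disks(disks: Iterable[Disk]) -> List[str]:
    """Return IDs of disks that are not attached to any server."""
    return [d.disk_id for d in disks if d.attached_to is None]
```

The same pattern extends to floating IPs and load balancers: each check is a filter over normalized inventory records, and its output feeds a review or reclamation workflow.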

4. Outlook

Future directions include deeper cost‑optimization features, container‑native and cloud‑native capabilities for automatic migration and scaling, and extending the platform to manage private clouds, forming a unified hybrid‑cloud management solution.
