How to Build and Operate a National-Scale Private Cloud: Lessons and Trends
This talk outlines why organizations pursue cloud adoption, defines cloud‑native goals, reviews emerging trends such as bare‑metal and hyper‑convergence, and shares practical private‑cloud operation experiences, including ITIL processes, project management, and tooling, offering a comprehensive view of national‑level private‑cloud practice.
Good afternoon! My topic is national‑level private cloud practice, which I will cover in three parts:
Why build a cloud and what are its goals?
Recent trends and changes in cloud technology.
My own private‑cloud operations practice.
1. Cloud Construction Goals
Why build a cloud? I propose a "rocket model" in which the goal is to achieve cloud‑native characteristics: elasticity, continuous delivery, multi‑tenancy, infrastructure independence, statelessness, redundancy, automation, modularity, and micro‑services. For applications to be cloud‑native, the underlying infrastructure must become cloud‑native as well, exposing APIs for VMs, bare metal, and containers.
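As a rough illustration of what "exposing APIs for VMs, bare metal, and containers" can mean in practice, the sketch below puts one provisioning entry point in front of three interchangeable backends. Every name here (ResourceKind, ResourceRequest, FakeProvisioner, InfrastructureAPI) is hypothetical and invented for this example; it is not the API of any real cloud platform, and the backends only return made‑up identifiers.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Protocol


class ResourceKind(Enum):
    VM = "vm"
    BARE_METAL = "bare_metal"
    CONTAINER = "container"


@dataclass(frozen=True)
class ResourceRequest:
    kind: ResourceKind
    name: str
    cpu_cores: int
    memory_gb: int


class Provisioner(Protocol):
    """Anything that can turn a request into a resource identifier."""

    def provision(self, request: ResourceRequest) -> str: ...


class FakeProvisioner:
    """Hypothetical stand-in backend: it allocates nothing, only returns an ID."""

    def __init__(self, prefix: str) -> None:
        self.prefix = prefix
        self._count = 0

    def provision(self, request: ResourceRequest) -> str:
        self._count += 1
        return f"{self.prefix}-{request.name}-{self._count}"


class InfrastructureAPI:
    """Single entry point that routes each request to the matching backend."""

    def __init__(self, backends: Dict[ResourceKind, Provisioner]) -> None:
        self._backends = backends

    def provision(self, request: ResourceRequest) -> str:
        backend = self._backends.get(request.kind)
        if backend is None:
            raise ValueError(f"no backend registered for {request.kind.value}")
        return backend.provision(request)


api = InfrastructureAPI({
    ResourceKind.VM: FakeProvisioner("vm"),
    ResourceKind.BARE_METAL: FakeProvisioner("bm"),
    ResourceKind.CONTAINER: FakeProvisioner("ctr"),
})
web_id = api.provision(ResourceRequest(ResourceKind.VM, "web", cpu_cores=2, memory_gb=4))
db_id = api.provision(ResourceRequest(ResourceKind.BARE_METAL, "db", cpu_cores=32, memory_gb=256))
```

The point of the design is that callers describe what they need and never address a specific hypervisor, server, or container runtime directly, which is one way to read "infrastructure independence".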
Implementation paths include:
Serverless computing on public clouds – low learning cost but risk of vendor lock‑in.
Public‑cloud container services – more mature, but they still incur cost and carry potential lock‑in.
Building a private cloud and then constructing a cloud‑native environment – long cycle and high requirements.
Existing workloads must be refactored to become cloud‑native; a "dual‑mode IT" approach can keep legacy systems running while new cloud‑native services are built alongside them.
Currently, true cloud‑native adoption is limited to leading enterprises (e.g., BAT, meaning Baidu, Alibaba, and Tencent, as well as Google and Facebook). Many organizations use only VMs or virtualization and lack full cloud‑native capabilities because of business, talent, and management challenges.
2. Recent Cloud Technology Trends
• Bare‑metal cloud: Physical servers offered as cloud services (e.g., China Mobile, Tencent "Black Stone", Oracle, IBM, Huawei, Alibaba, Microsoft). The market was projected to reach $9 billion by 2020. Benefits include strong compute power, physical isolation, fast delivery, and API‑driven provisioning, but challenges remain, such as remote desktop access.
Typical bare‑metal scenarios: high‑performance gaming, genome sequencing, core databases (SAP HANA), big‑data analytics, and private container clouds.
• Hyper‑convergence: Integrated compute, storage, and network solutions, in a market that is growing rapidly (roughly 20% per year in China). The advantage is turnkey deployment; the disadvantages are higher cost and limited suitability for large‑scale enterprises.
• SDN and intelligent networking: Separation of control and data planes, combined with AI (SDN+AI) for intent‑based networking, automated policy verification, execution, and monitoring.
• Multi‑cloud and hybrid cloud: Using multiple public clouds (Alibaba, Tencent, etc.) together with private clouds, often on heterogeneous hardware (x86 servers and minicomputers).
Private‑cloud platforms in China mainly use OpenStack and VMware; recent discussions highlight the “eight‑year itch” of OpenStack.
• Cloud‑network integration: Connecting private and public clouds to ensure network stability and data security.
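The "automated policy verification" step mentioned in the SDN bullet above can be pictured as comparing declared intent with what is actually deployed. The following is a minimal sketch under stated assumptions: the Flow type, the verify_intent function, and the example zone names and port are all hypothetical, and real intent‑based networking systems work on far richer models than a set of allowed and forbidden flows.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True, order=True)
class Flow:
    """A permitted path from one zone to another on a single port."""
    src: str
    dst: str
    port: int


def verify_intent(
    must_allow: Set[Flow],
    must_deny: Set[Flow],
    deployed: Iterable[Flow],
) -> List[str]:
    """Return human-readable findings; an empty list means intent is met."""
    deployed_set = set(deployed)
    findings = []
    # Intended connectivity that no deployed rule provides.
    for flow in sorted(must_allow - deployed_set):
        findings.append(f"missing: {flow.src} -> {flow.dst}:{flow.port}")
    # Deployed rules that the intent explicitly forbids.
    for flow in sorted(must_deny & deployed_set):
        findings.append(f"violation: {flow.src} -> {flow.dst}:{flow.port}")
    return findings


intent_allow = {Flow("web", "db", 3306)}
intent_deny = {Flow("internet", "db", 3306)}
deployed_rules = [Flow("internet", "db", 3306)]

findings = verify_intent(intent_allow, intent_deny, deployed_rules)
```

In this example the check reports one missing flow and one violation, which is the kind of result a controller would act on in the execution and monitoring steps.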
3. Private‑Cloud Operations Experience
Large‑scale private clouds still rely on traditional frameworks such as PDCA (Plan‑Do‑Check‑Act) and ITIL. ITIL processes are essential both for compliance (e.g., Level 3 of China's classified security protection scheme) and for internal communication.
Project implementation follows a structured lifecycle: compliance checks, ITIL‑based procedures, and resource planning.
Organizational evolution:
Initial structure: System deployment, cloud platform implementation, network implementation, support (24‑hour helpdesk, ticketing, task tracking), and racking of physical servers.
Later structure: Architecture team (design, solution validation, troubleshooting), Implementation team (cloud building, OS ops, network/security devices, distributed storage, cloud platform), and Support team (similar to before).
Each project forms a virtual team with a dedicated owner responsible from start to finish, handling resource requests, client communication, and issue resolution.
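Project planning uses a WBS (Work Breakdown Structure) to organize tasks. An analogy is making dumplings: the job breaks down into preparing the filling, preparing the dough, and so on, and each of those can be broken down further until every task is small enough to assign and track.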
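The dumpling analogy can be written down as a small tree, which is all a WBS really is. This is only an illustrative sketch: the WorkItem class is hypothetical, and the sub‑tasks and hour estimates below are invented for the example rather than taken from the talk.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class WorkItem:
    name: str
    effort_hours: float = 0.0  # only meaningful on leaf tasks
    children: List["WorkItem"] = field(default_factory=list)

    def total_effort(self) -> float:
        """Roll leaf estimates up to this node."""
        if not self.children:
            return self.effort_hours
        return sum(child.total_effort() for child in self.children)

    def outline(self, prefix: str = "1") -> List[str]:
        """Return the numbered WBS outline (1, 1.1, 1.1.1, ...)."""
        lines = [f"{prefix} {self.name}"]
        for index, child in enumerate(self.children, start=1):
            lines.extend(child.outline(f"{prefix}.{index}"))
        return lines


# Hypothetical breakdown; task names and hours are assumptions.
dumplings = WorkItem("Make dumplings", children=[
    WorkItem("Prepare filling", children=[
        WorkItem("Chop vegetables", 0.5),
        WorkItem("Mix and season", 0.25),
    ]),
    WorkItem("Prepare dough", children=[
        WorkItem("Knead dough", 0.5),
        WorkItem("Roll wrappers", 0.5),
    ]),
    WorkItem("Wrap and cook", 1.0),
])

for line in dumplings.outline():
    print(line)
print("Total hours:", dumplings.total_effort())
```

The same structure carries over to a cloud build: replace the root with the project and the leaves with tasks such as racking servers or deploying storage, and the rolled‑up totals give the project owner a first estimate.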
Tools: Open‑source ticketing system OTRS, custom dashboards for high‑visibility monitoring, and expert knowledge bases for complex issues.
In summary, private‑cloud operations demand strict adherence to processes, continuous optimization, and reliance on experience, platforms, checklists, and verification mechanisms.
Efficient Ops
This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.