
How LeTV E‑Commerce Cloud Scales High‑Traffic Shopping with Microservices

This article, based on the Efficient Operations Community Talk, outlines the evolution of e‑commerce systems, the challenges faced during rapid growth, and how LeTV’s e‑commerce cloud leverages micro‑service architecture, container technology, and hybrid cloud solutions to address scalability, security, and operational efficiency.


1. E‑commerce System Development Process

E‑commerce websites have varying architectural complexity across different stages:

Startup Phase: Few product types, low business complexity, and a simple system architecture. Basic components such as a high‑availability database, distributed cache, and file storage suffice.

Growth Phase: Data volume, business complexity, system complexity, and compute‑resource demand all surge. The business must be split into independently deployed services, supported by a CDN, high‑availability databases, distributed caches, message queues, and distributed file storage.

The basic e‑commerce technical architecture diagram is shown below:

2. Problems in Rapid Growth

2.1 Fast Business Expansion and Low Resource Utilization

Enterprises aim to seize market opportunities and reduce operating costs. Traditional server or data‑center rentals incur high hardware and personnel expenses, and procurement‑to‑deployment cycles are long and inefficient.

2.2 Escalating System Complexity and Tight Coupling

Rapid business scaling often leaves little time for refactoring, leading to bloated, tightly coupled systems where changes in one component affect many others. Duplicate code across projects further reduces reuse and complicates debugging.

2.3 Growing Security Risks

Large‑scale e‑commerce faces DDoS attacks, malicious orders, penetration attempts, and privacy breaches. New cybersecurity regulations impose strict standards, yet many small or traditional e‑commerce firms lack the technical capacity for robust protection.

These issues can be mitigated by adopting e‑commerce cloud solutions: micro‑service architecture reduces system complexity, while cloud services address resource demand and security.

3. Evolution Toward Micro‑services

As site scale, functionality, and data processing types increase, micro‑service design has become an industry consensus.

Adopt a gradual, incremental approach to service splitting rather than wholesale rewrites.

Splitting should be measured; focus on business domains, extract common components, and apply layered abstraction.

3.1 Advantages of Micro‑services

Services become high‑cohesion, low‑coupling modules, enabling independent development and maintenance by dedicated teams.

Distributed deployment allows elastic scaling of bottleneck services, improving concurrency and control granularity.

3.2 Disadvantages of Micro‑services

Increased number of services raises deployment, management, and monitoring workload, demanding higher operational expertise.

More inter‑service communication adds complexity to dependency management.

Fault diagnosis becomes harder as requests traverse more processing nodes.

3.3 E‑commerce Micro‑service Architecture

The e‑commerce micro‑service architecture is illustrated below:

The system can be grouped into three layers:

Business Layer: Product info, cart, order, payment, flash‑sale, inventory, logistics, reviews, customer service, recommendation, etc.

Common Service Layer: Provides shared services for the business layer, encapsulating common logic.

Infrastructure Layer: High‑availability DB, distributed cache, message queues, NoSQL, etc., often offered as PaaS.

Micro‑service benefits and drawbacks can be addressed through e‑commerce cloud and container technologies.

4. LeTV E‑commerce Cloud

Micro‑service concepts predate mature cloud computing, but today's mature cloud platforms give them a solid carrier. Vertical‑industry clouds have emerged, offering domain‑specific services with lower entry barriers. LeTV e‑commerce cloud delivers a complete solution for applying cloud advantages to the e‑commerce sector.

4.1 Vertical Cloud Requirements for E‑commerce

High performance

High availability

Good scalability

Easy extensibility

Security guarantees

4.2 Product Forms

4.2.1 Public Cloud

Ideal for early‑stage customers needing rapid system construction. A single account provides a full e‑commerce service suite, abstracting complex technical and hardware integration, with optional managed operations.

DNS resolves domain names and provides first‑level load balancing; static resources are served by nearby CDN nodes, and a cloud firewall filters malicious traffic.

Micro‑service design is recommended; PaaS components such as RDS, message queues, OSS, and distributed cache support the third (infrastructure) layer.

Containers host micro‑services on IaaS; a Service Router or API Gateway directs requests to the appropriate service clusters.
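To make the gateway step concrete, here is a minimal sketch of how an API gateway might map request paths to service clusters with simple round‑robin spreading. The route table, service names, and ports are illustrative, not LeTV's actual configuration.

```python
import itertools

# Illustrative route table: path prefix -> backend cluster instances.
ROUTES = {
    "/cart":    ["cart-1:8080", "cart-2:8080"],
    "/order":   ["order-1:8080"],
    "/product": ["product-1:8080", "product-2:8080", "product-3:8080"],
}

# One round-robin cycle per cluster for simple load spreading.
_cycles = {prefix: itertools.cycle(hosts) for prefix, hosts in ROUTES.items()}

def route(path: str) -> str:
    """Return a backend host for a request path; longest prefix wins."""
    for prefix in sorted(ROUTES, key=len, reverse=True):
        if path.startswith(prefix):
            return next(_cycles[prefix])
    raise LookupError(f"no service cluster for {path}")
```

Successive requests to the same prefix rotate through the cluster's instances, which is the smallest useful version of the "direct requests to appropriate service clusters" behavior described above.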

4.2.2 Private Cloud

For growing enterprises with specialized needs, private cloud offers customized solutions, including data‑center construction, service deployment, and full‑stack packaging, keeping data on‑premises for security.

4.2.3 Hybrid Cloud

Combines public and private cloud strengths: public cloud handles user‑facing traffic and high‑concurrency scenarios, while private cloud retains core data (finance, user info) for security and compliance.

Hybrid deployment ensures resource elasticity, pay‑as‑you‑go cost efficiency, and seamless handling of flash‑sale spikes.

Global load balancing directs traffic to public or private clouds based on service type.

4.3 Platform Architecture

Architecture diagram:

4.3.1 Log Collection

Self‑service log services provide APIs for data retrieval or visualization, enabling data‑driven business improvements.

4.3.2 Monitoring

Metrics from logs, interfaces, and services trigger alerts via phone, SMS, or WeChat, with flexible rule configuration.
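A flexible rule configuration of the kind described can be as simple as threshold rules mapped to alert channels. The rule format, metric names, and thresholds below are assumptions for illustration; the channels mirror the ones named in the text.

```python
# Hypothetical threshold-based alert rules (format and values are made up).
RULES = [
    {"metric": "error_rate", "op": ">", "threshold": 0.05, "channels": ["phone", "sms"]},
    {"metric": "latency_p99_ms", "op": ">", "threshold": 800, "channels": ["wechat"]},
]

OPS = {">": lambda a, b: a > b, "<": lambda a, b: a < b}

def evaluate(metrics: dict) -> list:
    """Return (metric, channel) pairs for every rule the metrics violate."""
    alerts = []
    for rule in RULES:
        value = metrics.get(rule["metric"])
        if value is not None and OPS[rule["op"]](value, rule["threshold"]):
            alerts.extend((rule["metric"], ch) for ch in rule["channels"])
    return alerts
```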

4.3.3 Scaling

Examples for shopping festivals: pre‑scale resources, conduct stress tests, ensure redundancy for critical data, and de‑provision after the event to save costs.

4.3.4 Traffic Scheduling

Custom traffic scheduling across DNS, load balancer, and Service Router layers, using algorithms and resource pools to handle massive concurrent requests during flash sales.
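One common algorithm for spreading requests across resource pools of unequal capacity is smooth weighted round‑robin (as used in nginx upstreams). The sketch below assumes two pools with made‑up weights; the talk does not specify which algorithms LeTV uses.

```python
# Smooth weighted round-robin: pools receive traffic proportional to weight,
# interleaved smoothly rather than in bursts. Pool names/weights are invented.
class SmoothWRR:
    def __init__(self, weights):            # {"pool name": weight}
        self.weights = dict(weights)
        self.current = {name: 0 for name in weights}

    def pick(self) -> str:
        # Every pool gains its weight; the leader is chosen and pays back
        # the total, so heavier pools win proportionally more often.
        total = sum(self.weights.values())
        for name, w in self.weights.items():
            self.current[name] += w
        best = max(self.current, key=self.current.get)
        self.current[best] -= total
        return best

wrr = SmoothWRR({"public-pool": 3, "private-pool": 1})
```

Over any four consecutive picks, `public-pool` is chosen three times and `private-pool` once, matching the 3:1 weights.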

4.3.5 Billing

Pay‑as‑you‑go for PaaS components, tiered pricing, and flexible service packages.

4.4 Hybrid Cloud and Data Flow Diagram

4.5 Flash‑sale Case Study

Flash sales demand instantaneous high concurrency.

4.5.1 Guiding Principles

Reduce traffic reaching the backend.

Minimize direct database operations.

Ensure resource redundancy.

Apply least‑privilege principle across all layers.

4.5.2 Scenarios and Solutions

Scenario 1: Frequent page refreshes cause bandwidth overload.

Solution: Static page generation, compression, HTTP header optimization, CDN acceleration, and caching of non‑critical metadata.
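The HTTP header optimization piece can be sketched as choosing cache headers that let CDN nodes and browsers hold a pre‑rendered page. The `max-age` value and use of MD5 for the ETag are assumptions for illustration only.

```python
import hashlib

def static_headers(body: bytes, max_age: int = 300) -> dict:
    """Headers that let CDN nodes and browsers cache a pre-rendered page."""
    return {
        # Cacheable by shared caches (CDN) for max_age seconds.
        "Cache-Control": f"public, max-age={max_age}",
        # ETag lets refreshes revalidate cheaply (304) instead of re-downloading.
        "ETag": '"' + hashlib.md5(body).hexdigest() + '"',
        # Assumes a pre-compressed copy of the page is served.
        "Content-Encoding": "gzip",
    }
```

With headers like these, the frequent refreshes in Scenario 1 mostly terminate at the CDN or turn into cheap 304 revalidations instead of full page downloads.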

Scenario 2: CDN offloads static assets but still forwards many valid or malicious requests to the backend.

Solution: LeTV Cloud Shield filters malicious traffic, WAF blocks SQL injection, XSS, and other attacks; request pre‑processing rejects unnecessary requests early.

Scenario 3: Database overload leads to performance degradation or crashes.

Solution: Efficient caching strategy; hot data cached in OCS, globally consistent data stored in LeTV RDS.
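The pattern described is cache‑aside: read the cache first, fall back to the relational store on a miss, and populate the cache for later readers. In this sketch plain dicts stand in for OCS and LeTV RDS; the SKU and fields are invented.

```python
cache = {}                                            # OCS stand-in
database = {"sku-1": {"stock": 100, "price": 4999}}   # RDS stand-in

def get_product(sku: str) -> dict:
    if sku in cache:            # cache hit: no database round trip
        return cache[sku]
    row = database[sku]         # cache miss: read from the relational store
    cache[sku] = row            # populate cache for subsequent readers
    return row
```

After the first read of a hot SKU, every subsequent read is served from the cache, which is what keeps the database alive under flash‑sale load.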

Scenario 4: Data inconsistency causes overselling.

Solution: Use message queues to buffer requests, set thresholds, employ distributed queues for inventory synchronization, and transactional locking for stock decrement and order updates.
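The oversell guard above can be sketched in miniature: order requests are buffered in a queue and a single worker decrements stock under a lock, rejecting orders once inventory hits zero. Here `queue.Queue` and `threading` stand in for a distributed message queue and transactional locking.

```python
import queue
import threading

stock = {"sku-1": 3}           # 3 units available
lock = threading.Lock()
orders = queue.Queue()         # stand-in for the distributed message queue
accepted, rejected = [], []

def worker():
    while True:
        user = orders.get()
        if user is None:       # sentinel: stop the worker
            break
        with lock:             # stand-in for transactional stock decrement
            if stock["sku-1"] > 0:
                stock["sku-1"] -= 1
                accepted.append(user)
            else:
                rejected.append(user)

for u in range(5):             # 5 buyers race for 3 units
    orders.put(u)
orders.put(None)

t = threading.Thread(target=worker)
t.start()
t.join()
```

Because every decrement happens inside the critical section, exactly three orders succeed and stock never goes negative, regardless of how many requests were queued.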

5. Container Technology

Containers accelerate micro‑service adoption and cloud evolution.

5.1 Advantages

Higher resource utilization.

Fast start‑stop times.

Facilitates horizontal scaling.

5.2 Disadvantages

Isolation challenges.

Network support limitations.

LeTV Cloud’s container‑based support includes Matrix and BeeHive systems.

6. BeeHive & Matrix

Matrix manages user‑resource relationships, global scheduling, self‑service component provisioning, monitoring, billing, and orchestration.

BeeHive schedules container services on top of IaaS, handling container creation, cluster management, and resource allocation.

6.1 Deployment Diagram

6.2 BeeHive Scheduling System

6.2.1 Core Design Principles

Openness

Abstraction

Inclusiveness

6.2.2 Internal Structure

BeeHive comprises scheduling, compute, network, and storage abstraction layers, providing unified interfaces for resource provisioning and supporting plug‑in architectures.
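One way such a plug‑in architecture is commonly built is a registry that maps each abstraction layer to named driver implementations. The layer names follow the text; the decorator API and the vRouter driver class are invented for illustration.

```python
# Driver registry: each abstraction layer maps driver names to classes.
DRIVERS = {"compute": {}, "network": {}, "storage": {}}

def driver(layer: str, name: str):
    """Decorator registering a driver class under an abstraction layer."""
    def wrap(cls):
        DRIVERS[layer][name] = cls
        return cls
    return wrap

@driver("network", "vrouter")
class VRouterDriver:
    def attach(self, container_id: str) -> str:
        # A real driver would program routes; this just reports the action.
        return f"vrouter attached to {container_id}"
```

The scheduler can then look up `DRIVERS["network"]["vrouter"]` at runtime, so new back ends plug in without touching the unified interface.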

6.2.3 Network Architecture

vRouter implements SDN‑based routing; the control layer is decentralized and highly available, while the forwarding layer uses kernel routing without NAT overhead. MacVLAN and NAT modes are also supported.

7. Globalization

2016 marked LeTV Group’s global expansion, and e‑commerce cloud inherits this vision. Cross‑border e‑commerce requires proximity to users for better experience.

7.1 Use Cases

Customers need to reach Southeast Asia, North America, India, etc., deploying services close to end‑users.

7.2 Measures

LeTV infrastructure spans dozens of countries, providing resource pools for cross‑border deployment.

Global message queues synchronize data across regions; BeeHive schedules without a central node.

Multiple independent e‑commerce cloud deployments per continent ensure stability and performance.

Regional and global image repositories support worldwide distribution.

8. Outlook

8.1 DCOS (Data Center Operating System)

Core data‑center applications include parallel computing and micro‑services. Efficient resource management and reuse are critical. DCOS abstracts heterogeneous resources as a single compute pool, enabling higher utilization.

LeTV Cloud explores DCOS to improve resource efficiency, reduce costs, and promote environmentally friendly operations.

BeeHive evolves into Le Distribution Kernel (LDK) atop IaaS, providing a pluggable service framework with APIs for compute, storage, and network resources.

8.2 Service Governance Framework

The built‑in framework offers language‑agnostic service registration, discovery, routing, authorization, monitoring, SLA enforcement, and elastic scaling.
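The register/discover flow at the heart of such a framework can be shown with a minimal in‑memory registry: instances register with a heartbeat timestamp and discovery returns only the ones still fresh. The function names and TTL are invented; a production registry would be a distributed store.

```python
import time

_registry = {}   # service name -> {instance address: last heartbeat time}

def register(service: str, instance: str) -> None:
    """Record (or refresh) an instance's heartbeat for a service."""
    _registry.setdefault(service, {})[instance] = time.time()

def discover(service: str, ttl: float = 30.0) -> list:
    """Return instances whose heartbeat is newer than ttl seconds."""
    now = time.time()
    live = {i: t for i, t in _registry.get(service, {}).items() if now - t < ttl}
    _registry[service] = live          # drop expired instances
    return sorted(live)
```

Routing, authorization, and SLA checks then operate on the list `discover` returns, which is what makes the framework language‑agnostic: any service that can call register/discover participates.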

9. Q&A

Q1: How to ensure data consistency across multiple data centers during high‑concurrency purchases?

A: Data is partitioned and stored in region‑specific relational stores (global RDS). Global uniqueness with concurrent access is not fully supported yet; synchronization occurs via a central node.

Q2: Does BeeHive optimize communication between Docker instances on the same host?

A: High‑availability deployments avoid placing the same service on a single host. Inter‑service traffic uses vRouter with kernel routing, bypassing switches.

Q3: Why build BeeHive DCOS instead of using Mesosphere’s DCOS?

A: BeeHive is built on LeTV’s own stack, deeply integrated with LeTV Cloud, offers abstract proxy layers, and avoids reliance on a commercial, non‑open‑source solution.

Q4: Why did micro‑services gain traction only after cloud maturity?

A: Early micro‑service concepts suffered from deployment and operational complexity; container technology later resolved these pain points, enabling practical adoption.

Q5: How does big‑data engine deployment relate to micro‑service operations?

A: Idle micro‑service resources are repurposed for batch jobs (MR, RDD). Conversely, during high‑traffic periods, resources are reserved for micro‑services.

Q6: How are high‑IOPS, CPU, and PPS workloads handled in the cloud?

A: Resource pools allocate PCIe SSDs for IOPS, multi‑core machines for CPU‑intensive tasks, and custom attributes guide intelligent scheduling, with future machine‑learning‑driven decisions.

Q7: If a user first accesses from Los Angeles and later from China, will their data remain in LA?

A: Currently, data stays in the original region, causing latency for distant access. Future plans include global asynchronous synchronization, though true global concurrency remains a challenge.

Q8: What issues arise from keeping user data in the original region?

A: Increased latency and slower data loading; global sync would mitigate but introduces concurrency complexities.

Q9: Does BeeHive support per‑Docker bandwidth limits?

A: Not yet; plans involve using traffic control (TC) similar to OpenStack, with Service Router handling higher‑level traffic distribution.

Q10: How is hot data identified and stored in OCS?

A: Currently driven by business logic; automatic hot‑data migration from RDS to OCS is not yet available.

Q11: What solutions handle large and small file storage?

A: Large files use Ceph; small files build on Ceph with added caching mechanisms.

Tags: e-commerce, cloud computing, microservices, operations, container
Written by Efficient Ops

This public account is maintained by Xiaotianguo and friends, regularly publishing widely read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.