
Building a Private Cloud Elasticsearch Platform with Mesos and Docker

This article describes how the OPS team designed and implemented a private‑cloud Elasticsearch service using Mesos for resource management, Docker containers orchestrated by Marathon, and a suite of monitoring, self‑service configuration, and continuous deployment tools to improve resource utilization and operational efficiency.

Qunar Tech Salon

The presentation introduces the OPS team's private‑cloud Elasticsearch solution built on the Mesos resource management platform and Docker container technology.

It is organized into four parts: background and current status, technical implementation, configuration and deployment, and monitoring & alerting.

1 Background and Status

In late 2015 and early 2016, the company's demand for Elasticsearch surged, exposing several drawbacks of the traditional usage model. The team defined design goals to address these drawbacks and built the platform accordingly.

Since its launch in March‑April 2016, the platform has significantly improved work efficiency in three areas, as shown by resource‑utilization statistics and the current scale of the platform.

2 Technical Implementation

The team investigated three reference systems: Elastic Cloud (Elastic's official public‑cloud service), Amazon Elasticsearch Service, and an open‑source Mesos‑based scheduling framework. Because each had limitations for the team's needs, a custom solution was designed.

The platform runs on Mesos, with all components packaged as Docker containers and scheduled by Marathon. The architecture includes a Root Marathon that schedules Sub Marathons, each Sub Marathon representing a business line and hosting multiple Elasticsearch SaaS services.

Resource allocation follows a hierarchical Marathon model: the Root Marathon owns all resources, while each Sub Marathon receives a fixed quota and maps one‑to‑one to a business line.
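The quota model can be pictured with a small Python sketch. It is only an illustration of the bookkeeping, not the platform's code: the article does not say how quotas are enforced, and the class names, the business lines ("flight", "hotel"), and the resource figures are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Quota:
    cpus: float
    mem_mb: int


@dataclass
class RootMarathon:
    """Owns the whole resource pool and hands out fixed quotas."""
    total: Quota
    subs: dict = field(default_factory=dict)  # business line -> Quota

    def remaining(self) -> Quota:
        used_cpus = sum(q.cpus for q in self.subs.values())
        used_mem = sum(q.mem_mb for q in self.subs.values())
        return Quota(self.total.cpus - used_cpus, self.total.mem_mb - used_mem)

    def add_sub_marathon(self, business_line: str, quota: Quota) -> None:
        # One Sub Marathon per business line (one-to-one mapping).
        if business_line in self.subs:
            raise ValueError(f"{business_line} already has a Sub Marathon")
        free = self.remaining()
        if quota.cpus > free.cpus or quota.mem_mb > free.mem_mb:
            raise ValueError(f"quota for {business_line} exceeds free resources")
        self.subs[business_line] = quota


# Hypothetical pool and business lines, for illustration only.
root = RootMarathon(total=Quota(cpus=100.0, mem_mb=256_000))
root.add_sub_marathon("flight", Quota(cpus=40.0, mem_mb=100_000))
root.add_sub_marathon("hotel", Quota(cpus=30.0, mem_mb=80_000))
```

Keeping the quota fixed per business line means one team's Elasticsearch growth cannot silently consume resources set aside for another.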

Each Sub Marathon can host multiple Elasticsearch clusters, each consisting of four core components (bamboo, es‑master, es‑datanode, es2graphite) deployed as Marathon apps. Service discovery is handled by bamboo + HAProxy, and metrics are collected by pyadvisor and sent to Graphite.
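As a rough sketch of what "four components deployed as Marathon apps" could look like, the snippet below builds a Marathon group definition containing one app per component. The field names (`id`, `instances`, `cpus`, `mem`, `container`) follow Marathon's app definition format; the image names, resource sizes, and the ID layout `/<business line>/<cluster>/<component>` are assumptions, since the article does not list them.

```python
import json

# Hypothetical images and sizes; the article does not specify them.
COMPONENTS = {
    "bamboo": {"image": "registry.example.com/bamboo", "instances": 1, "cpus": 0.5, "mem": 512},
    "es-master": {"image": "registry.example.com/es", "instances": 3, "cpus": 1.0, "mem": 2048},
    "es-datanode": {"image": "registry.example.com/es", "instances": 3, "cpus": 4.0, "mem": 16384},
    "es2graphite": {"image": "registry.example.com/es2graphite", "instances": 1, "cpus": 0.2, "mem": 256},
}


def build_cluster_group(business_line: str, cluster: str, datanodes: int = 3) -> dict:
    """Return a Marathon group definition holding the four component apps."""
    group_id = f"/{business_line}/{cluster}"
    apps = []
    for name, spec in COMPONENTS.items():
        # Only the data-node count is made adjustable in this sketch.
        instances = datanodes if name == "es-datanode" else spec["instances"]
        apps.append({
            "id": f"{group_id}/{name}",
            "instances": instances,
            "cpus": spec["cpus"],
            "mem": spec["mem"],
            "container": {"type": "DOCKER", "docker": {"image": spec["image"]}},
        })
    return {"id": group_id, "apps": apps}


group = build_cluster_group("flight", "search-logs", datanodes=5)
print(json.dumps(group, indent=2))
```

Grouping the apps under one path per cluster keeps each cluster's components together inside its Sub Marathon and makes them easy to remove as a unit.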

3 Configuration and Deployment

All Elasticsearch configurations are stored in GitLab, including a customizable pre‑run script executed before container startup. Changes take effect after a container restart.
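One way to picture the pre‑run hook is a small container entrypoint that runs the user's script, if present, before starting Elasticsearch. This is a minimal sketch under assumptions: the file name `pre-run.sh`, the idea that the GitLab configuration has already been checked out into a local directory, and the function names are all hypothetical and not taken from the article.

```python
import subprocess
from pathlib import Path


def startup_commands(config_dir: Path, main_cmd: list) -> list:
    """Return the commands to run, in order, when the container starts."""
    commands = []
    prerun = config_dir / "pre-run.sh"  # hypothetical file name
    if prerun.is_file():
        # The customizable script runs first, before the main process.
        commands.append(["/bin/sh", str(prerun)])
    commands.append(main_cmd)
    return commands


def main(config_dir: str, main_cmd: list) -> int:
    for cmd in startup_commands(Path(config_dir), main_cmd):
        # check=True aborts startup if the pre-run script fails.
        subprocess.run(cmd, check=True)
    return 0
```

Because the script is read only at startup, this also shows why a configuration change needs a container restart to take effect.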

A self‑service web UI provides detailed cluster information and allows users to perform configuration and plugin management.

Continuous deployment is driven by Jenkins in three steps: configuration initialization (generating files stored in GitLab), cluster deployment (submitting components to Marathon), and final Marathon scheduling to bring the Elasticsearch cluster online.
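The three steps can be sketched in Python as follows. This is not the team's Jenkins job: `render_config`, `build_deploy_request`, and `deploy` are hypothetical helpers, and the generated file contains only a `cluster.name` setting for brevity. The one real interface used is Marathon's REST endpoint `POST /v2/apps`, which accepts an app definition as JSON.

```python
import json
import urllib.request


def render_config(cluster_name: str) -> dict:
    """Step 1: generate the files that would be committed to GitLab."""
    return {"elasticsearch.yml": f"cluster.name: {cluster_name}\n"}


def build_deploy_request(marathon_url: str, app: dict) -> urllib.request.Request:
    """Step 2: wrap one component as a POST to Marathon's /v2/apps endpoint."""
    return urllib.request.Request(
        url=marathon_url.rstrip("/") + "/v2/apps",
        data=json.dumps(app).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def deploy(marathon_url: str, apps: list, send=urllib.request.urlopen) -> list:
    """Submit every component; Marathon then performs step 3 (scheduling)."""
    return [send(build_deploy_request(marathon_url, app)) for app in apps]
```

Passing `send` as a parameter lets the submission step be exercised without a live Marathon, for example by collecting the requests in a list.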

4 Monitoring and Alerting

Monitoring collects metrics via two methods and aggregates them for visualization.
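The article does not spell out the two collection methods, but since metrics end up in Graphite, a sketch of the final hop may help. The code below flattens a nested stats dictionary into Graphite's plaintext protocol (`<path> <value> <timestamp>`, one metric per line, conventionally sent to TCP port 2003). The metric prefix and the sample stats are made up for illustration, and this is not the actual es2graphite or pyadvisor code.

```python
import socket


def to_graphite_lines(prefix: str, stats: dict, timestamp: int) -> list:
    """Flatten nested numeric stats into Graphite plaintext lines."""
    lines = []
    for key, value in stats.items():
        path = f"{prefix}.{key}"
        if isinstance(value, dict):
            lines.extend(to_graphite_lines(path, value, timestamp))
        elif isinstance(value, (int, float)) and not isinstance(value, bool):
            lines.append(f"{path} {value} {timestamp}")
        # Non-numeric values (strings, booleans, lists) are skipped.
    return lines


def send_to_graphite(lines: list, host: str, port: int = 2003) -> None:
    """Send the lines to a Graphite (carbon) plaintext listener."""
    payload = "".join(line + "\n" for line in lines).encode("ascii")
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload)


# Hypothetical stats for one cluster.
sample = {"jvm": {"heap_used": 10}, "docs": 5, "name": "node-1"}
lines = to_graphite_lines("es.demo", sample, 1700000000)
```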

Alerting covers several aspects, which the original presentation illustrates with diagrams.

Overall, the solution manages the full lifecycle of Elasticsearch clusters: capacity planning and configuration, automated deployment, self‑service management, monitoring and alerting, and resource reclamation upon decommissioning.
