
Rethinking Operations: The “Third Kind” of SRE at Lianjia

The article shares the author’s experience transitioning from private to public and hybrid clouds at Lianjia, introduces a “third kind” of operations that blends traditional and internet‑based practices, and discusses containers, DNS‑based naming, and automation tools to build adaptable, cost‑effective infrastructure.

Efficient Ops

1. Introduction

The author, formerly a technical director at Sina and Huawei, joined Lianjia as SRE lead, bringing experience from private, public, and hybrid cloud environments to propose new operational ideas.

2. Lianjia’s “Third Kind” of Operations

Lianjia describes a “third kind” of intermediary that combines offline brokerage with online product features, positioning its operations between traditional IT and fully internet‑based models. This approach seeks a balanced, gray‑area operational state rather than an extreme.

3. New Technologies

3.1 Misconceptions About New Tech

Many assume containers or OpenStack are the only solutions for cloud computing, but the author argues that such views are overly simplistic.

3.2 Understanding Containers

Using the analogy of shipping containers, the author explains that containers provide a uniform size for transporting diverse workloads, allowing teams to manage heterogeneous services without handling each application’s internal details.

3.3 When to Use Containers

In large organizations with hundreds of teams using varied languages and protocols, containers help standardize deployment. However, if the environment is homogeneous (e.g., all PHP on Apache), containers may add unnecessary cost.

3.4 Container‑based Tomcat

For small teams, treating Tomcat itself as the container can be simpler than adopting Docker: cgroups provide resource isolation, and familiar tooling such as Ansible, Puppet, or SaltStack stays in place.

4. Choosing the Best Solution

The author emphasizes evaluating the trade‑offs of each technology for the specific team and workload, acknowledging that future advances may shift the optimal choice.

4.1 The “RASH” Solution

To mitigate RPC‑induced blocking, the team built a library preloaded via LD_PRELOAD that intercepts network calls, routes them through a SOCKS4 proxy, and quickly fails slow responses, preventing cascading delays.
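To make the proxy hop concrete, here is a minimal Python sketch of the SOCKS4 handshake that an interposed connect call would have to perform. It is not the team's library, which is native code loaded via LD_PRELOAD. The function names, the 0.5 second timeout, and the error handling are all assumptions for illustration. Only the wire format (version 4, command 1, port, IPv4 address, user ID, NUL terminator, and reply code 90 for success) comes from the SOCKS4 protocol itself.

```python
import socket
import struct

SOCKS4_VERSION = 4
CMD_CONNECT = 1
REPLY_GRANTED = 90  # 0x5A: request granted


def build_connect_request(dest_ip: str, dest_port: int, user_id: bytes = b"") -> bytes:
    """Build a SOCKS4 CONNECT request: VN, CD, DSTPORT, DSTIP, USERID, NUL."""
    header = struct.pack("!BBH", SOCKS4_VERSION, CMD_CONNECT, dest_port)
    return header + socket.inet_aton(dest_ip) + user_id + b"\x00"


def is_granted(reply: bytes) -> bool:
    """A SOCKS4 reply is 8 bytes; the second byte carries the result code."""
    return len(reply) == 8 and reply[0] == 0 and reply[1] == REPLY_GRANTED


def connect_via_proxy(proxy_addr, dest_ip, dest_port, timeout_s=0.5):
    """Open a connection through the proxy, failing fast if it is slow.

    timeout_s is a hypothetical budget; the socket timeout applies to the
    connect, the send, and each recv, so a stalled proxy raises quickly.
    """
    sock = socket.create_connection(proxy_addr, timeout=timeout_s)
    try:
        sock.sendall(build_connect_request(dest_ip, dest_port))
        reply = b""
        while len(reply) < 8:
            chunk = sock.recv(8 - len(reply))
            if not chunk:  # proxy closed the connection early
                break
            reply += chunk
        if not is_granted(reply):
            raise ConnectionError("SOCKS4 proxy rejected the request")
        return sock
    except Exception:
        sock.close()
        raise
```

Routing every outbound call through one proxy gives a single place to enforce timeouts and concurrency limits, which is presumably why the interception happens below the application code.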

4.2 RASH Algorithm

The algorithm tracks average response times and limits concurrent connections based on a configurable timeout, dynamically adjusting queue depth to avoid overload.
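The article does not give the exact formula, so the following is only a sketch of one way such a limiter could work. It assumes an exponentially weighted moving average of response time and caps in-flight requests at timeout divided by that average, so a request that would likely wait longer than the timeout is rejected immediately. The class name, the smoothing factor alpha, and the cap formula are all assumptions, not the RASH implementation.

```python
import threading


class FastFailLimiter:
    """Caps concurrent requests using an observed average response time."""

    def __init__(self, timeout_s: float, initial_avg_s: float = 0.05, alpha: float = 0.2):
        self.timeout_s = timeout_s      # configurable latency budget
        self.avg_s = initial_avg_s      # running average response time
        self.alpha = alpha              # weight given to the newest sample
        self.in_flight = 0
        self._lock = threading.Lock()

    def max_in_flight(self) -> int:
        # Assumed rule: allow as many requests as fit in the budget at the
        # current average latency, and always allow at least one.
        return max(1, int(self.timeout_s / max(self.avg_s, 1e-6)))

    def try_acquire(self) -> bool:
        """Return False right away instead of queueing when over the cap."""
        with self._lock:
            if self.in_flight >= self.max_in_flight():
                return False
            self.in_flight += 1
            return True

    def release(self, elapsed_s: float) -> None:
        """Record a finished request and fold its latency into the average."""
        with self._lock:
            self.in_flight -= 1
            self.avg_s = (1 - self.alpha) * self.avg_s + self.alpha * elapsed_s
```

As the backend slows down, the average rises and the cap shrinks, so callers are turned away sooner; when latency recovers, the cap grows back on its own.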

5. Alternative Naming Services

5.1 Naming Service Overview

Traditional DNS acts as a basic naming service; the team also uses etcd and SkyDNS for service discovery.

5.2 DNS‑Based Naming

Mapping services to domain names keeps code changes to a minimum, because applications already know how to resolve a hostname. To handle DNS caching issues, the team adds a dnsmasq layer that allows remote cache invalidation and fast updates.
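The caching problem and its fix can be illustrated with a small in-process cache. This is not dnsmasq and not the team's setup; it is a hypothetical Python sketch in which the resolver function, the TTL value, and the class name are all assumptions. It shows why explicit invalidation matters: without it, a changed record stays invisible until the TTL expires.

```python
import time
from typing import Callable, Dict, List, Tuple


class InvalidatableCache:
    """Caches name lookups with a TTL and supports explicit invalidation."""

    def __init__(self, resolve: Callable[[str], List[str]], ttl_s: float = 5.0,
                 clock: Callable[[], float] = time.monotonic):
        self._resolve = resolve   # the real lookup, injected by the caller
        self._ttl_s = ttl_s
        self._clock = clock
        self._entries: Dict[str, Tuple[float, List[str]]] = {}

    def lookup(self, name: str) -> List[str]:
        now = self._clock()
        entry = self._entries.get(name)
        if entry is not None and entry[0] > now:
            return entry[1]       # still fresh: serve from the cache
        addrs = self._resolve(name)
        self._entries[name] = (now + self._ttl_s, addrs)
        return addrs

    def invalidate(self, name: str) -> None:
        """Drop one name so the next lookup fetches the updated record."""
        self._entries.pop(name, None)
```

In a real deployment the invalidation call would be triggered remotely when a service moves, which is the role the source assigns to the dnsmasq layer.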

5.3 Handling Ports

Since DNS cannot directly encode ports, the team adopts SRV records (where supported) or encodes the port in the first sub‑domain (e.g., 3306.mysql.lianjia.com).
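A client that follows the port-in-subdomain convention needs to split the name before connecting. The helper below is a minimal sketch of that step; the function name is hypothetical, and it assumes the leftmost label is always a decimal port and the remainder is the service name. The source does not say which of the two names is actually resolved, so the helper simply returns both parts.

```python
from typing import Tuple


def split_port_label(name: str) -> Tuple[str, int]:
    """Split '3306.mysql.lianjia.com' into ('mysql.lianjia.com', 3306)."""
    label, sep, rest = name.partition(".")
    if not sep or not (label.isascii() and label.isdigit()):
        raise ValueError(f"no port label in {name!r}")
    port = int(label)
    if not 0 < port <= 65535:
        raise ValueError(f"port out of range in {name!r}")
    return rest, port


service, port = split_port_label("3306.mysql.lianjia.com")
print(service, port)
```

Where SRV records are available they are the cleaner option, since the port travels in the record itself and no naming convention has to be parsed on the client.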

6. Configuration Management and Automation

The author, a translator of “Running Ansible,” compares Puppet, Ansible, and SaltStack, favoring Ansible’s thin abstraction for large‑scale Linux fleets, while noting its memory usage and static configuration limitations.

7. Conclusion

Real‑world operations exist on a spectrum between idealized extremes; teams must assess new technologies for cost, benefit, and fit, and ensure solutions are truly runnable in production.

Written by

Efficient Ops

This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.
