Measuring Operations Automation Rate and Building a Self‑Coding Automation Platform

This article explains the challenges of manual operations, defines an automation‑rate metric, introduces the Tai‑Shan Kirin platform for self‑coded operational automation, provides step‑by‑step implementation guidance with code examples, and shares a case study demonstrating significant efficiency and stability gains.

JD Retail Technology

The article introduces the need to measure operations automation and presents a platform that supports self‑coding for automation, aiming to move operations into a new era of efficiency and reliability.

1. Introduction – As system and middleware complexity grows, traditional manual operations become inefficient and risky. Automation is essential, and enabling engineers to code their own automation tasks can improve efficiency and reduce risk.

2. Challenges of Manual Operations – Manual operations suffer from complex script management, human error, reliance on individual expertise, and a lack of standardized processes. They also limit engineers' personal growth, raise costs, lower efficiency, and put stability at risk.

3. Importance of Automation – Automation reduces costs, saves time, and improves system stability by minimizing human error.

4. Definition of Operations Automation Rate – The metric is calculated as: Automation Rate = Automated Operations (via Tai‑Shan Kirin) / (Manual Operations (via bastion host) + Automated Operations). The rate rose from 3% in Q2 to 63% after implementation.
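
The formula can be sketched in a few lines of Go. This is a minimal illustration, not the platform's own code: the function name AutomationRate, the error value, and the two counters are assumptions, and the counts in main are made up for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrNoOperations is returned when no operations were recorded in the period.
var ErrNoOperations = errors.New("no operations recorded")

// AutomationRate returns automated / (manual + automated).
// automated: operations executed through the platform (hypothetical counter).
// manual: operations executed by hand on the bastion host (hypothetical counter).
func AutomationRate(automated, manual int) (float64, error) {
	if automated < 0 || manual < 0 {
		return 0, errors.New("operation counts must be non-negative")
	}
	total := automated + manual
	if total == 0 {
		return 0, ErrNoOperations
	}
	return float64(automated) / float64(total), nil
}

func main() {
	// Illustrative counts only, not figures from the article.
	rate, err := AutomationRate(630, 370)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("automation rate: %.0f%%\n", rate*100)
}
```

Treating an empty period as an error rather than as 0% keeps a week with no recorded operations from being read as a week with no automation.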

5. Why Engineers Should Code Their Own Automation – Benefits include lower communication cost, faster response to needs, reduced maintenance cost, and professional growth for engineers.

6. Tai‑Shan Kirin Platform Overview – The platform extends Kubernetes with Custom Resource Definitions (CRDs) to provide a programmable, unified operations platform, handling common infrastructure tasks while engineers focus on business logic.

7. Implementation Steps

1) Apply for an operations‑system menu (admin creates a menu and provides an auth file).
2) Create an operations function (choose HTTP service or custom CRD‑based controller).
3) Write the function code using the provided templates.
4) Deploy the code (e.g., in a container).
5) Publish the function on the platform.
6) Execute the function directly or via orchestration.
7) View execution records, parameters, logs, and results.
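
Step 2 offers an HTTP service as the alternative to a CRD-based controller. The article does not describe the request contract the platform uses for HTTP functions, so the sketch below is only an assumption of what such a service might look like: the OpRequest and OpResponse shapes, the execute and demo helpers, and the sample operation name are all hypothetical. To keep the example self-terminating, it serves the handler from an in-process test server; a real deployment would call http.ListenAndServe instead.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"log"
	"net/http"
	"net/http/httptest"
	"strings"
)

// OpRequest is a hypothetical request body for an operations function.
type OpRequest struct {
	Operation string            `json:"operation"`
	Params    map[string]string `json:"params"`
}

// OpResponse is a hypothetical response body.
type OpResponse struct {
	Status  string `json:"status"`
	Message string `json:"message"`
}

// execute holds the business logic; this sketch only validates and echoes the request.
func execute(req OpRequest) (OpResponse, error) {
	if req.Operation == "" {
		return OpResponse{}, errors.New("operation is required")
	}
	return OpResponse{Status: "done", Message: "executed " + req.Operation}, nil
}

// handleOperation decodes the JSON request, calls execute, and writes a JSON response.
func handleOperation(w http.ResponseWriter, r *http.Request) {
	if r.Method != http.MethodPost {
		http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
		return
	}
	var req OpRequest
	if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
		http.Error(w, "invalid JSON body", http.StatusBadRequest)
		return
	}
	resp, err := execute(req)
	if err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	w.Header().Set("Content-Type", "application/json")
	if err := json.NewEncoder(w).Encode(resp); err != nil {
		log.Printf("encode response: %v", err)
	}
}

// demo serves the handler from an in-process test server and posts one request to it.
func demo() error {
	srv := httptest.NewServer(http.HandlerFunc(handleOperation))
	defer srv.Close()

	body := strings.NewReader(`{"operation":"restart-node","params":{"node":"n1"}}`)
	res, err := http.Post(srv.URL, "application/json", body)
	if err != nil {
		return err
	}
	defer res.Body.Close()

	var out OpResponse
	if err := json.NewDecoder(res.Body).Decode(&out); err != nil {
		return err
	}
	fmt.Println(out.Status, out.Message)
	return nil
}

func main() {
	if err := demo(); err != nil {
		log.Fatal(err)
	}
}
```

Keeping the business logic in execute, separate from the HTTP plumbing, lets the same function be unit-tested without a server.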

Code Example – kubeconfig (YAML)

apiVersion: v1
clusters:
- cluster:
    certificate-authority: ca.pem
    server: https://xxx.jd.com:80
  name: kubernetes
contexts:
- context:
    cluster: kubernetes
    user: kubecfg
  name: default 
current-context: default
kind: Config
preferences: {}
users:
- name: kubecfg
  user:
    client-certificate-data: xxxxx  # has full permissions on the namespace that corresponds to the menu
    client-key-data: xxxxx  # has full permissions on the namespace that corresponds to the menu

Code Example – Main Controller (Go)

package main
import (
    "controllers/example/api/web/service"
    "flag"
    "os"
    examplev1 "controllers/example/api/v1"
    "controllers/example/controllers"
    "k8s.io/apimachinery/pkg/runtime"
    clientgoscheme "k8s.io/client-go/kubernetes/scheme"
    _ "k8s.io/client-go/plugin/pkg/client/auth/gcp"
    ctrl "sigs.k8s.io/controller-runtime"
    "sigs.k8s.io/controller-runtime/pkg/log/zap"
    // +kubebuilder:scaffold:imports
)

var (
    scheme   = runtime.NewScheme()
    setupLog = ctrl.Log.WithName("setup")
)

func init() {
    _ = clientgoscheme.AddToScheme(scheme)
    _ = examplev1.AddToScheme(scheme)
    // +kubebuilder:scaffold:scheme
}

func main() {
    var metricsAddr string
    var enableLeaderElection bool
    // Set the startup flags.
    flag.StringVar(&metricsAddr, "metrics-addr", ":8090", "The address the metric endpoint binds to.")
    flag.BoolVar(&enableLeaderElection, "enable-leader-election", false,
        "Enable leader election for controller manager. Enabling this will ensure there is only one active controller manager.")
    flag.Parse()

    // Configure logging.
    ctrl.SetLogger(zap.New(func(o *zap.Options) { o.Development = true }))
    // Create the controller manager.
    mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
        Scheme:             scheme,
        MetricsBindAddress: metricsAddr,
        LeaderElection:     enableLeaderElection,
        Port:               9443,
    })
    if err != nil {
        setupLog.Error(err, "unable to start manager")
        os.Exit(1)
    }
    // Remove this line if you do not need web capability.
    go service.RunServer(mgr)
    // Core code: register the controller for the CRD and set up a watch on the resources defined by the Kirin platform.
    if err = (&controllers.ExampleKindReconciler{
        Client: mgr.GetClient(),
        Log:    ctrl.Log.WithName("controllers").WithName("ExampleKind"),
        Scheme: mgr.GetScheme(),
    }).SetupWithManager(mgr); err != nil {
        setupLog.Error(err, "unable to create controller", "controller", "ExampleKind")
        os.Exit(1)
    }
    // +kubebuilder:scaffold:builder
    setupLog.Info("starting manager")
    if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
        setupLog.Error(err, "problem running manager")
        os.Exit(1)
    }
}

Code Example – Reconciler (Go)

package controllers
import (
    "context"
    "strconv"
    "github.com/go-logr/logr"
    "k8s.io/apimachinery/pkg/runtime"
    ctrl "sigs.k8s.io/controller-runtime"
    "sigs.k8s.io/controller-runtime/pkg/client"
    examplev1 "controllers/example/api/v1"
)

// ExampleKindReconciler reconciles a ExampleKind object
type ExampleKindReconciler struct {
    client.Client
    Log    logr.Logger
    Scheme *runtime.Scheme
}

// num counts how many times Reconcile has been invoked (sample code only).
var num = 0
// +kubebuilder:rbac:groups=example.sreplat.com,resources=examplekinds,verbs=get;list;watch;create;update;patch;delete
// +kubebuilder:rbac:groups=example.sreplat.com,resources=examplekinds/status,verbs=get;update;patch
// When an operations function is executed on the Kirin platform, the controller watches
// the parameters and enters this function carrying the parameter information.
func (r *ExampleKindReconciler) Reconcile(req ctrl.Request) (ctrl.Result, error) {
    num++
    ctx := context.Background()
    log := r.Log.WithValues("examplekind", req.NamespacedName)
    example := &examplev1.ExampleKind{}

    // your logic here
    // Everything below is sample code; replace it with your own business logic.
    if err := r.Get(ctx, req.NamespacedName, example); err != nil {
        log.V(1).Info("couldn't find module:" + req.String())
        // Ignore not-found errors (the resource was deleted); return any other error so the request is retried.
        return ctrl.Result{}, client.IgnoreNotFound(err)
    }
    log.V(1).Info("received a change to the Moduler resource", "Resource.spec", example.Spec)
    log.V(1).Info("received a change to the Moduler resource", "Status", example.Status)

    // Move each pending event to its "_done" state and pick the matching log message.
    var msg string
    switch example.Status.Event {
    case "created":
        example.Status.Event = "created_done"
        msg = "create operation finished; resource status updated to done"
    case "updated":
        example.Status.Event = "updated_done"
        msg = "update operation finished; resource status updated to done"
    case "list":
        example.Status.Event = "list_done"
        msg = "query operation finished; resource status updated to done"
    case "deleted":
        example.Status.Event = "deleted_done"
        msg = "delete operation finished; resource status updated to done"
    default:
        // No pending event, so there is nothing to do.
        return ctrl.Result{}, nil
    }
    example.Spec.Ba += strconv.Itoa(num)
    log.V(1).Info(msg, "num", num, "example.Status.Event", example.Status.Event)
    // Surface update failures instead of silently dropping them.
    if err := r.Update(ctx, example); err != nil {
        return ctrl.Result{}, err
    }
    return ctrl.Result{}, nil
}

func (r *ExampleKindReconciler) SetupWithManager(mgr ctrl.Manager) error {
    return ctrl.NewControllerManagedBy(mgr).
        For(&examplev1.ExampleKind{}).
        Complete(r)
}
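
The sample reconciler relies on a small handshake: a pending event such as "created" is rewritten to "created_done" before the resource is updated. Because updating a watched resource triggers another reconcile, the "_done" suffix is what keeps the controller from handling the same event again. The sketch below isolates that transition as a pure function using only the standard library; the name nextEvent is an assumption made for illustration and does not come from the platform templates.

```go
package main

import "fmt"

// nextEvent maps a pending event to its completed form, mirroring the sample
// reconciler's "created" -> "created_done" convention. The boolean is false
// when the event needs no handling (already done, empty, or unknown).
func nextEvent(event string) (string, bool) {
	switch event {
	case "created", "updated", "list", "deleted":
		return event + "_done", true
	default:
		return "", false
	}
}

func main() {
	for _, e := range []string{"created", "created_done", "list", ""} {
		if next, ok := nextEvent(e); ok {
			fmt.Printf("%q -> %q\n", e, next)
		} else {
			fmt.Printf("%q: nothing to do\n", e)
		}
	}
}
```

Isolating the transition this way makes it easy to unit-test without a cluster and to confirm that a second reconcile of an already handled resource is a no-op.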

8. Case Study – ChubaoFS – Using the described approach, ChubaoFS implemented 43 atomic operations and 18 orchestration tasks, executing about 500 automated tasks per week.

9. Platform Capabilities – The Tai‑Shan Kirin platform offers operation functions, command execution, scheduled tasks, resource visualization, resource actions, and visual orchestration workflows, all built on Kubernetes CRDs to enable infrastructure‑as‑code.
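
The article does not show how orchestration is implemented, so the following is only a sketch of one plausible model: a workflow as an ordered list of atomic operations that stops at the first failure and keeps a record of every step it ran. The AtomicOp, StepResult, and RunWorkflow names, the errDemo value, and the sample step names are all hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// AtomicOp is a hypothetical named operation that either succeeds or returns an error.
type AtomicOp struct {
	Name string
	Run  func() error
}

// StepResult records the outcome of one executed step.
type StepResult struct {
	Name string
	Err  error
}

// errDemo is a placeholder failure used by the demo in main.
var errDemo = errors.New("demo failure")

// RunWorkflow executes ops in order and stops at the first failure.
// It returns the results of every step that ran, including the failing one.
func RunWorkflow(ops []AtomicOp) ([]StepResult, error) {
	results := make([]StepResult, 0, len(ops))
	for _, op := range ops {
		err := op.Run()
		results = append(results, StepResult{Name: op.Name, Err: err})
		if err != nil {
			return results, fmt.Errorf("step %q failed: %w", op.Name, err)
		}
	}
	return results, nil
}

func main() {
	ops := []AtomicOp{
		{Name: "check-disk", Run: func() error { return nil }},
		{Name: "expand-volume", Run: func() error { return errDemo }},
		{Name: "verify", Run: func() error { return nil }},
	}
	results, err := RunWorkflow(ops)
	for _, r := range results {
		fmt.Printf("%s: err=%v\n", r.Name, r.Err)
	}
	if err != nil {
		fmt.Println("workflow stopped:", err)
	}
}
```

Stopping at the first failure is a deliberately conservative choice for operations work, where later steps usually assume the earlier ones succeeded.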

Conclusion – By measuring automation rate and providing a self‑coding platform, operations teams can achieve cost savings, higher efficiency, improved stability, and professional growth, as demonstrated by the significant automation gains in the case study.
