How to Seamlessly Integrate CloudWeGo with APMPlus for Full‑Stack Observability

This article explains the observability challenges of distributed microservice and LLM architectures, introduces CloudWeGo and APMPlus, and provides step‑by‑step integration guides for the Kitex, Hertz, and Eino frameworks, with code samples and data reporting methods. It also covers monitoring features such as RED metrics, LLM‑specific indicators, and service topology, and closes with the future roadmap.

ByteDance Cloud Native

Apr 3, 2025

Background

Distributed architectures and micro‑services improve scalability but create three observability challenges: data dispersion, complex traceability, and fault propagation. Monitoring built for monolithic applications cannot link a full call chain across services, a gap that is especially visible in LLM applications, so developers end up adding custom instrumentation.

What is CloudWeGo

CloudWeGo is ByteDance’s open‑source enterprise‑grade cloud‑native microservice middleware suite, focusing on high performance, scalability, and reliability. It includes sub‑projects such as Kitex, Hertz, Netpoll, Volo, Thriftgo, Fastpb, Pilota, and many others.

What is APMPlus

APMPlus is Volcano Engine’s APM service offering full‑stack performance monitoring, custom tracing, and alerting. It provides:

Exception detection and alerts: quickly locate bottlenecks and failures.

Rich attribution: stack, scheduling, dimension, and custom point analysis.

Trace and log query: combine call chains and logs for rapid debugging.

Flexible reporting: trend analysis for system health.

APMPlus powers stability for products like Toutiao, Douyin, and Feishu, and is trusted across industries.

Observability Integration: CloudWeGo + APMPlus

APMPlus server‑side monitoring is closely adapted to the CloudWeGo frameworks (Kitex, Hertz, and Eino), enabling automatic monitoring and trace collection.

Kitex Integration

Kitex is a high‑performance Go RPC framework supporting multiple protocols and service governance.

Step 1: Initialize Kitex

import (
    "github.com/kitex-contrib/obs-opentelemetry/provider"
    "github.com/kitex-contrib/obs-opentelemetry/tracing"
    ...
)
func main() {
    serviceName := "echo"

    // Set up the OpenTelemetry provider that exports traces and metrics.
    p := provider.NewOpenTelemetryProvider(
        provider.WithServiceName(serviceName),
        provider.WithInsecure(),
    )
    // Flush buffered telemetry before the process exits.
    defer p.Shutdown(context.Background())

    // Attach the tracing suite so RPCs handled by this server are traced.
    svr := echo.NewServer(
        new(EchoImpl),
        server.WithSuite(tracing.NewServerSuite()),
        server.WithServerBasicInfo(&rpcinfo.EndpointBasicInfo{ServiceName: serviceName}),
    )
    if err := svr.Run(); err != nil {
        klog.Fatalf("server stopped with error: %v", err)
    }
}

Step 2: Data Reporting

Report data either directly to APMPlus or through an OpenTelemetry Collector that forwards it.
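The two reporting paths differ mainly in where the OTLP exporter points. The sketch below is a hypothetical helper, not part of Kitex or APMPlus, that uses only the standard library to pick an endpoint and headers for each path. Several values are assumptions: the APMPlus host is borrowed from the Eino sample in this article, `localhost:4317` assumes a collector listening on the default OTLP gRPC port, and the `X-ByteAPM-AppKey` header name should be checked against the APMPlus console before use.

```go
package main

import (
	"errors"
	"fmt"
)

// ExportTarget is a hypothetical holder for OTLP exporter settings.
type ExportTarget struct {
	Endpoint string
	Headers  map[string]string
}

// resolveTarget picks where telemetry should be sent.
//   - "direct":    the APMPlus endpoint, authenticated with the app key.
//   - "collector": a local OpenTelemetry Collector that forwards the data
//     and holds the credentials itself.
func resolveTarget(mode, appKey string) (ExportTarget, error) {
	switch mode {
	case "direct":
		if appKey == "" {
			return ExportTarget{}, errors.New("direct reporting needs an app key")
		}
		return ExportTarget{
			Endpoint: "apmplus-cn-beijing.volces.com:4317", // assumed region endpoint
			Headers:  map[string]string{"X-ByteAPM-AppKey": appKey}, // assumed header name
		}, nil
	case "collector":
		return ExportTarget{
			Endpoint: "localhost:4317", // default OTLP gRPC port
			Headers:  map[string]string{},
		}, nil
	default:
		return ExportTarget{}, fmt.Errorf("unknown reporting mode %q", mode)
	}
}

func main() {
	for _, mode := range []string{"direct", "collector"} {
		target, err := resolveTarget(mode, "appkey-xxx")
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s -> %s (%d header(s))\n", mode, target.Endpoint, len(target.Headers))
	}
}
```

In a Kitex service, values like these would typically be handed to the provider through options such as `provider.WithExportEndpoint` and `provider.WithHeaders`; confirm that the version of the package in use exposes them.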

Hertz Integration

Hertz is a Go HTTP framework supporting HTTP/1.1, HTTP/2, HTTP/3, and WebSocket.

import (
    "github.com/hertz-contrib/obs-opentelemetry/provider"
    hertztracing "github.com/hertz-contrib/obs-opentelemetry/tracing"
    ...
)
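func main() {
    // Set up the OpenTelemetry provider and flush it on exit.
    p := provider.NewOpenTelemetryProvider(provider.WithInsecure())
    defer p.Shutdown(context.Background())

    // The tracer hooks into the server; the middleware traces each request.
    tracer, cfg := hertztracing.NewServerTracer()
    h := server.Default(tracer)
    h.Use(hertztracing.ServerMiddleware(cfg))
    h.Spin()
}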
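What lets a backend stitch spans from different services into one call chain is the trace context carried on each request. The following standard-library sketch reads a W3C Trace Context `traceparent` header. It assumes that header format is the one in use and illustrates the idea only; it is not the Hertz tracing middleware's code, and `TraceContext` and `parseTraceparent` are hypothetical names.

```go
package main

import (
	"encoding/hex"
	"errors"
	"fmt"
	"strings"
)

// TraceContext holds the fields carried by a version-00 traceparent header.
type TraceContext struct {
	TraceID string
	SpanID  string
	Sampled bool
}

// isLowerHex reports whether s is exactly n lowercase hex characters.
func isLowerHex(s string, n int) bool {
	if len(s) != n || s != strings.ToLower(s) {
		return false
	}
	_, err := hex.DecodeString(s)
	return err == nil
}

// parseTraceparent parses "00-<32 hex trace id>-<16 hex span id>-<2 hex flags>".
func parseTraceparent(h string) (TraceContext, error) {
	parts := strings.Split(h, "-")
	if len(parts) != 4 || parts[0] != "00" {
		return TraceContext{}, errors.New("unsupported or malformed traceparent")
	}
	traceID, spanID, flags := parts[1], parts[2], parts[3]
	if !isLowerHex(traceID, 32) || !isLowerHex(spanID, 16) || !isLowerHex(flags, 2) {
		return TraceContext{}, errors.New("fields must be fixed-length lowercase hex")
	}
	// All-zero IDs are reserved as invalid.
	if strings.Trim(traceID, "0") == "" || strings.Trim(spanID, "0") == "" {
		return TraceContext{}, errors.New("all-zero trace or span id")
	}
	b, _ := hex.DecodeString(flags) // already validated above
	return TraceContext{
		TraceID: traceID,
		SpanID:  spanID,
		Sampled: b[0]&0x01 == 1, // lowest bit is the "sampled" flag
	}, nil
}

func main() {
	tc, err := parseTraceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("trace=%s parent span=%s sampled=%v\n", tc.TraceID, tc.SpanID, tc.Sampled)
}
```

A downstream service that starts its spans with the same trace ID and uses the incoming span ID as the parent is what allows the whole request path to appear as one trace.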

Eino Integration

Eino is a Go‑centric LLM application framework offering extensibility, reliability, and performance.

import (
    "github.com/cloudwego/eino-ext/callbacks/apmplus"
    "github.com/cloudwego/eino/callbacks"
    ...
)
func main() {
    cbh, shutdown, err := apmplus.NewApmplusHandler(&apmplus.Config{
        Host:        "apmplus-cn-beijing.volces.com:4317",
        AppKey:      "appkey-xxx",
        ServiceName: "eino-app",
        Release:     "release/v0.0.1",
    })
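    if err != nil {
        log.Fatal(err)
    }
    // Register the APMPlus handler as a global callback.
    callbacks.InitCallbackHandlers([]callbacks.Handler{cbh})

    // ... build and run the Eino application here ...

    // Wait for all traces and metrics to be sent before exit.
    shutdown(context.Background())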
}

Eino can be deployed locally (Docker, Redis) or on Volcano Engine (container service, model platform).

Application Observability

Through APMPlus, developers can view call chains (traces) with detailed analysis such as flame graphs and call lists, the RED golden metrics (request rate as QPS, error rate, and latency), LLM‑specific metrics (call count, token usage, response latency, time to first token (TTFT), and time per output token (TPOT)), service topology, and Go runtime metrics.

Future Outlook

APMPlus and CloudWeGo will deepen integration for micro‑services and LLM workloads, expanding framework support, reducing integration cost, and adding richer LLM‑specific observability (inference latency, resource usage, generation state) to help optimize model‑calling strategies.
