Graceful Shutdown in Microservices: Concepts, Kubernetes Example, and Optimizations

This article explains the concept of graceful shutdown, outlines general steps, presents a detailed Kubernetes‑SpringBoot‑Nacos case study, discusses common pitfalls, and provides practical optimization techniques for reliable service termination in cloud‑native environments.

Architecture Digest

Mar 3, 2024

Graceful shutdown refers to the process of safely terminating a device, system, or application by executing a series of actions that protect data, prevent errors, and maintain overall stability.

Typical steps include backing up data, stopping new requests, processing in‑flight requests, notifying dependent components, and finally shutting down after all elements have exited safely.
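The "stop new requests, then finish in-flight requests" steps can be sketched in plain Java. This is a minimal illustration rather than the article's implementation: the `RequestGate` class and its method names are hypothetical, and a real service would normally rely on its web server or framework for this.

```java
/** Hypothetical helper: rejects new requests once shutdown starts and waits for in-flight ones. */
final class RequestGate {
    private boolean closed = false;
    private int inFlight = 0;

    /** Returns true if the request may proceed, false once shutdown has begun. */
    synchronized boolean tryEnter() {
        if (closed) {
            return false;
        }
        inFlight++;
        return true;
    }

    /** Must be called exactly once for every successful tryEnter(). */
    synchronized void exit() {
        inFlight--;
        notifyAll();
    }

    /**
     * Stops admitting requests, then waits up to timeoutMillis for in-flight
     * requests to finish. Returns true if everything drained in time.
     */
    synchronized boolean closeAndAwait(long timeoutMillis) {
        closed = true;
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (inFlight > 0) {
            long remainingMillis = (deadline - System.nanoTime()) / 1_000_000L;
            if (remainingMillis <= 0) {
                return false; // timed out with requests still running
            }
            try {
                wait(remainingMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }
}
```

A request handler would call `tryEnter()` before doing any work and `exit()` in a `finally` block; the shutdown path calls `closeAndAwait(...)` and only then releases resources.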

The exact procedures vary across devices, systems, and scenarios, sometimes requiring user notifications or automated state preservation for later recovery.

In microservice environments, graceful shutdown becomes more complex because services now run in containers managed by Docker and Kubernetes, which add their own termination lifecycle. The Kubernetes pod deletion workflow is a concrete example: it involves network rule updates, endpoint removal, and container cleanup.

During pod deletion, Kubernetes marks the pod as Terminating, removes the pod IP from the service endpoints, and kube-proxy updates the iptables rules accordingly. Meanwhile, the kubelet runs the PreStop hook if one is configured and then sends SIGTERM to the container. If the container has not exited when the grace period (30 seconds by default) expires, the kubelet sends SIGKILL and cleans up the pod's resources.
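From the application's side, the SIGTERM in this sequence is what triggers cleanup. On the JVM, a normal SIGTERM runs the registered shutdown hooks, so a minimal sketch looks like the following. The `SigtermHook` class and the thread name are hypothetical, and SpringBoot registers its own shutdown hook, so an application built on it would not usually need this.

```java
/** Hypothetical helper: registers cleanup logic that the JVM runs when it receives SIGTERM. */
final class SigtermHook {
    private SigtermHook() {}

    /**
     * Registers the cleanup as a JVM shutdown hook and returns the hook thread,
     * so callers can deregister it later with Runtime.removeShutdownHook.
     */
    static Thread install(Runnable cleanup) {
        Thread hook = new Thread(cleanup, "graceful-shutdown");
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }
}
```

Shutdown hooks do not run on SIGKILL, which is why the cleanup has to finish before the grace period expires.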

The article presents a real-world case using k8s, SpringBoot, and Nacos, in which the PreStop hook deregisters the instance from Nacos and then sleeps for 35 seconds. The problem is that the 35-second sleep exceeds the pod's default 30-second terminationGracePeriodSeconds. When a PreStop hook overruns the grace period, the kubelet grants only a one-off 2-second extension before force-killing the container, so SpringBoot has about 2 seconds to shut down, which is not enough to complete asynchronous tasks.
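The timing problem comes down to simple arithmetic, sketched below. The `GracePeriodBudget` class is a hypothetical helper, and the second set of numbers in `main` is an illustrative assumption rather than a value taken from the case study.

```java
/** Hypothetical helper: checks whether the shutdown sequence fits inside the pod's grace period. */
final class GracePeriodBudget {
    private GracePeriodBudget() {}

    /**
     * Seconds left over once the PreStop hook and the application's own shutdown
     * have both run. A negative value means the container risks being force-killed.
     */
    static long slackSeconds(long terminationGracePeriodSeconds,
                             long preStopSeconds,
                             long appShutdownSeconds) {
        return terminationGracePeriodSeconds - (preStopSeconds + appShutdownSeconds);
    }

    public static void main(String[] args) {
        // Case from the text: a 35 s PreStop sleep against the default 30 s grace period.
        System.out.println(slackSeconds(30, 35, 0));  // prints -5
        // Assumed tuned values: 10 s PreStop, 30 s application shutdown, 45 s grace period.
        System.out.println(slackSeconds(45, 10, 30)); // prints 5
    }
}
```

Even with zero time budgeted for the application itself, the first configuration is already over the limit.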

Optimization points include reducing the sleep after Nacos deregistration, adjusting terminationGracePeriodSeconds to be slightly larger than the combined PreStop duration and SpringBoot shutdown time, and enabling SpringBoot’s graceful shutdown with custom logic for MQ listeners, scheduled tasks, and thread pools.

Sample code shows how to subscribe to Nacos instance change events and manually refresh Ribbon’s service instance cache:

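import com.alibaba.nacos.client.naming.event.InstancesChangeEvent;
import com.alibaba.nacos.common.notify.Event;
import com.alibaba.nacos.common.notify.NotifyCenter;
import com.alibaba.nacos.common.notify.listener.Subscriber;
import com.netflix.loadbalancer.DynamicServerListLoadBalancer;
import com.netflix.loadbalancer.ILoadBalancer;
import lombok.extern.slf4j.Slf4j;
import org.springframework.cloud.netflix.ribbon.SpringClientFactory;
import org.springframework.stereotype.Component;

import javax.annotation.PostConstruct;
import javax.annotation.Resource;

/**
 * Subscribe to Nacos instance change notifications
 * Manually refresh Ribbon service instance cache
 * Nacos client 1.4.6 (1.4.1 has critical bugs)
 */
@Component
@Slf4j
public class NacosInstancesChangeEventListener extends Subscriber<InstancesChangeEvent> {
    @Resource
    private SpringClientFactory springClientFactory;

    @PostConstruct
    public void registerToNotifyCenter() {
        NotifyCenter.registerSubscriber(this);
    }

    @Override
    public void onEvent(InstancesChangeEvent event) {
        String service = event.getServiceName();
        // The event's service name may carry a "group@@" prefix; strip it only when present.
        int separator = service.indexOf("@@");
        String ribbonService = separator >= 0 ? service.substring(separator + 2) : service;
        log.info("#### Received Nacos instance change event:{} ribbonServiceName: {}", service, ribbonService);
        ILoadBalancer loadBalancer = springClientFactory.getLoadBalancer(ribbonService);
        // updateListOfServers() is declared on DynamicServerListLoadBalancer (the parent of
        // ZoneAwareLoadBalancer), so check the type instead of casting blindly.
        if (loadBalancer instanceof DynamicServerListLoadBalancer) {
            ((DynamicServerListLoadBalancer<?>) loadBalancer).updateListOfServers();
            log.info("Refresh Ribbon service instance cache: {} success", ribbonService);
        }
    }

    @Override
    public Class<? extends Event> subscribeType() {
        return InstancesChangeEvent.class;
    }

    /**
     * Nacos 1.4.4 ~ 1.4.6 requires this method; later versions fix the issue.
     * When multiple registries exist, events are not isolated, so implement this to decide if the event should be handled.
     */
    @Override
    public boolean scopeMatches(InstancesChangeEvent event) {
        return true;
    }
}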

Further optimization suggests setting terminationGracePeriodSeconds to roughly 40 seconds, that is, about 10 seconds for the shortened PreStop hook plus the 30 seconds SpringBoot allows by default for graceful shutdown, and configuring the thread pool to wait for its tasks on shutdown:

threadPoolTaskExecutor.setWaitForTasksToCompleteOnShutdown(true);
threadPoolTaskExecutor.setAwaitTerminationSeconds(30);
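
In plain java.util.concurrent terms, these two settings correspond roughly to calling shutdown() instead of shutdownNow() and then waiting with awaitTermination(...). The sketch below shows that pattern on a bare ExecutorService. The `ExecutorDrain` class is hypothetical, and the fallback to shutdownNow() after a timeout is a choice made for this example, not a description of what Spring does.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: lets queued and running tasks finish before the pool is torn down. */
final class ExecutorDrain {
    private ExecutorDrain() {}

    /** Returns true if all submitted tasks finished within the timeout. */
    static boolean drain(ExecutorService executor, long timeout, TimeUnit unit) {
        // Stop accepting new tasks; tasks already queued or running continue.
        executor.shutdown();
        try {
            if (executor.awaitTermination(timeout, unit)) {
                return true;
            }
            // Timed out: interrupt whatever is still running (example-specific choice).
            executor.shutdownNow();
            return false;
        } catch (InterruptedException e) {
            executor.shutdownNow();
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

The timeout passed here should fit within whatever remains of terminationGracePeriodSeconds after the PreStop hook has finished.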

Additional considerations include handling MQ listeners and scheduled tasks by listening to Nacos deregistration events within the shutting‑down service itself, and ensuring gateway services refresh Ribbon caches to stop traffic to the terminating instance.
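
As a rough illustration of stopping a listener when the instance learns it has been deregistered, the sketch below uses an in-memory queue as a stand-in for a message broker. Everything here is hypothetical (`StoppableListener`, `stop()`, `demo()`); a real MQ client would expose its own pause or shutdown call, which the deregistration handler would invoke instead.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for an MQ listener that can be told to stop pulling messages. */
final class StoppableListener implements Runnable {
    private final BlockingQueue<String> queue;
    private final List<String> handled = new CopyOnWriteArrayList<>();
    private volatile boolean running = true;

    StoppableListener(BlockingQueue<String> queue) {
        this.queue = queue;
    }

    @Override
    public void run() {
        while (running) {
            try {
                // Poll with a short timeout so the running flag is re-checked regularly.
                String message = queue.poll(20, TimeUnit.MILLISECONDS);
                if (message != null) {
                    handled.add(message);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    /** Call this from the handler for the instance's own deregistration event. */
    void stop() {
        running = false;
    }

    List<String> handled() {
        return handled;
    }

    /** Demo: one message is handled before stop(); one arrives afterwards and stays in the queue. */
    static List<String> demo() {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        StoppableListener listener = new StoppableListener(queue);
        Thread worker = new Thread(listener, "mq-listener");
        queue.add("before-deregistration");
        worker.start();
        try {
            // Wait (up to 5 s) until the first message has been handled.
            long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
            while (listener.handled().isEmpty() && System.nanoTime() < deadline) {
                Thread.sleep(5);
            }
            listener.stop();
            worker.join(1000);
            queue.add("after-deregistration");
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return listener.handled();
    }
}
```

Messages that arrive after `stop()` are left unconsumed, so another instance can pick them up instead of the one that is shutting down.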

The article concludes with a summary of the comprehensive graceful shutdown solution, emphasizing the need to address business‑level logic such as long‑running requests, task persistence, and idempotent APIs.

