Deep Dive into the Kubernetes Job Controller Implementation
This article walks through the Kubernetes Job controller source code, tracing execution from kube-controller-manager initialization through key functions such as NewJobController, Run, worker, syncJob, and manageJob, and showing how informers, workqueues, and expectations coordinate the Job lifecycle. It assumes familiarity with the Cobra CLI library and basic hands-on Kubernetes experience.
Key entry point
```go
func main() {
	command := app.NewControllerManagerCommand()
	if err := command.Execute(); err != nil {
		os.Exit(1)
	}
}
```

NewControllerManagerCommand constructs the Cobra command and loads the controller options:
```go
func NewControllerManagerCommand() *cobra.Command {
	s, err := options.NewKubeControllerManagerOptions()
	if err != nil {
		klog.Fatalf("unable to initialize command options: %v", err)
	}
	cmd := &cobra.Command{
		Use: "kube-controller-manager",
		Run: func(cmd *cobra.Command, args []string) {
			c, err := s.Config(KnownControllers(), ControllersDisabledByDefault.List())
			if err != nil {
				fmt.Fprintf(os.Stderr, "%v\n", err)
				os.Exit(1)
			}
			Run(c.Complete(), wait.NeverStop)
		},
	}
	return cmd
}
```

KnownControllers and NewControllerInitializers enumerate the controllers to be started:
```go
func KnownControllers() []string {
	ret := sets.StringKeySet(NewControllerInitializers(IncludeCloudLoops))
	ret.Insert(saTokenControllerName)
	return ret.List()
}
```
```go
func NewControllerInitializers(loopMode ControllerLoopMode) map[string]InitFunc {
	controllers := map[string]InitFunc{}
	controllers["cronjob"] = startCronJobController
	controllers["job"] = startJobController
	controllers["deployment"] = startDeploymentController
	// ...
	return controllers
}
```

The Job controller is started in startJobController:
```go
func startJobController(ctx ControllerContext) (http.Handler, bool, error) {
	go job.NewJobController(
		ctx.InformerFactory.Core().V1().Pods(),
		ctx.InformerFactory.Batch().V1().Jobs(),
		ctx.ClientBuilder.ClientOrDie("job-controller"),
	).Run(int(ctx.ComponentConfig.JobController.ConcurrentJobSyncs), ctx.Stop)
	return nil, true, nil
}
```

The core JobController struct holds the client, listers, workqueue, expectations, and event recorder:
```go
type JobController struct {
	kubeClient     clientset.Interface
	podControl     controller.PodControlInterface
	updateHandler  func(job *batch.Job) error
	syncHandler    func(jobKey string) (bool, error)
	podStoreSynced cache.InformerSynced
	jobStoreSynced cache.InformerSynced
	expectations   controller.ControllerExpectationsInterface
	jobLister      batchv1listers.JobLister
	podStore       corelisters.PodLister
	queue          workqueue.RateLimitingInterface
	recorder       record.EventRecorder
}
```

Construction of the controller registers event handlers for Jobs and Pods, sets up the workqueue, and wires the sync and update handlers:
```go
func NewJobController(podInformer coreinformers.PodInformer, jobInformer batchinformers.JobInformer, kubeClient clientset.Interface) *JobController {
	eventBroadcaster := record.NewBroadcaster()
	jm := &JobController{
		kubeClient: kubeClient,
		podControl: controller.RealPodControl{
			KubeClient: kubeClient,
			Recorder:   eventBroadcaster.NewRecorder(scheme.Scheme, v1.EventSource{Component: "job-controller"}),
		},
		expectations: controller.NewControllerExpectations(),
		queue:        workqueue.NewNamedRateLimitingQueue(workqueue.NewItemExponentialFailureRateLimiter(DefaultJobBackOff, MaxJobBackOff), "job"),
	}
	jobInformer.Informer().AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc:    func(obj interface{}) { jm.enqueueController(obj, true) },
		UpdateFunc: jm.updateJob,
		DeleteFunc: func(obj interface{}) { jm.enqueueController(obj, true) },
	})
	jm.jobLister = jobInformer.Lister()
	jm.jobStoreSynced = jobInformer.Informer().HasSynced
	podInformer.Informer().AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc:    jm.addPod,
		UpdateFunc: jm.updatePod,
		DeleteFunc: jm.deletePod,
	})
	jm.podStore = podInformer.Lister()
	jm.podStoreSynced = podInformer.Informer().HasSynced
	jm.updateHandler = jm.updateJobStatus
	jm.syncHandler = jm.syncJob
	return jm
}
```

The Run method starts a configurable number of workers after ensuring the caches are synced:
```go
func (jm *JobController) Run(workers int, stopCh <-chan struct{}) {
	defer jm.queue.ShutDown()
	if !controller.WaitForCacheSync("job", stopCh, jm.podStoreSynced, jm.jobStoreSynced) {
		return
	}
	for i := 0; i < workers; i++ {
		go wait.Until(jm.worker, time.Second, stopCh)
	}
	<-stopCh
}
```

Each worker repeatedly calls processNextWorkItem, which pulls a Job key from the queue and invokes syncJob:
```go
func (jm *JobController) worker() {
	for jm.processNextWorkItem() {
	}
}

func (jm *JobController) processNextWorkItem() bool {
	key, quit := jm.queue.Get()
	if quit {
		return false
	}
	defer jm.queue.Done(key)
	forget, err := jm.syncHandler(key.(string))
	if err == nil && forget { jm.queue.Forget(key) }
	if err != nil { jm.queue.AddRateLimited(key) }
	return true
}
```

syncJob performs the main reconciliation: it fetches the Job object, checks for completion, counts retries against the backoff limit, and decides whether to call manageJob or delete Pods on failure. It also updates the Job's status and emits events.
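The terminal-state checks at the heart of syncJob can be distilled into a small decision function. This is an illustrative sketch, not the controller's actual API: the jobState struct and evaluateJob name are hypothetical, standing in for fields read from the Job spec and status.

```go
package main

import "fmt"

// jobState is a hypothetical simplification of the fields syncJob reads
// from spec (completions, backoffLimit) and status (succeeded, failed).
type jobState struct {
	succeeded, failed int32
	completions       *int32 // nil mirrors spec.completions being unset
	backoffLimit      int32  // mirrors spec.backoffLimit
}

// evaluateJob sketches syncJob's ordering of checks: failure due to
// exhausted retries wins, then completion, otherwise keep reconciling.
func evaluateJob(s jobState) string {
	if s.failed > s.backoffLimit {
		return "Failed" // delete remaining Pods, emit BackoffLimitExceeded
	}
	if s.completions != nil && s.succeeded >= *s.completions {
		return "Complete" // record completion time, update status
	}
	return "Running" // fall through to manageJob to adjust Pod counts
}

func main() {
	three := int32(3)
	fmt.Println(evaluateJob(jobState{succeeded: 3, completions: &three, backoffLimit: 6})) // Complete
	fmt.Println(evaluateJob(jobState{failed: 7, completions: &three, backoffLimit: 6}))    // Failed
}
```

The real function interleaves these checks with cache lookups, expectation tracking, and status updates, but the precedence (failure, then completion, then reconcile) is the same.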
The manageJob function balances the number of active Pods against the Job's parallelism and completions settings, creating or deleting Pods as needed, while respecting expectations and handling errors with exponential back‑off.
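The balancing arithmetic can be sketched as a pure function: how many Pods to create (positive) or delete (negative) given the current counts. The podDelta name is hypothetical; the real manageJob also batches creates exponentially and consults expectations before acting.

```go
package main

import "fmt"

// podDelta sketches manageJob's target computation: run up to `parallelism`
// Pods, but for fixed-completion Jobs never more than the work remaining.
func podDelta(active, succeeded, parallelism int32, completions *int32) int32 {
	wantActive := parallelism
	if completions != nil {
		remaining := *completions - succeeded
		if remaining < wantActive {
			wantActive = remaining
		}
		if wantActive < 0 {
			wantActive = 0
		}
	}
	return wantActive - active
}

func main() {
	five := int32(5)
	fmt.Println(podDelta(1, 2, 2, &five)) // → 1 (three completions remain, capped at parallelism 2; create one Pod)
	fmt.Println(podDelta(4, 0, 2, nil))   // → -2 (parallelism lowered; delete two Pods)
}
```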
Overall, the article demystifies how the Kubernetes Job controller leverages informers to watch resources, a rate‑limiting workqueue to serialize work, and a set of expectations to coordinate creates and deletes, ensuring the desired state of Jobs is eventually achieved.
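The expectations mechanism mentioned above deserves a closing illustration. The controller's cache lags behind its own writes, so after issuing N creates it refuses to reconcile again until the informer has observed N add events. A toy version of that bookkeeping, assuming simplified method names (the real implementation is controller.ControllerExpectations and is keyed per Job):

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// expectations is a toy, single-Job version of the controller's
// create-tracking: pendingAdds counts creates not yet seen by the informer.
type expectations struct {
	pendingAdds int64
}

func (e *expectations) ExpectCreations(n int64) { atomic.AddInt64(&e.pendingAdds, n) }
func (e *expectations) CreationObserved()       { atomic.AddInt64(&e.pendingAdds, -1) }
func (e *expectations) Satisfied() bool         { return atomic.LoadInt64(&e.pendingAdds) <= 0 }

func main() {
	var e expectations
	e.ExpectCreations(2)       // syncJob issued two Pod creates
	fmt.Println(e.Satisfied()) // false: informer hasn't observed them yet
	e.CreationObserved()       // addPod event handler fires once per Pod
	e.CreationObserved()
	fmt.Println(e.Satisfied()) // true: safe to reconcile from the cache again
}
```

Without this guard, a sync running between the create call and the informer event would see too few Pods in the cache and create duplicates.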
Cloud Native Technology Community
The Cloud Native Technology Community, part of the CNBPA Cloud Native Technology Practice Alliance, focuses on evangelizing cutting‑edge cloud‑native technologies and practical implementations. It shares in‑depth content, case studies, and event/meetup information on containers, Kubernetes, DevOps, Service Mesh, and other cloud‑native tech, along with updates from the CNBPA alliance.