Storm Component Relationships, Parallelism Calculation, and Nimbus Task Assignment

This article explains the relationships among Storm's core components, how parallelism is calculated from configuration parameters, and how Nimbus performs task assignment and load balancing across workers, supervisors, and executors in a distributed streaming topology.

Architecture Digest

Component Relationships

In Storm, a worker is a process started by a supervisor, and it serves exactly one topology. Each worker runs one or more executor threads, and each executor is the physical container for one or more tasks (a 1‑N relationship). A component is the abstraction that covers spouts, bolts, and the acker, while a task is one parallel instance of a component (also 1‑N).
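The containment relationships above can be pictured with a small data model. This is only an illustrative Python sketch: the class names, the topology id "wordcount-1", the port, and the component ids are hypothetical and do not come from Storm's code base.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Task:
    task_id: int
    component_id: str  # the spout, bolt, or acker this task is an instance of


@dataclass
class Executor:
    # One executor thread holds one or more tasks (1-N).
    tasks: List[Task] = field(default_factory=list)


@dataclass
class Worker:
    topology_id: str  # a worker process serves exactly one topology
    port: int
    # One worker process runs one or more executors (1-N).
    executors: List[Executor] = field(default_factory=list)


def tasks_per_component(worker: Worker) -> Dict[str, int]:
    """Count how many tasks of each component run inside one worker."""
    counts: Dict[str, int] = {}
    for executor in worker.executors:
        for task in executor.tasks:
            counts[task.component_id] = counts.get(task.component_id, 0) + 1
    return counts


# Hypothetical worker: two executors, three tasks, two components.
worker = Worker(
    topology_id="wordcount-1",
    port=6700,
    executors=[
        Executor(tasks=[Task(1, "split"), Task(2, "split")]),
        Executor(tasks=[Task(3, "spout")]),
    ],
)
```

Here the component "split" has two tasks sharing a single executor, which is the case where the task count exceeds the executor count.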

Supervisors periodically fetch topology assignments and heartbeat information from ZooKeeper, then start or stop workers to match the current assignment and maintain load balance. Workers update connection information to know which other workers they should communicate with.
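The start/stop decision a supervisor makes can be thought of as a set difference between the ports it has been assigned and the ports where workers are already running. The function below is a simplified Python sketch of that idea, not Storm's actual supervisor code; the function name and the port numbers are assumptions made for illustration.

```python
def plan_worker_changes(assigned_ports, running_ports):
    """Return (ports_to_start, ports_to_stop) so that the set of running
    workers ends up matching the current assignment."""
    assigned = set(assigned_ports)
    running = set(running_ports)
    to_start = sorted(assigned - running)  # assigned but not yet running
    to_stop = sorted(running - assigned)   # running but no longer assigned
    return to_start, to_stop


# Hypothetical state: the assignment wants ports 6700 and 6701,
# while workers are currently running on 6701 and 6702.
to_start, to_stop = plan_worker_changes([6700, 6701], [6701, 6702])
```

In this example the supervisor would start a worker on 6700, stop the one on 6702, and leave 6701 untouched.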

Parallelism Calculation

The number of workers, executors, and tasks is determined by configuration parameters and parallelism-hint, but the actual runtime parallelism may differ from the specified values.

Key parameters:

TOPOLOGY-WORKERS: number of workers for a topology.

parallelism-hint: initial number of executors for a component.

TOPOLOGY-TASKS: number of tasks for a component. If unspecified, it defaults to the component's initial number of executors. In either case, when TOPOLOGY-MAX-TASK-PARALLELISM is set, the resulting task count is capped at that value.

For the special acker bolt, parallelism follows TOPOLOGY-ACKER-EXECUTORS if set; otherwise it defaults to the number of workers.

NIMBUS-SLOTS-PER-TOPOLOGY and NIMBUS-EXECUTORS-PER-TOPOLOGY cap the total number of workers and executors, respectively, that a topology may use; exceeding either limit raises an exception.
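The two limits amount to a validation step that runs before a topology is accepted. The following Python sketch illustrates the check under stated assumptions: the function name and keyword arguments are hypothetical, and ValueError stands in for whatever exception type Nimbus actually raises.

```python
def check_topology_limits(num_workers, num_executors,
                          slots_limit=None, executors_limit=None):
    """Raise ValueError if a topology requests more workers or executors
    than the configured per-topology limits allow. A limit of None means
    that limit is not configured."""
    if slots_limit is not None and num_workers > slots_limit:
        raise ValueError(
            f"requested {num_workers} workers, limit is {slots_limit}")
    if executors_limit is not None and num_executors > executors_limit:
        raise ValueError(
            f"requested {num_executors} executors, limit is {executors_limit}")


# Within both limits: returns silently.
check_topology_limits(4, 10, slots_limit=4, executors_limit=10)
```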

Parallelism can be changed at runtime through Nimbus's rebalance or do-rebalance operations.

    (defn- component-parallelism [storm-conf component]
      (let [storm-conf (merge storm-conf (component-conf component))
            num-tasks (or (storm-conf TOPOLOGY-TASKS)
                          (num-start-executors component))
            max-parallelism (storm-conf TOPOLOGY-MAX-TASK-PARALLELISM)]
        (if max-parallelism
          (min max-parallelism num-tasks)
          num-tasks)))
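To make the rule in component-parallelism concrete, here is a small Python model of the same logic with a few worked cases. The function and parameter names are illustrative, and None plays the role of an unset configuration value.

```python
def component_task_count(parallelism_hint, topology_tasks=None,
                         max_task_parallelism=None):
    """Model of component-parallelism: take TOPOLOGY-TASKS if it is set,
    otherwise the initial executor count, then cap the result at
    TOPOLOGY-MAX-TASK-PARALLELISM when that is set."""
    if topology_tasks is not None:
        num_tasks = topology_tasks
    else:
        num_tasks = parallelism_hint
    if max_task_parallelism is not None:
        return min(max_task_parallelism, num_tasks)
    return num_tasks


# Worked cases (hypothetical numbers):
only_hint = component_task_count(4)                                  # 4
explicit_tasks = component_task_count(4, topology_tasks=8)           # 8
capped_tasks = component_task_count(4, topology_tasks=8,
                                    max_task_parallelism=6)          # 6
capped_hint = component_task_count(4, max_task_parallelism=3)        # 3
```

The last case shows that the cap also applies when TOPOLOGY-TASKS is not set, which is one way the runtime parallelism can end up lower than the parallelism-hint.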

Task Assignment in Nimbus

The assignment process consists of several steps, each implemented by a Clojure function:

    (defn- compute-topology->executors [nimbus storm-ids]
      "compute a topology-id -> executors map"
      (into {} (for [tid storm-ids]
                 {tid (set (compute-executors nimbus tid))})))

    (defn- compute-executors [nimbus storm-id]
      (let [conf (:conf nimbus)
            storm-base (.storm-base (:storm-cluster-state nimbus) storm-id nil)
            ;; component id -> number of executors
            component->executors (:component->executors storm-base)
            storm-conf (read-storm-conf conf storm-id)
            topology (read-storm-topology conf storm-id)
            task->component (storm-task-info topology storm-conf)]
        (->> (storm-task-info topology storm-conf)
             reverse-map                            ; component id -> task ids
             (map-val sort)
             (join-maps component->executors)       ; component id -> [executor count, task ids]
             (map-val (partial apply partition-fixed))
             (mapcat second)                        ; keep only the chunks of task ids
             (map to-executor-id))))                ; chunk -> [first-task last-task]
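The core of compute-executors is splitting each component's sorted task ids into as many contiguous chunks as the component has executors, then naming each chunk by its first and last task id. The Python sketch below models that step. It is based on a reading of the pipeline above; the behavior of partition-fixed and to-executor-id (chunk sizes differing by at most one, larger chunks first, empty chunks dropped) is an assumption about helpers that this article does not show.

```python
def partition_fixed(max_num_chunks, items):
    """Split items into at most max_num_chunks contiguous chunks whose
    sizes differ by at most one; larger chunks come first (assumed)."""
    items = list(items)
    if max_num_chunks <= 0 or not items:
        return []
    base, extra = divmod(len(items), max_num_chunks)
    # 'extra' chunks get one additional item each.
    sizes = [base + 1] * extra + [base] * (max_num_chunks - extra)
    chunks, start = [], 0
    for size in sizes:
        if size == 0:
            continue  # more executors than tasks: skip empty chunks
        chunks.append(items[start:start + size])
        start += size
    return chunks


def to_executor_id(task_ids):
    """An executor is identified by its [first-task, last-task] range."""
    return [task_ids[0], task_ids[-1]]


def executors_for_component(task_ids, num_executors):
    """Executor ids for one component, given its task ids."""
    chunks = partition_fixed(num_executors, sorted(task_ids))
    return [to_executor_id(chunk) for chunk in chunks]


# Five tasks spread over two executors.
five_over_two = executors_for_component([1, 2, 3, 4, 5], 2)
```

With five tasks and two executors, the sketch yields the ranges [1, 3] and [4, 5], so one executor runs three tasks and the other runs two.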

    (defn storm-task-info
      "Returns map from task -> component id"
      [^StormTopology user-topology storm-conf]
      (->> (system-topology! storm-conf user-topology)
           all-components
           (map-val (comp #(get % TOPOLOGY-TASKS) component-conf))
           (sort-by first)
           (mapcat (fn [[c num-tasks]] (repeat num-tasks c)))
           (map (fn [id comp] [id comp]) (iterate (comp int inc) (int 1)))
           (into {})))
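The numbering scheme in storm-task-info can be traced with a short Python model: components are sorted by id, each is repeated once per task, and the resulting sequence is numbered from 1. The component names and task counts below are hypothetical, and the sketch leaves out the system components that system-topology! adds to a real topology.

```python
def storm_task_info(component_tasks):
    """Model of storm-task-info: map task id -> component id.

    component_tasks maps component id -> number of tasks. Task ids are
    assigned consecutively from 1, in order of sorted component id."""
    task_to_component = {}
    next_id = 1
    for component_id, num_tasks in sorted(component_tasks.items()):
        for _ in range(num_tasks):
            task_to_component[next_id] = component_id
            next_id += 1
    return task_to_component


# Hypothetical topology: "count" has 3 tasks, "spout" has 2.
task_map = storm_task_info({"spout": 2, "count": 3})
```

Because "count" sorts before "spout", it receives task ids 1 to 3 and "spout" receives 4 and 5. Each component therefore owns a contiguous block of task ids, which is what lets an executor be described by a first and last task id.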

The final assignment is performed by mk-assignments, which computes the executor-to-node+port mappings and creates the Assignment structures that are stored in ZooKeeper.

    mk-assignments
      ;; compute topology -> executor -> node+port
      -> compute-new-topology->executor->node+port
        -> compute-topology->executors
          ...

Load Balancing

Load balancing drives task assignment. Nimbus periodically gathers heartbeat data from supervisors, workers, and executors via ZooKeeper, combines it with current topology information, and re‑computes assignments. Rebalancing can be triggered manually (e.g., via the web UI) or automatically when the cluster state changes.

Different strategies, such as round-robin, resource-slot-based allocation, or topology isolation, can be applied, but this article does not delve into the specific algorithms.
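As a rough illustration of the round-robin idea only, the following Python sketch deals executors out to free slots in turn. It is not Storm's scheduler; the function name, the node names, and the port numbers are hypothetical, and a slot is modeled simply as a (node, port) pair.

```python
from itertools import cycle


def assign_round_robin(executors, slots):
    """Assign each executor (a [first-task, last-task] pair) to a
    (node, port) slot, cycling through the slots in order."""
    if not slots:
        raise ValueError("no free slots available")
    assignment = {}
    # zip stops when the executors run out; cycle repeats the slots.
    for executor, slot in zip(executors, cycle(slots)):
        assignment[tuple(executor)] = slot
    return assignment


# Three executors over two hypothetical slots.
executor_to_slot = assign_round_robin(
    [[1, 3], [4, 5], [6, 6]],
    [("node-a", 6700), ("node-b", 6700)],
)
```

The third executor wraps around to the first slot, so each slot ends up with a near-equal share of executors. The result has the same executor-to-node+port shape that mk-assignments is described as producing.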

Conclusion

The article supplements a previous discussion by detailing how Storm calculates parallelism, maps components to tasks, and performs Nimbus‑driven task assignment and load balancing.
