
Comprehensive Guide to Hadoop MapReduce Job Execution, Scheduling, and Optimization

This article provides an in‑depth explanation of Hadoop MapReduce architecture, covering the roles of JobClient, JobTracker, TaskTracker and HDFS, the complete job lifecycle from submission to completion, scheduling strategies, shuffle and sort mechanisms, fault tolerance, and performance tuning techniques.


A MapReduce job can be launched with a single line of code such as JobClient.runJob(conf). Four main entities participate in a job: the JobClient (configures and submits the job), the JobTracker (initializes, schedules and coordinates the job), the TaskTrackers (execute the individual map and reduce tasks), and HDFS (stores the input data, configuration files and job results).

The overall execution flow is: code writing → job configuration → job submission → map‑task allocation and execution → intermediate result processing → reduce‑task allocation and execution → final output.
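
The flow above can be sketched as a single-process simulation: map emits key/value pairs, a shuffle groups them by key in sorted order, and reduce aggregates each group. The class and method names here are hypothetical; this is an illustration of the data flow, not Hadoop's API.

```java
import java.util.*;

// Simplified, single-process sketch of the map -> shuffle -> reduce flow.
public class MiniMapReduce {
    // "Map phase": emit (word, 1) for every word in every input split line.
    static List<Map.Entry<String, Integer>> map(List<String> splits) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String line : splits)
            for (String word : line.split("\\s+"))
                out.add(new AbstractMap.SimpleEntry<>(word, 1));
        return out;
    }

    // "Shuffle": group intermediate values by key, keeping keys sorted.
    static SortedMap<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        SortedMap<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> e : pairs)
            grouped.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
        return grouped;
    }

    // "Reduce phase": sum the values collected for each key.
    static Map<String, Integer> reduce(SortedMap<String, List<Integer>> grouped) {
        Map<String, Integer> result = new LinkedHashMap<>();
        grouped.forEach((k, vs) -> result.put(k, vs.stream().mapToInt(Integer::intValue).sum()));
        return result;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("a b a", "b a");
        Map<String, Integer> counts = reduce(shuffle(map(input)));
        System.out.println(counts); // {a=3, b=2}
    }
}
```

In a real cluster the three stages run in separate JVMs on different machines, with HDFS and the network in between; the logical pipeline is the same.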

During submission, JobClient obtains a new JobId from the JobTracker, validates paths, computes input splits, copies required resources (jars, configs) to shared HDFS, and finally calls JobTracker.submitJob() to start the job.

JobTracker creates a JobInProgress object, reads split information from HDFS, and creates the appropriate number of map and reduce TaskInProgress objects. These tasks are placed in caches awaiting assignment.

TaskTrackers send periodic heartbeats to JobTracker, indicating their liveness and requesting new tasks. JobTracker assigns map or reduce tasks, preferring data‑local placement when possible; otherwise it may use rack‑local or remote data.
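
The locality preference can be illustrated with a toy picker: node-local tasks first, rack-local second, anything else last. This is a hypothetical sketch (including the hostname-encodes-rack convention), not the JobTracker's actual scheduling code.

```java
import java.util.*;

// Toy model of locality-aware task assignment on a TaskTracker heartbeat.
public class LocalityPicker {
    static class Task {
        final String id;
        final Set<String> splitHosts; // hosts holding this task's input split
        Task(String id, String... hosts) {
            this.id = id;
            this.splitHosts = new HashSet<>(Arrays.asList(hosts));
        }
    }

    // Assumption: hostnames encode the rack, e.g. "rack1-node3" -> "rack1".
    static String rackOf(String host) {
        return host.split("-")[0];
    }

    // Pick the best pending task for the requesting TaskTracker's host.
    static Task pick(List<Task> pending, String trackerHost) {
        for (Task t : pending)                       // 1. node-local
            if (t.splitHosts.contains(trackerHost)) return t;
        for (Task t : pending)                       // 2. rack-local
            for (String h : t.splitHosts)
                if (rackOf(h).equals(rackOf(trackerHost))) return t;
        return pending.isEmpty() ? null : pending.get(0); // 3. remote
    }

    public static void main(String[] args) {
        List<Task> pending = Arrays.asList(
            new Task("m1", "rack2-node1"),
            new Task("m2", "rack1-node2"),
            new Task("m3", "rack1-node1"));
        System.out.println(pick(pending, "rack1-node1").id); // m3: node-local wins
    }
}
```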

When a TaskTracker receives a task, it localizes the job's jar, split files, and configuration, creates a working directory, and launches a TaskRunner (either a MapTaskRunner or a ReduceTaskRunner) in a separate JVM to isolate failures.

Map task execution steps include: allocating parameters, writing task info to a child process, configuring logs, reading input splits via a RecordReader, creating a MapRunnable, invoking the user's map function, and collecting output into a MapOutputBuffer.

Streaming and Pipes allow custom executables to communicate with Hadoop via standard I/O or sockets.

Both jobs and individual tasks maintain status information (progress, counters, messages). Progress for map tasks is the fraction of input processed; reduce progress is divided into copy, sort, and reduce phases (each roughly one‑third of the work).
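
The progress convention above reduces to simple arithmetic. A hypothetical helper (not Hadoop's ProgressReporter code) makes the thirds explicit:

```java
// Sketch of the progress convention: map progress is the fraction of the
// input split consumed; reduce progress averages its three phases.
public class Progress {
    // Each argument is that phase's own completion fraction in [0, 1].
    static double reduceProgress(double copy, double sort, double reduce) {
        return (copy + sort + reduce) / 3.0;
    }

    // Map progress is simply bytes read over the split's length.
    static double mapProgress(long bytesRead, long splitLength) {
        return splitLength == 0 ? 1.0 : (double) bytesRead / splitLength;
    }

    public static void main(String[] args) {
        // Copy and sort done, reduce half done -> overall 5/6 of the task.
        System.out.println(reduceProgress(1.0, 1.0, 0.5));
    }
}
```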

Job completion occurs when JobTracker receives notifications that the last task has finished; it then marks the job successful, and JobClient reports the final counters.

Failure handling includes task attempts (re‑execution on another TaskTracker), task time‑outs, JVM crashes, and blacklisting of unreliable TaskTrackers. JobTracker itself is a single point of failure in older Hadoop versions.

Scheduling algorithms covered are FIFO (priority‑based but non‑preemptive), Fair Scheduler (shares cluster slots fairly among users and supports preemption), and Capacity Scheduler (queues with allocated capacities, each queue using FIFO internally).
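
The Fair Scheduler's core idea can be shown in a few lines: when a slot frees up, hand it to the pool furthest below its fair share. This toy version assumes equal shares per pool; the real scheduler supports weights, minimum shares, and preemption.

```java
import java.util.*;

// Toy illustration of fair-share slot assignment across user pools.
public class FairShare {
    // runningPerPool: slots each pool currently holds. Fair share here is
    // simply totalSlots divided by the number of pools (equal shares).
    static String nextPool(Map<String, Integer> runningPerPool, int totalSlots) {
        double fairShare = (double) totalSlots / runningPerPool.size();
        String best = null;
        double worstDeficit = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Integer> e : runningPerPool.entrySet()) {
            double deficit = fairShare - e.getValue(); // how far below fair share
            if (deficit > worstDeficit) {
                worstDeficit = deficit;
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, Integer> running = new LinkedHashMap<>();
        running.put("alice", 6);
        running.put("bob", 1);   // far below the fair share of 4
        System.out.println(nextPool(running, 8)); // bob
    }
}
```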

The shuffle phase moves map output to reducers, performing partitioning, sorting, and optional combiner processing. Map output is first written to an in‑memory buffer, spilled to disk when thresholds are reached, and later merged into a single sorted file per partition.
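
Two of these mechanisms are easy to demonstrate: hash partitioning of map-output keys (the same non-negative-hash-modulo formula Hadoop's default HashPartitioner uses) and spilling a sorted run when the in-memory buffer crosses a threshold. The buffer class itself is a hypothetical, greatly simplified stand-in for MapOutputBuffer.

```java
import java.util.*;

// Sketch of shuffle-side mechanics: partitioning and threshold-based spills.
public class ShuffleSketch {
    // Default-style hash partitioner: non-negative hash modulo reducer count.
    static int partitionFor(String key, int numReduces) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduces;
    }

    // In-memory map-output buffer that "spills" a sorted run at 80% capacity.
    static class OutputBuffer {
        final int capacity;
        final List<String> buffer = new ArrayList<>();
        final List<List<String>> spills = new ArrayList<>();
        OutputBuffer(int capacity) { this.capacity = capacity; }

        void collect(String key) {
            buffer.add(key);
            if (buffer.size() >= capacity * 0.8) spill();
        }

        void spill() {
            List<String> run = new ArrayList<>(buffer);
            Collections.sort(run);        // each spill file is sorted
            spills.add(run);
            buffer.clear();
        }
    }

    public static void main(String[] args) {
        OutputBuffer buf = new OutputBuffer(5);   // spills after 4 records
        for (String k : Arrays.asList("d", "b", "c", "a", "e")) buf.collect(k);
        System.out.println(buf.spills); // [[a, b, c, d]] -- one sorted spill so far
    }
}
```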

Reducer copy phase fetches map outputs (from memory or disk) via HTTP, followed by a merge phase that maintains key order, and finally the reduce function processes the sorted data and writes results back to HDFS.
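
The merge phase works like a classic k-way external merge: each sorted spill run contributes its smallest remaining key to a priority queue, so the combined stream stays key-ordered. A simplified, in-memory sketch (not Hadoop's Merger code):

```java
import java.util.*;

// K-way merge of sorted runs via a priority queue, preserving key order.
public class MergeSketch {
    static List<String> merge(List<List<String>> sortedRuns) {
        // Each heap entry is {run index, position within that run},
        // ordered by the key currently at that position.
        PriorityQueue<int[]> heap = new PriorityQueue<>(
            Comparator.comparing((int[] e) -> sortedRuns.get(e[0]).get(e[1])));
        for (int i = 0; i < sortedRuns.size(); i++)
            if (!sortedRuns.get(i).isEmpty()) heap.add(new int[]{i, 0});

        List<String> merged = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] top = heap.poll();
            List<String> run = sortedRuns.get(top[0]);
            merged.add(run.get(top[1]));
            // Advance within the run the smallest key came from.
            if (top[1] + 1 < run.size()) heap.add(new int[]{top[0], top[1] + 1});
        }
        return merged;
    }

    public static void main(String[] args) {
        List<List<String>> runs = Arrays.asList(
            Arrays.asList("a", "d"), Arrays.asList("b", "c"));
        System.out.println(merge(runs)); // [a, b, c, d]
    }
}
```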

Configuration and tuning advice includes: minimizing memory usage in map/reduce code, adjusting the child JVM heap size (mapred.child.java.opts), increasing the map output buffer size (io.sort.mb), enabling map output compression, enlarging Hadoop's I/O buffer (io.file.buffer.size), and tuning shuffle-related parameters (mapred.inmem.merge.threshold, mapred.job.shuffle.input.buffer.percent, etc.).
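
These knobs live in mapred-site.xml (or a per-job configuration). The property names below are the standard Hadoop 1.x ones; the values are illustrative only and depend entirely on the workload and available memory:

```xml
<configuration>
  <!-- Larger JVM heap for map/reduce child processes (illustrative value) -->
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx512m</value>
  </property>
  <!-- Bigger map-output sort buffer to reduce the number of spills -->
  <property>
    <name>io.sort.mb</name>
    <value>200</value>
  </property>
  <!-- Compress intermediate map output to cut shuffle traffic -->
  <property>
    <name>mapred.compress.map.output</name>
    <value>true</value>
  </property>
  <!-- Larger I/O buffer for reads and writes -->
  <property>
    <name>io.file.buffer.size</name>
    <value>65536</value>
  </property>
</configuration>
```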

Speculative execution launches duplicate tasks for slow-running attempts to reduce overall job latency; it is enabled by default but can be disabled per job or cluster.
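
The decision boils down to spotting stragglers. A toy model: compare each task's progress rate against the average and duplicate the laggards. The 50%-of-average cutoff here is an illustrative threshold, not Hadoop's exact heuristic.

```java
import java.util.*;

// Toy model of the speculative-execution decision.
public class Speculator {
    // progressRates: per-task progress per unit time. Returns the indices
    // of tasks slow enough to deserve a duplicate (speculative) attempt.
    static List<Integer> candidates(double[] progressRates) {
        double avg = Arrays.stream(progressRates).average().orElse(0);
        List<Integer> slow = new ArrayList<>();
        for (int i = 0; i < progressRates.length; i++)
            if (progressRates[i] < 0.5 * avg) slow.add(i); // lagging badly
        return slow;
    }

    public static void main(String[] args) {
        // Task 2 is a straggler; it gets a speculative duplicate attempt.
        System.out.println(candidates(new double[]{1.0, 0.9, 0.1, 1.1})); // [2]
    }
}
```

Whichever attempt finishes first wins; the other is killed, trading extra cluster work for lower job latency.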

JVM reuse (mapred.job.reuse.jvm.num.tasks) allows multiple tasks to share a single JVM, reducing startup overhead and enabling state sharing between tasks.

Skipping bad records can be enabled via skipping mode (the SkipBadRecords class), which lets a task skip over records that repeatedly cause it to fail rather than aborting the whole job.

Written by Architect